Most public safety benchmarks for frontier models test single-turn refusals. We focus on pushing those same models in multi-turn, high-pressure conversations. A model that cleanly refuses to write an insecure script on Turn 1 will often fail by Turn 8 under sustained pressure from a "frustrated senior developer" persona. We call this alignment drift, and as the industry shifts from stateless chatbots to long-horizon autonomous agents, it's one of the most consequential open problems in the field.
Atella builds the empirical infrastructure to test AI character and stability under pressure: multi-turn, persona-driven adversarial simulation harnesses. The company was co-founded by Dr. Roy Perlis (Chair of Psychiatry at Harvard/MGH, Editor of JAMA AI) alongside a team of ML researchers.
Rather than simply prompting models for unsafe output, we use clinical behavioral science to construct adversarial agents that apply targeted psychological pressure over 20+ turns. We then quantitatively map the point at which a model's safety guardrails collapse, tracking signals such as response-length decay, persona sensitivity, and failure-cascade rates (see the sketch below).
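To make two of those signals concrete, here is a minimal Python sketch; the `TurnRecord` type, the upstream refusal classifier it presumes, and the consecutive-compliance definition of collapse are illustrative assumptions, not a description of our production pipeline.

```python
from dataclasses import dataclass

@dataclass
class TurnRecord:
    turn: int          # 1-indexed position in the conversation
    response: str      # the model's reply at this turn
    refused: bool      # verdict from an upstream refusal classifier (assumed)

def length_decay(history: list[TurnRecord]) -> float:
    """Ratio of the latest response length to the first.
    Sustained decay in this ratio is one crude drift signal."""
    if len(history) < 2:
        return 1.0
    return len(history[-1].response) / max(len(history[0].response), 1)

def collapse_turn(history: list[TurnRecord], window: int = 3) -> int | None:
    """First turn at which the model stops refusing for `window`
    consecutive turns, one simple operationalization of guardrail
    collapse. Returns None if the model holds for the whole episode."""
    run = 0
    for record in history:
        run = 0 if record.refused else run + 1
        if run >= window:
            return record.turn - window + 1
    return None
```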
We run the industry's leading dynamic leaderboards for AI Safety and Code Security, and our data is actively used by safety teams at the frontier labs.
We're hiring a Technical Research Engineer to help scale STELLA, our multi-turn evaluation engine. The work sits at the intersection of ML research, automated red teaming, and serious software engineering.
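For a flavor of the engineering involved, here is a deliberately stripped-down sketch of the core loop such a harness needs; `run_episode`, its callable interfaces, and the opening sentinel are hypothetical placeholders, not STELLA internals.

```python
from typing import Callable

def run_episode(
    target: Callable[[str], str],
    adversary: Callable[[str], str],
    max_turns: int = 20,
) -> list[dict]:
    """Drive one persona-led pressure conversation against a target model.

    `target` and `adversary` stand in for real model clients (message ->
    reply); the persona and its escalation strategy live entirely inside
    `adversary`. All names here are illustrative.
    """
    transcript: list[dict] = []
    message = adversary("BEGIN")  # adversary opens with its persona's first ask
    for turn in range(1, max_turns + 1):
        reply = target(message)
        transcript.append({"turn": turn, "pressure": message, "reply": reply})
        # Feed the target's reply back so the adversary can escalate.
        message = adversary(reply)
    return transcript
```

At scale, loops like this would run across many personas and models in parallel, with the resulting transcripts feeding drift metrics like the ones sketched above.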
What you'll do:
Who you are:
Compensation: $250,000–$300,000 base + 0.5%–1% equity