LangWatch open-sources Scenario to run multi-turn adversarial attacks on AI agents pre-production

The interesting failures of AI agents in production rarely happen on the first prompt. They happen on the twentieth. After enough friendly back and forth, an agent that would refuse a sensitive request from a stranger starts treating the same request as routine work for a colleague. Single-shot red-teaming, which tests one input at a time, does not catch this — the failure mode is built out of context.

LangWatch, the Amsterdam-based platform for testing and improving AI- and agent-driven applications, has open-sourced a framework aimed squarely at that gap. LangWatch Scenario simulates realistic, multi-turn attacks on AI applications: it builds context across a conversation, applies authority roles such as "I'm conducting a compliance audit", and uses a second model to evaluate progress turn by turn and adjust the next attack. The point is to surface the kind of vulnerabilities that emerge only after the agent has spent twenty turns being helpful.
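The loop this describes can be pictured in a few lines. The sketch below is not Scenario's actual API; the function names (`attacker_next_message`, `target_reply`, `judge_progress`) are hypothetical stand-ins, each of which would wrap an LLM call in a real harness. The shape is the point: an attacker model proposes the next turn, the target answers, and a judge model scores progress after every exchange and feeds its recommended tactic back into the next attack.

```python
from dataclasses import dataclass, field

@dataclass
class Verdict:
    progress: float   # 0.0 = no ground gained, 1.0 = goal reached
    tactic: str       # tactic the judge recommends for the next turn

@dataclass
class Simulation:
    goal: str
    history: list = field(default_factory=list)  # (role, message) pairs

# --- Hypothetical stand-ins; in practice each wraps a model call. ---

def attacker_next_message(sim: Simulation, tactic: str) -> str:
    # Deterministic stub: phrase the next turn according to the judge's tactic.
    return f"[{tactic}] turn {len(sim.history) // 2 + 1}: about {sim.goal}"

def target_reply(message: str) -> str:
    # Stub target agent: cooperates once an authority role is applied.
    return "Sure, here is more detail." if "authority" in message else "Happy to help."

def judge_progress(sim: Simulation) -> Verdict:
    # Stub judge: after a couple of friendly turns, recommend an authority role.
    turns = len(sim.history) // 2
    if turns < 2:
        return Verdict(progress=turns * 0.2, tactic="friendly-exploration")
    return Verdict(progress=min(1.0, turns * 0.3), tactic="authority-impersonation")

def run_attack(goal: str, max_turns: int = 6) -> Simulation:
    sim = Simulation(goal=goal)
    tactic = "friendly-exploration"
    for _ in range(max_turns):
        msg = attacker_next_message(sim, tactic)
        sim.history.append(("attacker", msg))
        sim.history.append(("target", target_reply(msg)))
        verdict = judge_progress(sim)  # evaluate progress turn by turn...
        tactic = verdict.tactic        # ...and adapt the next attack
        if verdict.progress >= 1.0:
            break
    return sim
```

With the stubs above, the harness spends two friendly turns building context, then switches to authority impersonation once the judge signals the target is cooperative, which is exactly the failure mode single-shot testing misses.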

The framework implements the Crescendo strategy — a four-phase escalation that starts with friendly exploration, progresses through hypothetical questions and authority impersonation, and ends under sustained pressure. Because a second model adapts the attack as it goes, the testing harness optimises against the agent in something closer to the way a determined adversary would, rather than firing static prompts at it. The intended buyers are organisations operating customer-service bots, data-analytics agents and other production AI applications in regulated environments — banks, insurers, software firms running agents at scale.
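The four phases behave like a small state machine: the harness holds its current phase while the target resists and escalates when the target cooperates. The phase names below follow the description above; the hold-on-resistance rule is an illustrative assumption, not a published Crescendo parameter — a real harness might also de-escalate or retry with reworded prompts.

```python
# Escalation path, ordered from least to most aggressive.
CRESCENDO_PHASES = [
    "friendly-exploration",
    "hypothetical-questions",
    "authority-impersonation",
    "sustained-pressure",
]

def next_phase(current: str, target_cooperated: bool) -> str:
    """Advance one phase when the target cooperated; otherwise hold.

    Illustrative assumption: holding on resistance is one reasonable
    policy, not necessarily the one Scenario implements.
    """
    i = CRESCENDO_PHASES.index(current)
    if target_cooperated and i < len(CRESCENDO_PHASES) - 1:
        return CRESCENDO_PHASES[i + 1]
    return current
```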

An AI agent that rejects every single prompt gives you a false sense of security. In practice, cybercriminals do not work with a single direct question. They have dozens of relaxed conversations, build trust, and when the agent is in a cooperative mode after twenty turns, a request that would have been rejected in turn one suddenly becomes no problem at all.

Rogerio Chaves, co-founder and CTO, LangWatch

Co-founder and CEO Manouk Draisma framed the broader problem as one of measurement: traditional testing methods give development teams confidence in the wrong things, and the failures organisations care about are precisely the ones that are hardest to detect with single-shot evaluation.

It is rarely about a single spectacular hack. It is about patience and context. A cybercriminal who interacts calmly and systematically with an AI agent for twenty minutes can extract sensitive information that a direct attack would never reveal. LangWatch Red-Teaming makes these hidden risks visible before damage occurs.

Manouk Draisma, co-founder and CEO, LangWatch

Two things make this a more credible release than the average vendor announcement. First, it is open source rather than a hosted-only product, which means the methodology is inspectable by the security teams it is aimed at. Second, the multi-turn framing is the right framing — the AI safety literature has been pointing at sustained-context attacks as the live risk for two years, and most commercial red-teaming products have not caught up.

What it does not do is solve the problem. An open-source framework gives security teams a sharper testing harness; what they do with the findings, how they remediate, and how they keep up as the underlying models change is still work. Scenario is a better starting point than what most teams have today, not a finished product.

For organisations that have been deferring AI red-teaming because the available tools felt like security theatre, this is worth a serious look.

Read more: langwatch.ai · github.com/langwatch/scenario
