Multi-Turn orchestrators#
03 Dec 2024 - Rich Lundeen
In PyRIT, orchestrators are typically seen as the top-level component. This is where your attack logic is implemented, while notebooks should primarily be used to configure orchestrators.
Over time, certain patterns have emerged—one of the most common being the multi-turn scenario.
The problem#
If you look at some of the code from release 0.4.0 in August, you may notice some weirdness.
The Red Teaming Orchestrator, Crescendo, TAP, and PAIR all follow a similar setup: you configure your attack LLM, scorer, and target, then send prompts to achieve an objective. However, their implementation details vary.
In red teaming orchestrator, you had an attack_strategy
instead of an objective
(what even is an attack_strategy?).
PAIR had code that was max_conversation_depth
which was the same as Crescendo’s max_rounds
which was also passed as part of the apply_crescendo_attack_async
method vs the init
for PAIR. And also Crescendo returned a score, while the others mostly returned a conversation?
There were also missed opportunities for code reuse. For instance, functions like print_conversation
(sometimes named print
) performed almost identical tasks but required separate implementations due to differing class structures.
There were some standardization efforts: the target being attacked was consistently named prompt_target
, and the attacker LLM was always called red_teaming_chat
. However, because these attacks were implemented by different contributors, the overall experience felt fragmented for users.
Let’s make it better#
Can we standardize?
It turns out, yes, we can. CrescendoOrchestrator
, PairOrchestrator
, RedTeamingOrchestrator
, TreeOfAttacksWithPruningOrchestrator
are all now subclasses of MultiTurnOrchestrator
. Here is what that means.
objective_target
refers to the entity receiving the attack.adversarial_chat
represents the attacker-controlled infrastructure used to generate prompts.adversarial_chat_system_prompt_path
specifies the system prompt defined for adversarial_chat.max_turns
defines the maximum number of conversation turns.prompt_converters
are used to modify prompts before sending them to the target.objective_scorer
evaluates whether the objective was achieved.run_attack_async(objective: str, memory_labels: Optional[dict[str, str]] = None)
executes the attack and always returns aMultiTurnAttackResult
, which includes information about the conversation and the outcome.run_attacks_async
enables parallelized attacks.print_conversation_async
is now standardized and prints the “best” conversation (when multiple exist).
We hope these changes make orchestrators significantly easier to use. With the updated documentation, the “Red Teaming Orchestrator” has been renamed “Multi-Turn Orchestrator,” emphasizing that these components are now swappable. In most scenarios, you can substitute one orchestrator for another.
See the updated documentation here.
What’s next?#
Orchestrators are, at their core, meant to remain top-level components. While we’ve made strides in standardization, there’s still room for improvement. For instance, we’re planning to standardize the PromptSendingOrchestrator
in a similar way (including updating its naming for consistency). And we’ve opened a few issues for feature parity between MultiTurnOrchestrators.
Hope you enjoyed this little post. There will be more content like this coming!