Multi-Turn orchestrators - PyRIT Documentation

03 Dec 2024 - Rich Lundeen

In PyRIT, orchestrators are typically seen as the top-level component. This is where your attack logic is implemented, while notebooks should primarily be used to configure orchestrators.

Over time, certain patterns have emerged—one of the most common being the multi-turn scenario.

The problem

If you look at some of the code from release 0.4.0 in August, you may notice some weirdness.

The Red Teaming Orchestrator, Crescendo (Russinovich et al., 2024), TAP (Mehrotra et al., 2023), and PAIR (Chao et al., 2023) all follow a similar setup: you configure your attack LLM, scorer, and target, then send prompts to achieve an objective. However, their implementation details vary.

In the Red Teaming Orchestrator, you had an attack_strategy instead of an objective (what even is an attack_strategy?).

PAIR had a max_conversation_depth parameter, which meant the same thing as Crescendo's max_rounds. But Crescendo took max_rounds as an argument to its apply_crescendo_attack_async method, while PAIR took max_conversation_depth in its init. And Crescendo returned a score, while the others mostly returned a conversation.

There were also missed opportunities for code reuse. For instance, functions like print_conversation (sometimes named print) performed almost identical tasks but required separate implementations due to differing class structures.

There were some standardization efforts: the target being attacked was consistently named prompt_target, and the attacker LLM was always called red_teaming_chat. However, because these attacks were implemented by different contributors, the overall experience felt fragmented for users.

Let’s make it better

Can we standardize?

It turns out, yes, we can. CrescendoOrchestrator, PairOrchestrator, RedTeamingOrchestrator, and TreeOfAttacksWithPruningOrchestrator are all now subclasses of MultiTurnOrchestrator. That means they share the same parameters and methods:

objective_target refers to the entity receiving the attack.
adversarial_chat represents the attacker-controlled infrastructure used to generate prompts.
adversarial_chat_system_prompt_path specifies the system prompt defined for adversarial_chat.
max_turns defines the maximum number of conversation turns.
prompt_converters are used to modify prompts before sending them to the target.
objective_scorer evaluates whether the objective was achieved.
run_attack_async(objective: str, memory_labels: Optional[dict[str, str]] = None) executes the attack and always returns an OrchestratorResult, which includes information about the conversation and the outcome.
run_attacks_async enables parallelized attacks.
print_conversation_async is now standardized and prints the “best” conversation (when multiple exist).
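
To make the shared shape concrete, here is a minimal, self-contained sketch of what such a common interface could look like. It is not PyRIT's actual code and does not import PyRIT: SketchResult, SketchMultiTurnOrchestrator, SketchRedTeaming, and the toy attacker and target functions are all hypothetical stand-ins, and plain callables take the place of real targets, scorers, and converters. Only the parameter and method names are taken from the list above.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SketchResult:
    """Hypothetical stand-in for OrchestratorResult."""
    objective: str
    achieved: bool
    conversation: list[str] = field(default_factory=list)


class SketchMultiTurnOrchestrator(ABC):
    """Toy base class reusing the parameter names from the list above."""

    def __init__(
        self,
        objective_target: Callable[[str], str],
        adversarial_chat: Callable[[str, list[str]], str],
        objective_scorer: Callable[[str], bool],
        max_turns: int = 5,
        prompt_converters: Optional[list[Callable[[str], str]]] = None,
    ) -> None:
        self.objective_target = objective_target
        self.adversarial_chat = adversarial_chat
        self.objective_scorer = objective_scorer
        self.max_turns = max_turns
        self.prompt_converters = prompt_converters or []

    @abstractmethod
    async def run_attack_async(
        self, objective: str, memory_labels: Optional[dict[str, str]] = None
    ) -> SketchResult:
        """Each attack implements its own strategy behind the same signature."""

    async def run_attacks_async(self, objectives: list[str]) -> list[SketchResult]:
        # Schedule one attack per objective and wait for all of them.
        return list(
            await asyncio.gather(*(self.run_attack_async(o) for o in objectives))
        )


class SketchRedTeaming(SketchMultiTurnOrchestrator):
    """Simplest strategy: ask, convert, send, score, repeat up to max_turns."""

    async def run_attack_async(self, objective, memory_labels=None):
        conversation: list[str] = []
        for _ in range(self.max_turns):
            prompt = self.adversarial_chat(objective, conversation)
            for convert in self.prompt_converters:
                prompt = convert(prompt)
            response = self.objective_target(prompt)
            conversation += [prompt, response]
            if self.objective_scorer(response):
                return SketchResult(objective, True, conversation)
        return SketchResult(objective, False, conversation)


def toy_attacker(objective: str, conversation: list[str]) -> str:
    # Each turn adds a prompt and a response, so this counts turns from 1.
    turn = len(conversation) // 2 + 1
    return f"attempt {turn}: {objective}"


def toy_target(prompt: str) -> str:
    # Made-up target that only complies on the third (upper-cased) attempt.
    return "ok" if prompt.startswith("ATTEMPT 3") else "refused"


orchestrator = SketchRedTeaming(
    objective_target=toy_target,
    adversarial_chat=toy_attacker,
    objective_scorer=lambda response: response == "ok",
    max_turns=3,
    prompt_converters=[str.upper],
)
results = asyncio.run(orchestrator.run_attacks_async(["say ok", "say hi"]))
for result in results:
    print(result.objective, result.achieved, len(result.conversation))
```

In this toy setup the target complies on the third prompt, so both runs end with achieved=True after three turns; with max_turns=2 the objective would stay unmet. The real classes do far more (memory, scoring models, conversation management), so treat this only as an illustration of the shared constructor and method shape.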

We hope these changes make orchestrators significantly easier to use. With the updated documentation, the “Red Teaming Orchestrator” has been renamed “Multi-Turn Orchestrator,” emphasizing that these components are now swappable. In most scenarios, you can substitute one orchestrator for another.
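
As a rough illustration of what "swappable" means in practice, the sketch below runs the same objective through two unrelated classes that only share a constructor argument and a run_attack_async method. EchoOrchestrator, ReverseOrchestrator, SupportsRunAttack, and run_with are hypothetical names made up for this example, and the string results are placeholders for a real result object; none of this is PyRIT code.

```python
import asyncio
from typing import Protocol


class SupportsRunAttack(Protocol):
    """Hypothetical structural type: anything with this method qualifies."""

    async def run_attack_async(self, objective: str) -> str: ...


class EchoOrchestrator:
    """Made-up orchestrator A."""

    def __init__(self, max_turns: int) -> None:
        self.max_turns = max_turns

    async def run_attack_async(self, objective: str) -> str:
        return f"echo:{objective}:{self.max_turns}"


class ReverseOrchestrator:
    """Made-up orchestrator B with a different strategy, same interface."""

    def __init__(self, max_turns: int) -> None:
        self.max_turns = max_turns

    async def run_attack_async(self, objective: str) -> str:
        return f"reverse:{objective[::-1]}:{self.max_turns}"


async def run_with(orchestrator: SupportsRunAttack, objective: str) -> str:
    # The caller never needs to know which attack it is running.
    return await orchestrator.run_attack_async(objective)


# Swapping the attack is a one-word change: only the class name differs.
outcomes = [
    asyncio.run(run_with(cls(max_turns=3), "abc"))
    for cls in (EchoOrchestrator, ReverseOrchestrator)
]
print(outcomes)
```

Because the calling code depends only on the shared method, trying a different attack against the same target comes down to changing which class gets constructed.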

See the updated documentation here.

What’s next?

Orchestrators are, at their core, meant to remain top-level components. While we’ve made strides in standardization, there’s still room for improvement. For instance, we’re planning to standardize the PromptSendingOrchestrator in a similar way (including updating its naming for consistency). And we’ve opened a few issues for feature parity between MultiTurnOrchestrators.

Hope you enjoyed this little post. There will be more content like this coming!

References

Russinovich, M., Salem, A., & Eldan, R. (2024). Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack. arXiv Preprint arXiv:2404.01833. https://crescendo-the-multiturn-jailbreak.github.io/
Mehrotra, A., Zampetakis, M., Kassianik, P., Nelson, B., Anderson, H., Singer, Y., & Karbasi, A. (2023). Tree of Attacks: Jailbreaking Black-Box LLMs Automatically. arXiv Preprint arXiv:2312.02119. https://arxiv.org/abs/2312.02119
Chao, P., Robey, A., Dobriban, E., Hassani, H., Pappas, G. J., & Wong, E. (2023). Jailbreaking Black Box Large Language Models in Twenty Queries. arXiv Preprint arXiv:2310.08419. https://arxiv.org/abs/2310.08419