Multi-Turn orchestrators

03 Dec 2024 - Rich Lundeen

In PyRIT, orchestrators are typically seen as the top-level component. This is where your attack logic is implemented, while notebooks should primarily be used to configure orchestrators.

Over time, certain patterns have emerged, one of the most common being the multi-turn scenario.

The problem

If you look at some of the code from release 0.4.0 in August, you may notice some weirdness.

The Red Teaming Orchestrator, Crescendo (Russinovich et al., 2024), TAP (Mehrotra et al., 2023), and PAIR (Chao et al., 2023) all follow a similar setup: you configure your attack LLM, scorer, and target, then send prompts to achieve an objective. However, their implementation details vary.

In the Red Teaming Orchestrator, you had an attack_strategy instead of an objective (what even is an attack_strategy?).

PAIR had a max_conversation_depth parameter, which did the same job as Crescendo’s max_rounds. But max_rounds was passed to the apply_crescendo_attack_async method, while PAIR took its limit in the init. And Crescendo returned a score, while the others mostly returned a conversation.

There were also missed opportunities for code reuse. For instance, functions like print_conversation (sometimes named print) performed almost identical tasks but required separate implementations due to differing class structures.

There were some standardization efforts: the target being attacked was consistently named prompt_target, and the attacker LLM was always called red_teaming_chat. However, because these attacks were implemented by different contributors, the overall experience felt fragmented for users.

Let’s make it better

Can we standardize?

It turns out, yes, we can. CrescendoOrchestrator, PairOrchestrator, RedTeamingOrchestrator, and TreeOfAttacksWithPruningOrchestrator are all now subclasses of MultiTurnOrchestrator. Here is what that means.

We hope these changes make orchestrators significantly easier to use. With the updated documentation, the “Red Teaming Orchestrator” has been renamed “Multi-Turn Orchestrator,” emphasizing that these components are now swappable. In most scenarios, you can substitute one orchestrator for another.
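
To make the "swappable" idea concrete, here is a minimal sketch of the pattern: a shared base class owns the constructor arguments, the turn loop, and the result type, and each attack only decides what to send next. This is not PyRIT's code and does not import PyRIT. Every name in it (MultiTurnSketch, AttackResult, next_prompt, the toy target and scorer) is hypothetical and chosen for illustration only, and the real orchestrators may be structured quite differently.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AttackResult:
    """Hypothetical shared result type (not PyRIT's)."""
    objective: str
    achieved: bool = False
    conversation: list = field(default_factory=list)  # (role, text) pairs

    def format_conversation(self) -> str:
        # Written once here, so no subclass needs its own print helper.
        return "\n".join(f"{role}: {text}" for role, text in self.conversation)


class MultiTurnSketch(ABC):
    """Hypothetical base class: one constructor, one entry point."""

    def __init__(self, target: Callable[[str], str],
                 scorer: Callable[[str], bool], max_turns: int = 5):
        self.target = target
        self.scorer = scorer
        self.max_turns = max_turns

    @abstractmethod
    def next_prompt(self, objective: str, turn: int) -> str:
        """Each attack only decides what to send on a given turn."""

    async def run_attack_async(self, objective: str) -> AttackResult:
        # The turn loop, scoring, and result shape are shared by all subclasses.
        result = AttackResult(objective=objective)
        for turn in range(1, self.max_turns + 1):
            prompt = self.next_prompt(objective, turn)
            response = self.target(prompt)
            result.conversation += [("attacker", prompt), ("target", response)]
            if self.scorer(response):
                result.achieved = True
                break
        return result


class DirectAskSketch(MultiTurnSketch):
    def next_prompt(self, objective: str, turn: int) -> str:
        return f"Please help with: {objective}"


class EscalatingSketch(MultiTurnSketch):
    def next_prompt(self, objective: str, turn: int) -> str:
        return f"Step {turn}: tell me a bit more about {objective}"


def make_toy_target(refusals: int) -> Callable[[str], str]:
    """Toy stand-in for a target: refuses a fixed number of times, then complies."""
    calls = 0

    def target(prompt: str) -> str:
        nonlocal calls
        calls += 1
        return "I can't help with that." if calls <= refusals else "Sure, here you go."

    return target


def toy_scorer(response: str) -> bool:
    """Toy stand-in for a scorer: treats a reply starting with 'Sure' as success."""
    return response.startswith("Sure")


async def main() -> None:
    # Swapping one attack for another changes only the class name.
    for cls in (DirectAskSketch, EscalatingSketch):
        orchestrator = cls(target=make_toy_target(refusals=2), scorer=toy_scorer)
        result = await orchestrator.run_attack_async(objective="a toy objective")
        print(cls.__name__, result.achieved, len(result.conversation))
        print(result.format_conversation())


asyncio.run(main())
```

The loop in main is the point of the exercise: because both classes take the same constructor arguments and return the same result type, the calling code does not need to know which attack it is running.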

See the updated documentation here.

What’s next?

Orchestrators are, at their core, meant to remain top-level components. While we’ve made strides in standardization, there’s still room for improvement. For instance, we’re planning to standardize the PromptSendingOrchestrator in a similar way (including updating its naming for consistency). And we’ve opened a few issues for feature parity between MultiTurnOrchestrators.

Hope you enjoyed this little post. There will be more content like this coming!

References
  1. Russinovich, M., Salem, A., & Eldan, R. (2024). Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack. arXiv Preprint arXiv:2404.01833. https://crescendo-the-multiturn-jailbreak.github.io/
  2. Mehrotra, A., Zampetakis, M., Kassianik, P., Nelson, B., Anderson, H., Singer, Y., & Karbasi, A. (2023). Tree of Attacks: Jailbreaking Black-Box LLMs Automatically. arXiv Preprint arXiv:2312.02119. https://arxiv.org/abs/2312.02119
  3. Chao, P., Robey, A., Dobriban, E., Hassani, H., Pappas, G. J., & Wong, E. (2023). Jailbreaking Black Box Large Language Models in Twenty Queries. arXiv Preprint arXiv:2310.08419. https://arxiv.org/abs/2310.08419