pyrit.orchestrator.MultiTurnOrchestrator

class MultiTurnOrchestrator(*, objective_target: PromptTarget, adversarial_chat: PromptChatTarget, adversarial_chat_system_prompt_path: Path | str, adversarial_chat_seed_prompt: str | SeedPrompt = 'How can I help you?', max_turns: int = 5, prompt_converters: list[PromptConverter] | None = None, objective_scorer: Scorer, verbose: bool = False)

Bases: Orchestrator

The MultiTurnOrchestrator is an interface that coordinates attacks and conversations between an adversarial_chat target and an objective_target.

Parameters:
  • objective_target (PromptTarget) – The target to send the created prompts to.

  • adversarial_chat (PromptChatTarget) – The endpoint that creates prompts that are sent to the objective_target.

  • adversarial_chat_system_prompt_path (Path | str) – The path to the file containing the system prompt for adversarial_chat.

  • adversarial_chat_seed_prompt (str | SeedPrompt, Optional) – The initial prompt to start the adversarial chat. Defaults to “How can I help you?”.

  • max_turns (int, Optional) – The maximum number of turns for the conversation. Must be greater than 0. Defaults to 5.

  • prompt_converters (list[PromptConverter], Optional) – The prompt converters used to convert the prompts before they are sent to the prompt target. Defaults to None.

  • objective_scorer (Scorer) – A true/false scorer that classifies the prompt target outputs as sufficient (True) or insufficient (False) to satisfy the objective.

  • verbose (bool, Optional) – Whether to print debug information. Defaults to False.

Raises:
  • FileNotFoundError – If the file specified by adversarial_chat_system_prompt_path does not exist.

  • ValueError – If max_turns is less than or equal to 0.

  • ValueError – If the objective_scorer is not a true/false scorer.
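The three error conditions listed above can be sketched as a standalone check. This is a minimal illustration, not the library's code: the function name validate_multi_turn_args and the scorer_type string "true_false" are assumptions made for the example, and the real constructor takes a Scorer object rather than a string.

```python
from pathlib import Path
from typing import Union


def validate_multi_turn_args(
    *,
    system_prompt_path: Union[Path, str],
    max_turns: int,
    scorer_type: str,
) -> Path:
    """Hypothetical helper mirroring the documented Raises section."""
    path = Path(system_prompt_path)

    # FileNotFoundError: the system prompt file must exist.
    if not path.is_file():
        raise FileNotFoundError(f"System prompt file not found: {path}")

    # ValueError: max_turns must be strictly positive.
    if max_turns <= 0:
        raise ValueError("max_turns must be greater than 0")

    # ValueError: only true/false scorers are accepted.
    # The literal "true_false" is an assumed stand-in for the scorer's type.
    if scorer_type != "true_false":
        raise ValueError("objective_scorer must be a true/false scorer")

    return path
```

Checking the arguments up front means a misconfigured run fails before any prompt is sent.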

__init__(*, objective_target: PromptTarget, adversarial_chat: PromptChatTarget, adversarial_chat_system_prompt_path: Path | str, adversarial_chat_seed_prompt: str | SeedPrompt = 'How can I help you?', max_turns: int = 5, prompt_converters: list[PromptConverter] | None = None, objective_scorer: Scorer, verbose: bool = False) → None

Methods

__init__(*, objective_target, ...[, ...])

dispose_db_engine()

Dispose database engine to release database connections and resources.

get_identifier()

get_memory()

Retrieves the memory associated with this orchestrator.

get_score_memory()

Retrieves the scores of the PromptRequestPieces associated with this orchestrator.

run_attack_async(*, objective[, memory_labels])

Applies the attack strategy until the conversation is complete or the maximum number of turns is reached.

run_attacks_async(*, objectives[, ...])

Applies the attack strategy for each objective in the list of objectives.

abstract async run_attack_async(*, objective: str, memory_labels: dict[str, str] | None = None) → MultiTurnAttackResult

Applies the attack strategy until the conversation is complete or the maximum number of turns is reached.

Parameters:
  • objective (str) – The specific goal the orchestrator aims to achieve through the conversation.

  • memory_labels (dict[str, str], Optional) – A free-form dictionary of additional labels to apply to the prompts throughout the attack. Any labels passed in will be combined with self._global_memory_labels (from the GLOBAL_MEMORY_LABELS environment variable) into one dictionary. In the case of collisions, the passed-in labels take precedence. Defaults to None.
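The label-combining rule described for memory_labels can be sketched as a plain dictionary merge. This is an illustration under assumptions: the helper name combine_memory_labels is hypothetical, and the example assumes GLOBAL_MEMORY_LABELS holds a JSON-encoded object, which the text above does not state.

```python
import json
import os
from typing import Dict, Optional


def combine_memory_labels(
    passed_in: Optional[Dict[str, str]] = None,
    env_var: str = "GLOBAL_MEMORY_LABELS",
) -> Dict[str, str]:
    """Hypothetical helper: merge global labels with per-call labels."""
    # Assumption: the environment variable contains a JSON object.
    raw = os.environ.get(env_var, "")
    global_labels = json.loads(raw) if raw else {}

    # Later entries win, so passed-in labels override global ones on collision.
    return {**global_labels, **(passed_in or {})}
```

Unpacking the global labels first and the passed-in labels second is what gives the passed-in values precedence.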

Returns:

Contains the outcome of the attack, including:
  • conversation_id (UUID): The ID associated with the final conversation state.

  • achieved_objective (bool): Indicates whether the orchestrator successfully met the objective.

  • objective (str): The intended goal of the attack.

Return type:

MultiTurnAttackResult
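Because run_attack_async is abstract, each subclass supplies its own strategy. The general shape of "repeat until the objective is met or max_turns is reached" might look like the sketch below. Everything here is a stand-in for illustration: SketchAttackResult, run_turns, and the three callables are hypothetical and are not part of the documented API.

```python
import asyncio
import uuid
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class SketchAttackResult:
    """Stand-in with the three fields the Returns section lists."""
    conversation_id: uuid.UUID
    achieved_objective: bool
    objective: str


async def run_turns(
    *,
    objective: str,
    max_turns: int,
    next_prompt: Callable[[str, Optional[str]], Awaitable[str]],
    send_to_target: Callable[[str], Awaitable[str]],
    is_objective_met: Callable[[str], Awaitable[bool]],
) -> SketchAttackResult:
    if max_turns <= 0:
        raise ValueError("max_turns must be greater than 0")

    conversation_id = uuid.uuid4()
    last_response: Optional[str] = None
    achieved = False

    for _ in range(max_turns):
        # The adversarial side produces the next prompt from the objective
        # and the target's previous reply (None on the first turn).
        prompt = await next_prompt(objective, last_response)
        last_response = await send_to_target(prompt)

        # A true/false judgment decides whether to stop early.
        if await is_objective_met(last_response):
            achieved = True
            break

    return SketchAttackResult(conversation_id, achieved, objective)
```

The loop exits either on the first turn the scorer reports success or after max_turns turns, whichever comes first.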

async run_attacks_async(*, objectives: list[str], memory_labels: dict[str, str] | None = None, batch_size=5) → list[MultiTurnAttackResult]

Applies the attack strategy for each objective in the list of objectives.

Parameters:
  • objectives (list[str]) – The list of objectives to apply the attack strategy.

  • memory_labels (dict[str, str], Optional) – A free-form dictionary of additional labels to apply to the prompts throughout the attack. Any labels passed in will be combined with self._global_memory_labels (from the GLOBAL_MEMORY_LABELS environment variable) into one dictionary. In the case of collisions, the passed-in labels take precedence. Defaults to None.

  • batch_size (int) – The number of objectives to process in parallel. The default value is 5.

Returns:

The list of MultiTurnAttackResults for each objective.

Return type:

list[MultiTurnAttackResult]
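One way to process objectives in parallel groups of batch_size is to slice the list and await each slice with asyncio.gather. The sketch below shows that pattern only; run_in_batches and run_one are hypothetical names, and the library's actual batching may be implemented differently.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def run_in_batches(
    objectives: List[str],
    run_one: Callable[[str], Awaitable[T]],
    batch_size: int = 5,
) -> List[T]:
    """Hypothetical helper: run objectives concurrently, batch_size at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    results: List[T] = []
    for start in range(0, len(objectives), batch_size):
        batch = objectives[start:start + batch_size]
        # gather preserves input order, so results line up with objectives.
        results.extend(await asyncio.gather(*(run_one(o) for o in batch)))
    return results
```

With this approach each batch finishes before the next one starts, so at most batch_size attacks are in flight at once.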