pyrit.orchestrator.RedTeamingOrchestrator

class RedTeamingOrchestrator(*, objective_target: PromptTarget, adversarial_chat: PromptChatTarget, adversarial_chat_system_prompt_path: Path = PosixPath('/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pyrit/datasets/orchestrators/red_teaming/text_generation.yaml'), adversarial_chat_seed_prompt: str | None = 'How can I help you?', prompt_converters: list[PromptConverter] | None = None, max_turns: int = 5, objective_scorer: Scorer, use_score_as_feedback: bool = True, verbose: bool = False)

Bases: MultiTurnOrchestrator

The RedTeamingOrchestrator class orchestrates a multi-turn red teaming attack on a target system.

It is extremely simple: it sends a prompt to the target system, then passes the target's response to the red teaming chat (adversarial_chat), which creates the next prompt.

Parameters:
  • objective_target (PromptTarget) – Target for created prompts.

  • adversarial_chat (PromptChatTarget) – Endpoint creating prompts sent to objective_target.

  • adversarial_chat_system_prompt_path (Path) – Path to initial adversarial_chat system prompt.

  • adversarial_chat_seed_prompt (str, Optional) – Initial message to start the chat. Defaults to “How can I help you?”.

  • prompt_converters (Optional[list[PromptConverter]]) – Converters for prompt formatting. Defaults to None.

  • max_turns (int, Optional) – Max turns for the conversation, > 0. Defaults to 5.

  • objective_scorer (Scorer) – Scores prompt target output as sufficient or insufficient.

  • use_score_as_feedback (bool, Optional) – Use scoring as feedback. Defaults to True.

  • verbose (bool, Optional) – Print debug info. Defaults to False.

Raises:
  • FileNotFoundError – If adversarial_chat_system_prompt_path file not found.

  • ValueError – If max_turns ≤ 0 or if objective_scorer is not binary.
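
The two ValueError conditions above amount to simple argument checks. The following is a minimal sketch, not PyRIT's code: StubScorer and validate_red_team_args are hypothetical names, and the sketch assumes a scorer carries a scorer_type string whose binary value is "true_false".

```python
from dataclasses import dataclass


@dataclass
class StubScorer:
    """Hypothetical stand-in for a Scorer; only the type tag matters here."""

    scorer_type: str  # assumed tag: "true_false" marks a binary scorer


def validate_red_team_args(max_turns: int, objective_scorer: StubScorer) -> None:
    """Mirror the documented ValueError conditions (illustrative only)."""
    if max_turns <= 0:
        raise ValueError("max_turns must be greater than 0")
    if objective_scorer.scorer_type != "true_false":
        raise ValueError("objective_scorer must be a binary (true/false) scorer")


# A valid combination passes silently.
validate_red_team_args(max_turns=5, objective_scorer=StubScorer("true_false"))
```

FileNotFoundError is not covered here; it depends on whether the file at adversarial_chat_system_prompt_path exists.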

__init__(*, objective_target: PromptTarget, adversarial_chat: PromptChatTarget, adversarial_chat_system_prompt_path: Path = PosixPath('/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pyrit/datasets/orchestrators/red_teaming/text_generation.yaml'), adversarial_chat_seed_prompt: str | None = 'How can I help you?', prompt_converters: list[PromptConverter] | None = None, max_turns: int = 5, objective_scorer: Scorer, use_score_as_feedback: bool = True, verbose: bool = False) → None

Methods

__init__(*, objective_target, adversarial_chat)

dispose_db_engine()

Dispose database engine to release database connections and resources.

get_identifier()

get_memory()

Retrieves the memory associated with this orchestrator.

get_score_memory()

Retrieves the scores of the PromptRequestPieces associated with this orchestrator.

run_attack_async(*, objective[, memory_labels])

Executes a multi-turn red teaming attack asynchronously.

run_attacks_async(*, objectives[, ...])

Applies the attack strategy for each objective in the list of objectives.

set_prepended_conversation(*, ...)

Sets the prepended conversation to be sent to the objective target.

async run_attack_async(*, objective: str, memory_labels: dict[str, str] | None = None) → MultiTurnAttackResult

Executes a multi-turn red teaming attack asynchronously.

This method initiates a conversation with the target system, iteratively generating prompts and analyzing responses to achieve a specified objective. It evaluates each response for success and, if necessary, adapts prompts using scoring feedback until either the objective is met or the maximum number of turns is reached.

Parameters:
  • objective (str) – The specific goal the orchestrator aims to achieve through the conversation.

  • memory_labels (dict[str, str], Optional) – A free-form dictionary of additional labels to apply to the prompts throughout the attack. Any labels passed in will be combined with self._global_memory_labels (from the GLOBAL_MEMORY_LABELS environment variable) into one dictionary. In the case of collisions, the passed-in labels take precedence. Defaults to None.
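
The label-combining rule described for memory_labels can be illustrated with plain dictionaries. This is a sketch of the documented precedence only; merge_memory_labels and the sample label values are hypothetical and not part of PyRIT.

```python
from typing import Dict, Optional


def merge_memory_labels(
    global_labels: Dict[str, str],
    memory_labels: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Combine both label dicts; passed-in labels win on key collisions."""
    # Later entries in a dict display override earlier ones with the same key.
    return {**global_labels, **(memory_labels or {})}


global_labels = {"operator": "alice", "run": "baseline"}  # stands in for GLOBAL_MEMORY_LABELS
merged = merge_memory_labels(global_labels, {"run": "retry-1", "op_name": "demo"})
print(merged)  # {'operator': 'alice', 'run': 'retry-1', 'op_name': 'demo'}
```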

Returns:

Contains the outcome of the attack, including:
  • conversation_id (UUID): The ID associated with the final conversation state.

  • achieved_objective (bool): Indicates whether the orchestrator successfully met the objective.

  • objective (str): The intended goal of the attack.

Return type:

MultiTurnAttackResult

Raises:
  • RuntimeError – If the response from the target system contains an unexpected error.

  • ValueError – If the scoring feedback is not of the required type (true/false) for binary completion.
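
The control flow described for run_attack_async can be sketched without any PyRIT dependency. Everything below is an illustrative assumption: the three stub coroutines stand in for adversarial_chat, objective_target, and objective_scorer, SketchResult is a hypothetical container with the three documented result fields, and the way feedback reaches the adversarial chat is a guess at the general idea rather than PyRIT's actual message format.

```python
import asyncio
import uuid
from dataclasses import dataclass, field


@dataclass
class SketchResult:
    """Hypothetical stand-in exposing the three documented result fields."""

    objective: str
    achieved_objective: bool
    conversation_id: uuid.UUID = field(default_factory=uuid.uuid4)


async def adversarial_chat(objective, message, feedback):
    # Stub: a real adversarial chat would be an LLM writing the next prompt.
    return f"[toward: {objective}] last message: {message!r}; feedback: {feedback}"


async def objective_target(prompt):
    # Stub: "complies" only once the prompt carries feedback from a failed turn.
    return "COMPLIED" if "feedback: refused" in prompt else "I cannot help with that."


async def objective_scorer(response):
    # Stub binary scorer returning (objective met?, rationale).
    met = response == "COMPLIED"
    return met, "complied" if met else "refused"


async def run_attack_sketch(objective, max_turns=5, use_score_as_feedback=True,
                            seed_prompt="How can I help you?"):
    message, feedback = seed_prompt, None  # the seed prompt opens the adversarial chat
    for _ in range(max_turns):
        prompt = await adversarial_chat(objective, message, feedback)
        message = await objective_target(prompt)
        met, rationale = await objective_scorer(message)
        if not isinstance(met, bool):
            raise ValueError("objective scorer must return a true/false value")
        if met:
            return SketchResult(objective, True)  # stop as soon as the objective is met
        feedback = rationale if use_score_as_feedback else None
    return SketchResult(objective, False)  # turn budget exhausted


result = asyncio.run(run_attack_sketch("demo objective", max_turns=3))
print(result.achieved_objective)  # True
```

With these stubs the target only complies after it sees scorer feedback, so calling run_attack_sketch with use_score_as_feedback=False runs out of turns and returns achieved_objective=False.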