pyrit.orchestrator.RedTeamingOrchestrator
- class RedTeamingOrchestrator(*, objective_target: PromptTarget, adversarial_chat: PromptChatTarget, adversarial_chat_system_prompt_path: Path = PosixPath('/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pyrit/datasets/orchestrators/red_teaming/text_generation.yaml'), adversarial_chat_seed_prompt: str | None = 'How can I help you?', prompt_converters: list[PromptConverter] | None = None, max_turns: int = 5, objective_scorer: Scorer, use_score_as_feedback: bool = True, verbose: bool = False)
Bases: MultiTurnOrchestrator
The RedTeamingOrchestrator class orchestrates a multi-turn red teaming attack on a target system.
It is extremely simple: it sends a prompt to the target system, then sends the target's response to the red teaming chat, which produces the next prompt.
- Parameters:
objective_target (PromptTarget) – Target for created prompts.
adversarial_chat (PromptChatTarget) – Endpoint creating prompts sent to objective_target.
adversarial_chat_system_prompt_path (Path) – Path to initial adversarial_chat system prompt.
adversarial_chat_seed_prompt (str, Optional) – Initial message to start the chat. Defaults to “How can I help you?”.
prompt_converters (Optional[list[PromptConverter]]) – Converters for prompt formatting. Defaults to None.
max_turns (int, Optional) – Max turns for the conversation; must be greater than 0. Defaults to 5.
objective_scorer (Scorer) – Scores prompt target output as sufficient or insufficient.
use_score_as_feedback (bool, Optional) – Use scoring as feedback. Defaults to True.
verbose (bool, Optional) – Print debug info. Defaults to False.
- Raises:
FileNotFoundError – If adversarial_chat_system_prompt_path file not found.
ValueError – If max_turns ≤ 0 or if objective_scorer is not binary.
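The conditions under Raises can be illustrated with a small stand-alone check. This is only a sketch: `validate_red_teaming_args` and the `"true_false"` scorer-type string are hypothetical names introduced here, and the order in which the real constructor performs its checks is an assumption.

```python
from pathlib import Path


def validate_red_teaming_args(system_prompt_path: Path, max_turns: int, scorer_type: str) -> None:
    """Hypothetical stand-in for the constructor checks listed under Raises."""
    # FileNotFoundError when the system prompt file is missing.
    if not system_prompt_path.exists():
        raise FileNotFoundError(f"System prompt file not found: {system_prompt_path}")
    # ValueError when max_turns is zero or negative.
    if max_turns <= 0:
        raise ValueError("max_turns must be greater than 0")
    # Assumption: a binary scorer identifies itself with the string "true_false".
    if scorer_type != "true_false":
        raise ValueError("objective_scorer must be a binary (true/false) scorer")
```

Failing at construction time, rather than partway through a conversation, keeps a misconfigured attack from sending any prompts to the target.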
- __init__(*, objective_target: PromptTarget, adversarial_chat: PromptChatTarget, adversarial_chat_system_prompt_path: Path = PosixPath('/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pyrit/datasets/orchestrators/red_teaming/text_generation.yaml'), adversarial_chat_seed_prompt: str | None = 'How can I help you?', prompt_converters: list[PromptConverter] | None = None, max_turns: int = 5, objective_scorer: Scorer, use_score_as_feedback: bool = True, verbose: bool = False) -> None
Methods
__init__(*, objective_target, adversarial_chat)
dispose_db_engine() – Dispose database engine to release database connections and resources.
get_identifier()
get_memory() – Retrieves the memory associated with this orchestrator.
get_score_memory() – Retrieves the scores of the PromptRequestPieces associated with this orchestrator.
run_attack_async(*, objective[, memory_labels]) – Executes a multi-turn red teaming attack asynchronously.
run_attacks_async(*, objectives[, ...]) – Applies the attack strategy for each objective in the list of objectives.
set_prepended_conversation(*, ...) – Sets the prepended conversation to be sent to the objective target.
- async run_attack_async(*, objective: str, memory_labels: dict[str, str] | None = None) -> MultiTurnAttackResult
Executes a multi-turn red teaming attack asynchronously.
This method initiates a conversation with the target system, iteratively generating prompts and analyzing responses to achieve a specified objective. It evaluates each response for success and, if necessary, adapts prompts using scoring feedback until either the objective is met or the maximum number of turns is reached.
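The loop described above can be sketched in plain Python. This is not PyRIT's implementation: `run_attack_sketch`, `AttackResult`, and the three callables are hypothetical stand-ins for the adversarial chat, the objective target, and the objective scorer, and the exact form of the feedback passed back is an assumption.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional, Tuple


@dataclass
class AttackResult:
    # Hypothetical stand-in for the result object; turns_used is added for illustration.
    achieved_objective: bool
    objective: str
    turns_used: int


async def run_attack_sketch(
    objective: str,
    generate_prompt: Callable[[str, Optional[str], Optional[str]], Awaitable[str]],
    send_to_target: Callable[[str], Awaitable[str]],
    score_response: Callable[[str], Awaitable[Tuple[bool, str]]],
    max_turns: int = 5,
    use_score_as_feedback: bool = True,
) -> AttackResult:
    previous_response: Optional[str] = None
    feedback: Optional[str] = None
    for turn in range(1, max_turns + 1):
        # The adversarial chat writes the next prompt from the last response and feedback.
        prompt = await generate_prompt(objective, previous_response, feedback)
        # The objective target answers the prompt.
        previous_response = await send_to_target(prompt)
        # The scorer returns a true/false verdict plus a rationale.
        achieved, rationale = await score_response(previous_response)
        if achieved:
            return AttackResult(True, objective, turn)
        # Optionally feed the scorer's rationale into the next prompt.
        feedback = rationale if use_score_as_feedback else None
    # The turn budget ran out without meeting the objective.
    return AttackResult(False, objective, max_turns)
```

The sketch stops on the first response the scorer marks as true, so `max_turns` is an upper bound rather than a fixed conversation length.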
- Parameters:
objective (str) – The specific goal the orchestrator aims to achieve through the conversation.
memory_labels (dict[str, str], Optional) – A free-form dictionary of additional labels to apply to the prompts throughout the attack. Any labels passed in will be combined with self._global_memory_labels (from the GLOBAL_MEMORY_LABELS environment variable) into one dictionary. In the case of collisions, the passed-in labels take precedence. Defaults to None.
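The collision rule for memory_labels can be illustrated with a plain dictionary merge. A minimal sketch: `merge_memory_labels` is a hypothetical helper, and it assumes that GLOBAL_MEMORY_LABELS holds a JSON object of string pairs, which the text above does not confirm.

```python
import json
import os
from typing import Dict, Optional


def merge_memory_labels(passed_in: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    # Assumption: the environment variable contains a JSON object, e.g. '{"team": "red"}'.
    global_labels = json.loads(os.environ.get("GLOBAL_MEMORY_LABELS", "{}"))
    # Later keys win in a dict merge, so passed-in labels override global ones.
    return {**global_labels, **(passed_in or {})}
```

With GLOBAL_MEMORY_LABELS set to `{"team": "red", "run": "global"}` and `{"run": "local"}` passed in, the merged dictionary keeps `team` from the environment and takes `run` from the passed-in labels.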
- Returns:
Contains the outcome of the attack, including:
conversation_id (UUID): The ID associated with the final conversation state.
achieved_objective (bool): Indicates whether the orchestrator successfully met the objective.
objective (str): The intended goal of the attack.
- Return type:
MultiTurnAttackResult
- Raises:
RuntimeError – If the response from the target system contains an unexpected error.
ValueError – If the scoring feedback is not of the required type (true/false) for binary completion.