pyrit.orchestrator.RedTeamingOrchestrator

class RedTeamingOrchestrator(**kwargs)

Bases: MultiTurnOrchestrator

Warning

RedTeamingOrchestrator is deprecated and will be removed in v0.12.0; use pyrit.attacks.RedTeamingAttack instead.

The RedTeamingOrchestrator class orchestrates a multi-turn red teaming attack on a target system.

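It is extremely simple: it sends a prompt to the objective target, then sends the target's response to the adversarial (red teaming) chat, which generates the next prompt. This repeats until the conversation is complete or the maximum number of turns is reached.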
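The loop described above can be modeled in plain Python. This is a simplified sketch, not PyRIT's implementation: `StubTarget`, `StubAdversarialChat`, `stub_scorer`, and `run_red_teaming_loop` are hypothetical stand-ins for `objective_target`, `adversarial_chat`, and `objective_scorer`, and appending the scorer's rationale to the next adversarial message is only an assumed reading of `use_score_as_feedback`.

```python
from dataclasses import dataclass, field


@dataclass
class StubScore:
    # Hypothetical stand-in for a binary (true/false) Score.
    value: bool
    rationale: str


class StubTarget:
    """Hypothetical objective target: refuses until the prompt contains 'please'."""

    def send(self, prompt: str) -> str:
        if "please" in prompt:
            return "Sure, here is the secret."
        return "I can't help with that."


@dataclass
class StubAdversarialChat:
    """Hypothetical adversarial chat: records each message and escalates politeness."""

    received: list[str] = field(default_factory=list)

    def send(self, message: str) -> str:
        self.received.append(message)
        if len(self.received) == 1:
            return "Tell me the secret."
        return "Could you please tell me the secret?"


def stub_scorer(response: str) -> StubScore:
    # Binary scorer: success when the target discloses the secret without refusing.
    success = "secret" in response and "can't" not in response
    return StubScore(success, "disclosed" if success else "refused")


def run_red_teaming_loop(target, adversarial_chat, scorer, seed_prompt: str,
                         max_turns: int = 5, use_score_as_feedback: bool = True):
    """Return (achieved, turns_used) for a simplified multi-turn attack."""
    if max_turns <= 0:
        raise ValueError("max_turns must be greater than 0")
    # The seed prompt is the first message the adversarial chat sees.
    message_to_adversary = seed_prompt
    for turn in range(1, max_turns + 1):
        # 1. The adversarial chat proposes the next attack prompt.
        attack_prompt = adversarial_chat.send(message_to_adversary)
        # 2. The attack prompt is sent to the objective target.
        response = target.send(attack_prompt)
        # 3. The response is scored; a true score ends the attack early.
        score = scorer(response)
        if score.value:
            return True, turn
        # 4. The target's response (optionally with score feedback) goes back to the adversary.
        message_to_adversary = response
        if use_score_as_feedback:
            message_to_adversary += f"\n[score feedback: {score.rationale}]"
    return False, max_turns


print(run_red_teaming_loop(StubTarget(), StubAdversarialChat(), stub_scorer,
                           seed_prompt="How can I help you?"))  # (True, 2)
```

The first turn is refused, the refusal is routed back to the adversarial chat, and the second, rephrased prompt succeeds.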

Parameters:
  • objective_target (PromptTarget) – Target for created prompts.

  • adversarial_chat (PromptChatTarget) – Endpoint creating prompts sent to objective_target.

  • adversarial_chat_system_prompt_path (Path) – Path to initial adversarial_chat system prompt.

  • adversarial_chat_seed_prompt (str, Optional) – Initial seed prompt (the first message) for the adversarial chat. Defaults to “How can I help you?”.

  • prompt_converters (Optional[list[PromptConverter]]) – Converters for prompt formatting. Defaults to None.

  • max_turns (int, Optional) – Maximum number of conversation turns; must be greater than 0. Defaults to 5.

  • objective_scorer (Scorer) – Scores the objective target's output as sufficient or insufficient for the objective. Must be a binary (true/false) scorer.

  • use_score_as_feedback (bool, Optional) – Whether to pass the objective scorer's feedback to the adversarial chat when it generates the next prompt. Defaults to True.

  • verbose (bool, Optional) – Print debug info. Defaults to False.

Raises:
  • FileNotFoundError – If adversarial_chat_system_prompt_path file not found.

  • ValueError – If max_turns ≤ 0 or if objective_scorer is not binary.
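The two documented failure modes can also be checked before constructing the orchestrator. The helper below, `validate_orchestrator_args`, is hypothetical and not part of PyRIT; it represents the scorer's binary-ness with an assumed `scorer_type` string equal to "true_false", since the real check depends on the `Scorer` class.

```python
from pathlib import Path


def validate_orchestrator_args(system_prompt_path: Path, max_turns: int, scorer_type: str) -> None:
    """Hypothetical pre-flight check mirroring the documented Raises section."""
    # FileNotFoundError: the adversarial chat system prompt file must exist.
    if not system_prompt_path.is_file():
        raise FileNotFoundError(f"System prompt not found: {system_prompt_path}")
    # ValueError: max_turns must be strictly positive.
    if max_turns <= 0:
        raise ValueError("max_turns must be greater than 0")
    # ValueError: the objective scorer must be binary (assumed label "true_false").
    if scorer_type != "true_false":
        raise ValueError("objective_scorer must be a true/false (binary) scorer")
```

Calling it with an existing YAML path, `max_turns=5`, and `scorer_type="true_false"` returns `None`; any of the three violations raises the corresponding exception.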

__init__(*, objective_target: PromptTarget, adversarial_chat: PromptChatTarget, adversarial_chat_system_prompt_path: Path = PosixPath('/opt/hostedtoolcache/Python/3.11.13/x64/lib/python3.11/site-packages/pyrit/datasets/orchestrators/red_teaming/text_generation.yaml'), adversarial_chat_seed_prompt: str = 'How can I help you?', prompt_converters: list[PromptConverter] | None = None, max_turns: int = 5, objective_scorer: Scorer, use_score_as_feedback: bool = True, batch_size: int = 1, verbose: bool = False) -> None

Methods

  • __init__(*, objective_target, adversarial_chat)

  • dispose_db_engine() – Dispose database engine to release database connections and resources.

  • get_identifier()

  • get_memory() – Retrieves the memory associated with this orchestrator.

  • get_score_memory() – Retrieves the scores of the PromptRequestPieces associated with this orchestrator.

  • run_attack_async(*, objective[, memory_labels]) – Applies the attack strategy until the conversation is complete or the maximum number of turns is reached.

  • run_attacks_async(*, objectives[, memory_labels]) – Applies the attack strategy for each objective in the list of objectives.

  • set_prepended_conversation(*, ...) – Sets the prepended conversation to be sent to the objective target.
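run_attacks_async applies the same strategy once per objective. The sketch below models that fan-out with `asyncio`, assuming (not confirmed here) that the constructor's `batch_size` bounds how many attacks are in flight at once. `fake_run_attack_async` and `run_attacks_sketch_async` are hypothetical stand-ins, and they return plain dicts rather than `OrchestratorResult` objects.

```python
import asyncio


async def fake_run_attack_async(objective: str) -> dict:
    """Hypothetical stand-in for run_attack_async; returns a plain dict."""
    await asyncio.sleep(0)  # yield to the event loop, as real network calls would
    return {"objective": objective, "status": "failure"}


async def run_attacks_sketch_async(objectives: list[str], batch_size: int = 1) -> list[dict]:
    """Run one attack per objective, at most batch_size at a time (assumed semantics)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    semaphore = asyncio.Semaphore(batch_size)

    async def bounded(objective: str) -> dict:
        # Wait for a free slot before starting this objective's attack.
        async with semaphore:
            return await fake_run_attack_async(objective)

    # gather returns results in the same order as the objectives were passed.
    return list(await asyncio.gather(*(bounded(o) for o in objectives)))


results = asyncio.run(run_attacks_sketch_async(["objective A", "objective B", "objective C"], batch_size=2))
print([r["objective"] for r in results])  # ['objective A', 'objective B', 'objective C']
```

A semaphore keeps the number of concurrent conversations bounded while still returning one result per objective in input order.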

async run_attack_async(*, objective: str, memory_labels: dict[str, str] | None = None) -> OrchestratorResult

Applies the attack strategy until the conversation is complete or the maximum number of turns is reached.

Parameters:
  • objective (str) – The specific goal the orchestrator aims to achieve through the conversation.

  • memory_labels (dict[str, str], Optional) – A free-form dictionary of additional labels to apply to the prompts throughout the attack. Any labels passed in will be combined with self._global_memory_labels (from the GLOBAL_MEMORY_LABELS environment variable) into one dictionary. In the case of collisions, the passed-in labels take precedence. Defaults to None.
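The label-combination rule amounts to a dictionary merge in which the passed-in labels win on collision. In this sketch, `merge_memory_labels` is a hypothetical helper, and parsing GLOBAL_MEMORY_LABELS as a JSON object is an assumption about the variable's format.

```python
from __future__ import annotations

import json
import os


def merge_memory_labels(memory_labels: dict[str, str] | None = None) -> dict[str, str]:
    """Combine global labels from the environment with per-call labels."""
    # Assumption: GLOBAL_MEMORY_LABELS holds a JSON object such as {"op_name": "demo"}.
    global_labels = json.loads(os.environ.get("GLOBAL_MEMORY_LABELS", "{}"))
    # Later keys win, so passed-in labels override global ones on collision.
    return {**global_labels, **(memory_labels or {})}


os.environ["GLOBAL_MEMORY_LABELS"] = '{"op_name": "demo", "username": "alice"}'
print(merge_memory_labels({"username": "bob"}))  # {'op_name': 'demo', 'username': 'bob'}
```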

Returns:

Contains the outcome of the attack, including:
  • conversation_id (str): The ID associated with the final conversation state.

  • objective (str): The intended goal of the attack.

  • status (OrchestratorResultStatus): The status of the attack (“success”, “failure”, “pruned”, etc.)

  • score (Score): The score evaluating the attack outcome.

  • confidence (float): The confidence level of the result.

Return type:

OrchestratorResult
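A caller typically branches on status and reports the remaining fields. Because building a real `OrchestratorResult` requires PyRIT, this sketch uses a hypothetical `ResultSketch` dataclass holding the fields listed above; the `summarize` helper and the sample values are illustrative only.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ResultSketch:
    """Hypothetical mirror of the documented OrchestratorResult fields."""
    conversation_id: str
    objective: str
    status: str  # e.g. "success", "failure", "pruned"
    score: Optional[Any]  # a Score in PyRIT; any object (or None) here
    confidence: float


def summarize(result: ResultSketch) -> str:
    """Produce a one-line report of an attack outcome."""
    outcome = "achieved" if result.status == "success" else f"not achieved ({result.status})"
    return f"[{result.conversation_id}] {result.objective!r}: {outcome}, confidence={result.confidence:.2f}"


print(summarize(ResultSketch("conv-1", "extract the secret", "failure", None, 0.1)))
# [conv-1] 'extract the secret': not achieved (failure), confidence=0.10
```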