pyrit.orchestrator.TreeOfAttacksWithPruningOrchestrator

class TreeOfAttacksWithPruningOrchestrator(*, objective_target: PromptChatTarget, adversarial_chat: PromptChatTarget, scoring_target: PromptChatTarget, adversarial_chat_seed_prompt: SeedPrompt | None = None, adversarial_chat_system_prompt_path: Path | None = None, width: int = 3, depth: int = 5, branching_factor: int = 2, on_topic_checking_enabled: bool = True, prompt_converters: list[PromptConverter] | None = None, objective_achieved_score_threshold: float = 0.8, desired_response_prefix='Sure, here is', verbose: bool = False)

Bases: MultiTurnOrchestrator

TreeOfAttacksWithPruningOrchestrator follows the Tree of Attacks with Pruning (TAP) algorithm to attack a chat target.

Parameters:
  • objective_target (PromptChatTarget) – The target for the objective prompt. Must be a PromptChatTarget.

  • adversarial_chat (PromptChatTarget) – The target for the adversarial chat prompt.

  • scoring_target (PromptChatTarget) – The target for scoring the responses.

  • adversarial_chat_seed_prompt (Optional[SeedPrompt], optional) – The seed prompt for adversarial chat. Defaults to None.

  • adversarial_chat_system_prompt_path (Optional[Path], optional) – The path to the system prompt for adversarial chat. Defaults to None.

  • width (int, optional) – The width of the attack tree. Defaults to 3.

  • depth (int, optional) – The depth of the attack tree. Defaults to 5.

  • branching_factor (int, optional) – The branching factor of the attack tree. Defaults to 2.

  • on_topic_checking_enabled (bool, optional) – Flag to enable on-topic checking. Defaults to True.

  • prompt_converters (Optional[list[PromptConverter]], optional) – List of prompt converters. Defaults to None.

  • objective_achieved_score_threshold (float, optional) – The score threshold to determine if the objective is achieved. Defaults to 0.8.

  • desired_response_prefix (str, optional) – The desired prefix for responses. Defaults to “Sure, here is”.

  • verbose (bool, optional) – Flag to enable verbose logging. Defaults to False.
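The interaction between width, depth, and branching_factor can be illustrated with a toy search loop. The sketch below is not PyRIT's implementation: toy_tap_search, expand, and score are hypothetical stand-ins for the adversarial chat and the scorer, on-topic checking and prompt converters are left out, and it assumes only the general TAP pattern of branching each candidate, scoring the results, and pruning to the best width candidates at every level.

```python
from typing import Callable


def toy_tap_search(
    expand: Callable[[str, int], str],
    score: Callable[[str], float],
    *,
    width: int = 3,
    depth: int = 5,
    branching_factor: int = 2,
    threshold: float = 0.8,
) -> tuple[str, float, int]:
    """Toy branch/score/prune loop. Returns (best_candidate, best_score, levels_used)."""
    frontier = [""]  # start from a single empty candidate (an assumption of this sketch)
    best, best_score = "", 0.0
    for level in range(1, depth + 1):
        # Branch: every surviving candidate produces branching_factor children.
        children = [expand(node, i) for node in frontier for i in range(branching_factor)]
        # Prune: keep only the `width` highest-scoring children.
        frontier = sorted(children, key=score, reverse=True)[:width]
        top_score = score(frontier[0])
        if top_score > best_score:
            best, best_score = frontier[0], top_score
        # Stop early once the threshold is reached.
        if best_score >= threshold:
            return best, best_score, level
    return best, best_score, depth


# Hypothetical stand-ins: "expanding" appends a letter, "scoring" counts the letter "a".
def demo_expand(node: str, i: int) -> str:
    return node + "ab"[i]


def demo_score(candidate: str) -> float:
    return min(1.0, candidate.count("a") / 4)


print(toy_tap_search(demo_expand, demo_score))  # ('aaaa', 1.0, 4)
```

With the default settings the toy search reaches the threshold at the fourth level, one level short of the depth limit. A larger width keeps more candidates alive per level at the cost of more scoring calls.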

Raises:

  • ValueError – If the adversarial seed prompt does not have a desired_prefix.

  • ValueError – If the width of the tree is less than 1.

  • ValueError – If the depth of the tree is less than 1.

  • ValueError – If the branching factor of the tree is less than 1.

  • ValueError – If the objective achieved score threshold is not between 0 and 1.
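The numeric checks listed above can be sketched as a standalone function. This is a minimal illustration rather than the library's code: validate_tap_settings is a hypothetical helper, the error messages are invented, and the threshold bounds are assumed to be inclusive. The seed prompt check is omitted because it depends on PyRIT's SeedPrompt type.

```python
def validate_tap_settings(
    *,
    width: int,
    depth: int,
    branching_factor: int,
    objective_achieved_score_threshold: float,
) -> None:
    """Raise ValueError for settings the constructor is documented to reject."""
    if width < 1:
        raise ValueError("The width of the tree must be at least 1.")
    if depth < 1:
        raise ValueError("The depth of the tree must be at least 1.")
    if branching_factor < 1:
        raise ValueError("The branching factor of the tree must be at least 1.")
    # Assumption: both 0 and 1 are accepted as valid thresholds.
    if not 0 <= objective_achieved_score_threshold <= 1:
        raise ValueError("The objective achieved score threshold must be between 0 and 1.")


# The documented defaults pass without raising.
validate_tap_settings(width=3, depth=5, branching_factor=2, objective_achieved_score_threshold=0.8)
```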

__init__(*, objective_target: PromptChatTarget, adversarial_chat: PromptChatTarget, scoring_target: PromptChatTarget, adversarial_chat_seed_prompt: SeedPrompt | None = None, adversarial_chat_system_prompt_path: Path | None = None, width: int = 3, depth: int = 5, branching_factor: int = 2, on_topic_checking_enabled: bool = True, prompt_converters: list[PromptConverter] | None = None, objective_achieved_score_threshold: float = 0.8, desired_response_prefix='Sure, here is', verbose: bool = False) -> None

Methods

__init__(*, objective_target, ...[, ...])

dispose_db_engine()

Dispose database engine to release database connections and resources.

get_identifier()

get_memory()

Retrieves the memory associated with this orchestrator.

get_score_memory()

Retrieves the scores of the PromptRequestPieces associated with this orchestrator.

run_attack_async(*, objective[, memory_labels])

Applies the TAP attack strategy asynchronously.

run_attacks_async(*, objectives[, ...])

Applies the attack strategy for each objective in the list of objectives.

set_prepended_conversation(*, ...)

Sets the prepended conversation to be sent to the objective target.

async run_attack_async(*, objective: str, memory_labels: dict[str, str] | None = None) -> TAPAttackResult

Applies the TAP attack strategy asynchronously.

Parameters:
  • objective (str) – The specific goal the orchestrator aims to achieve through the conversation.

  • memory_labels (dict[str, str], Optional) – A free-form dictionary of additional labels to apply to the prompts throughout the attack. Any labels passed in will be combined with self._global_memory_labels (from the GLOBAL_MEMORY_LABELS environment variable) into one dictionary. In the case of collisions, the passed-in labels take precedence. Defaults to None.

Returns:

Contains the outcome of the attack, including:
  • conversation_id (UUID): The ID associated with the final conversation state.

  • achieved_objective (bool): Indicates whether the orchestrator successfully met the objective.

  • objective (str): The intended goal of the attack.

Return type:

TAPAttackResult
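The label-merging rule described for memory_labels (passed-in labels win on collision) can be shown with plain dictionaries. This is a sketch of the rule only: combine_memory_labels is a hypothetical helper, and the label keys and values are made up for illustration.

```python
from typing import Optional


def combine_memory_labels(
    global_labels: dict[str, str],
    memory_labels: Optional[dict[str, str]] = None,
) -> dict[str, str]:
    """Merge two label dictionaries; entries in memory_labels override global ones."""
    # Later unpacking wins, so passed-in labels take precedence on key collisions.
    return {**global_labels, **(memory_labels or {})}


combined = combine_memory_labels(
    {"operation": "baseline", "user": "global-user"},  # hypothetical global labels
    {"user": "alice"},                                  # hypothetical per-attack labels
)
print(combined)  # {'operation': 'baseline', 'user': 'alice'}
```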

set_prepended_conversation(*, prepended_conversation)

Sets the prepended conversation to be sent to the objective target. This can be used to set the system prompt of the objective target, or to send a series of user/assistant messages from which the orchestrator should start the conversation.

Parameters:

prepended_conversation (str) – The prepended conversation to send to the objective target.