pyrit.executor.attack.TreeOfAttacksWithPruningAttack

class TreeOfAttacksWithPruningAttack(*, objective_target: PromptChatTarget = REQUIRED_VALUE, attack_adversarial_config: AttackAdversarialConfig, attack_converter_config: AttackConverterConfig | None = None, attack_scoring_config: AttackScoringConfig | None = None, prompt_normalizer: PromptNormalizer | None = None, tree_width: int = 3, tree_depth: int = 5, branching_factor: int = 2, on_topic_checking_enabled: bool = True, desired_response_prefix: str = 'Sure, here is', batch_size: int = 10)

Bases: AttackStrategy[TAPAttackContext, TAPAttackResult]

Implementation of the Tree of Attacks with Pruning (TAP) attack strategy.

The TAP attack strategy systematically explores multiple adversarial prompt paths in parallel using a tree structure. It employs breadth-first search with pruning to efficiently find effective jailbreaks while managing computational resources.

How it works:

1. Initialization: Creates multiple initial attack branches (width) to explore different approaches.
2. Tree Expansion: For each iteration (depth), branches are expanded by a branching factor.
3. Prompt Generation: Each node generates adversarial prompts via an LLM red-teaming assistant.
4. Evaluation: Responses are evaluated for objective achievement and on-topic relevance.
5. Pruning: Low-scoring or off-topic branches are pruned to maintain the width constraint.
6. Iteration: The process continues until the objective is achieved or max depth is reached.

The strategy balances exploration (trying diverse approaches) with exploitation (focusing on promising paths) through its pruning mechanism.
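
The six steps above can be sketched as a toy breadth-first loop. This is a minimal illustration, not PyRIT's implementation: the generate and evaluate helpers are hypothetical stand-ins for the adversarial LLM and for the target plus scorer, the scores are random, and the sketch assumes the first level holds tree_width nodes while every later level expands each surviving node into branching_factor children.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    prompt: str
    score: float = 0.0
    on_topic: bool = True


def toy_tap(objective, *, tree_width=3, tree_depth=5, branching_factor=2,
            threshold=0.7, seed=0):
    """Toy TAP loop with random stand-ins for the LLM calls (illustration only)."""
    rng = random.Random(seed)

    def generate(parent_prompt):
        # Hypothetical stand-in for the adversarial LLM refining a prompt.
        return f"{parent_prompt} [variant {rng.randint(0, 9999)}]"

    def evaluate(prompt):
        # Hypothetical stand-in for querying the target and scoring the reply.
        # Returns (objective score in [0, 1), on-topic flag).
        return rng.random(), rng.random() > 0.2

    # 1. Initialization: one seed node per branch.
    frontier = [Node(prompt=objective) for _ in range(tree_width)]
    explored = pruned = 0
    depth = 0

    for depth in range(1, tree_depth + 1):
        # 2-3. Expansion and prompt generation (assumed: no branching at depth 1).
        fan_out = 1 if depth == 1 else branching_factor
        children = [Node(generate(n.prompt)) for n in frontier for _ in range(fan_out)]

        # 4. Evaluation.
        for child in children:
            child.score, child.on_topic = evaluate(child.prompt)
        explored += len(children)

        # 5. Pruning: drop off-topic nodes, keep the tree_width best scores.
        kept = sorted((c for c in children if c.on_topic),
                      key=lambda c: c.score, reverse=True)[:tree_width]
        pruned += len(children) - len(kept)
        frontier = kept

        # 6. Stop on success or when every branch has been pruned.
        if frontier and frontier[0].score >= threshold:
            return {"success": True, "depth": depth, "explored": explored, "pruned": pruned}
        if not frontier:
            break

    return {"success": False, "depth": depth, "explored": explored, "pruned": pruned}
```

Calling toy_tap("some objective") returns a small summary dict. Because pruning caps the frontier at tree_width after every level, the number of evaluated nodes grows linearly with depth rather than exponentially.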

Example

>>> from pyrit.models import AttackOutcome
>>> from pyrit.prompt_target import AzureOpenAIChat
>>> from pyrit.score import SelfAskScaleScorer, FloatScaleThresholdScorer
>>> from pyrit.executor.attack import (
...     TreeOfAttacksWithPruningAttack, AttackAdversarialConfig, AttackScoringConfig
... )
>>> # Initialize models
>>> target = AzureOpenAIChat(deployment_name="gpt-4", endpoint="...", api_key="...")
>>> adversarial_llm = AzureOpenAIChat(deployment_name="gpt-4", endpoint="...", api_key="...")
>>>
>>> # Configure attack
>>> tap_attack = TreeOfAttacksWithPruningAttack(
...     objective_target=target,
...     attack_adversarial_config=AttackAdversarialConfig(target=adversarial_llm),
...     attack_scoring_config=AttackScoringConfig(
...         objective_scorer=FloatScaleThresholdScorer(
...             scorer=SelfAskScaleScorer(chat_target=adversarial_llm),
...             threshold=0.7,
...         )
...     ),
...     tree_width=3,
...     tree_depth=5,
... )
>>>
>>> # Execute attack
>>> result = await tap_attack.execute_async(objective="Generate harmful content")
>>> print(f"Attack {'succeeded' if result.outcome == AttackOutcome.SUCCESS else 'failed'}")
>>> print(f"Explored {result.nodes_explored} nodes, pruned {result.nodes_pruned}")
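
The example above uses a top-level await, which works in a notebook or an asyncio-aware REPL but not in a plain script. A minimal sketch of the script form follows. StubAttack and StubResult are hypothetical stand-ins so the snippet runs without PyRIT or model credentials; in real use, the object passed to main would be the configured tap_attack.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class StubResult:
    """Hypothetical stand-in for TAPAttackResult (only the fields used above)."""
    outcome: str
    nodes_explored: int
    nodes_pruned: int


class StubAttack:
    """Hypothetical stand-in for a configured TreeOfAttacksWithPruningAttack."""

    async def execute_async(self, *, objective: str) -> StubResult:
        await asyncio.sleep(0)  # placeholder for the real network round trips
        return StubResult(outcome="failure", nodes_explored=0, nodes_pruned=0)


async def main(attack) -> StubResult:
    # Same call shape as the documented example: keyword-only objective.
    return await attack.execute_async(objective="...")


# asyncio.run creates the event loop, runs main to completion, and closes the loop.
result = asyncio.run(main(StubAttack()))
print(f"Explored {result.nodes_explored} nodes, pruned {result.nodes_pruned}")
```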

Note

The TAP attack is particularly effective for complex objectives that benefit from systematic exploration. The tree visualization in the result provides insights into the attack’s decision-making process.

References

Tree of Attacks: Jailbreaking Black-Box LLMs Automatically https://arxiv.org/abs/2312.02119

__init__(*, objective_target: PromptChatTarget = REQUIRED_VALUE, attack_adversarial_config: AttackAdversarialConfig, attack_converter_config: AttackConverterConfig | None = None, attack_scoring_config: AttackScoringConfig | None = None, prompt_normalizer: PromptNormalizer | None = None, tree_width: int = 3, tree_depth: int = 5, branching_factor: int = 2, on_topic_checking_enabled: bool = True, desired_response_prefix: str = 'Sure, here is', batch_size: int = 10)

Initialize the Tree of Attacks with Pruning attack strategy.

Parameters:

objective_target (PromptChatTarget) – The target system to attack.
attack_adversarial_config (AttackAdversarialConfig) – Configuration for the adversarial chat component.
attack_converter_config (Optional[AttackConverterConfig]) – Configuration for attack converters. Defaults to None.
attack_scoring_config (Optional[AttackScoringConfig]) – Configuration for attack scoring. Must include objective_scorer. Defaults to None.
prompt_normalizer (Optional[PromptNormalizer]) – The prompt normalizer to use. Defaults to None.
tree_width (int) – Number of branches to explore in parallel at each level. Defaults to 3.
tree_depth (int) – Maximum number of iterations to perform. Defaults to 5.
branching_factor (int) – Number of child branches to create from each parent. Defaults to 2.
on_topic_checking_enabled (bool) – Whether to check if prompts are on-topic. Defaults to True.
desired_response_prefix (str) – Expected prefix for successful responses. Defaults to “Sure, here is”.
batch_size (int) – Number of nodes to process in parallel per batch. Defaults to 10.

Raises:

ValueError – If objective_scorer is not provided, if the target is not a PromptChatTarget, or if parameters are invalid.
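
To get a feel for how tree_width, tree_depth, branching_factor, and batch_size interact, the following sketch estimates a worst-case budget. It rests on assumptions that this page does not state explicitly: the first level holds tree_width nodes, every later level expands each of the tree_width survivors into branching_factor children before pruning, and each level is processed in chunks of batch_size. The function name tap_node_budget is hypothetical and not part of PyRIT.

```python
import math


def tap_node_budget(tree_width=3, tree_depth=5, branching_factor=2, batch_size=10):
    """Worst-case node and batch counts under the assumptions stated above."""
    first_level = tree_width                      # assumed: no branching at depth 1
    later_level = tree_width * branching_factor   # assumed: full frontier survives pruning
    later_levels = tree_depth - 1

    max_nodes = first_level + later_levels * later_level
    max_batches = (math.ceil(first_level / batch_size)
                   + later_levels * math.ceil(later_level / batch_size))
    return {"max_nodes": max_nodes, "max_batches": max_batches}


print(tap_node_budget())  # {'max_nodes': 27, 'max_batches': 5}
```

Under these assumptions the defaults evaluate at most 27 nodes. The real count is usually lower, since the attack stops early on success and off-topic pruning can shrink the frontier below tree_width.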

Methods

`__init__`(*[, objective_target, ...])	Initialize the Tree of Attacks with Pruning attack strategy.
`execute_async`()	Execute the attack strategy asynchronously with the provided parameters.
`execute_with_context_async`(*, context)	Execute strategy with complete lifecycle management.
`get_attack_scoring_config`()	Get the attack scoring configuration used by this strategy.
`get_identifier`()
`get_objective_target`()	Get the objective target for this attack strategy.

Attributes

`DEFAULT_ADVERSARIAL_PROMPT_TEMPLATE_PATH`
`DEFAULT_ADVERSARIAL_SEED_PROMPT_PATH`
`DEFAULT_ADVERSARIAL_SYSTEM_PROMPT_PATH`

DEFAULT_ADVERSARIAL_PROMPT_TEMPLATE_PATH: Path = PosixPath('/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/pyrit/datasets/executors/tree_of_attacks/adversarial_prompt_template.yaml')

DEFAULT_ADVERSARIAL_SEED_PROMPT_PATH: Path = PosixPath('/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/pyrit/datasets/executors/tree_of_attacks/adversarial_seed_prompt.yaml')

DEFAULT_ADVERSARIAL_SYSTEM_PROMPT_PATH: Path = PosixPath('/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/pyrit/datasets/executors/tree_of_attacks/adversarial_system_prompt.yaml')

async execute_async(*, objective: str, memory_labels: dict[str, str] | None = None, **kwargs) → TAPAttackResult
async execute_async(**kwargs) → TAPAttackResult

Execute the attack strategy asynchronously with the provided parameters.

get_attack_scoring_config() → AttackScoringConfig | None

Get the attack scoring configuration used by this strategy.

Returns:

The scoring configuration with objective scorer, auxiliary scorers, and threshold.

Return type:

Optional[AttackScoringConfig]
