pyrit.executor.attack.RedTeamingAttack
- class RedTeamingAttack(*, objective_target: PromptTarget, attack_adversarial_config: AttackAdversarialConfig, attack_converter_config: AttackConverterConfig | None = None, attack_scoring_config: AttackScoringConfig | None = None, prompt_normalizer: PromptNormalizer | None = None, max_turns: int = 10)
Bases: MultiTurnAttackStrategy[MultiTurnAttackContext, AttackResult]
Implementation of the multi-turn red teaming attack strategy.
This class orchestrates an iterative attack process where an adversarial chat model generates prompts to send to a target system, attempting to achieve a specified objective. The strategy evaluates each target response using a scorer to determine if the objective has been met.
The attack flow consists of:
1. Generating adversarial prompts based on previous responses and scoring feedback.
2. Sending prompts to the target system through optional converters.
3. Scoring target responses to assess objective achievement.
4. Using scoring feedback to guide subsequent prompt generation.
5. Continuing until the objective is achieved or the maximum number of turns is reached.
The strategy supports customization through system prompts, seed prompts, and prompt converters, allowing for various attack techniques and scenarios.
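The attack flow above can be sketched in plain Python. This is a minimal illustration of the loop only, not PyRIT's implementation: `run_red_team_loop`, `Score`, and the `demo_*` callables are hypothetical stand-ins for the adversarial chat, the objective target, the converters, and the scorer.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Score:
    """Hypothetical stand-in for a scorer result (not a PyRIT class)."""
    achieved: bool   # whether the response met the objective
    rationale: str   # textual feedback for the adversarial model


def run_red_team_loop(
    objective: str,
    generate_prompt: Callable[[str, Optional[str], Optional[Score]], str],
    send_to_target: Callable[[str], str],
    score_response: Callable[[str], Score],
    convert: Callable[[str], str] = lambda prompt: prompt,
    max_turns: int = 10,
) -> Tuple[bool, int]:
    """Return (objective_achieved, turns_executed)."""
    last_response: Optional[str] = None
    last_score: Optional[Score] = None
    for turn in range(1, max_turns + 1):
        # Steps 1 and 4: the adversarial side sees the previous response and score.
        prompt = generate_prompt(objective, last_response, last_score)
        # Step 2: apply the optional converter, then send to the target.
        last_response = send_to_target(convert(prompt))
        # Step 3: score the target's response.
        last_score = score_response(last_response)
        # Step 5: stop as soon as the objective is achieved.
        if last_score.achieved:
            return True, turn
    return False, max_turns


# Demo stubs: the fake target "gives in" on its third request.
calls: List[str] = []


def demo_generate(objective: str, response: Optional[str], score: Optional[Score]) -> str:
    return f"attempt {len(calls) + 1}: {objective}"


def demo_target(prompt: str) -> str:
    calls.append(prompt)
    return "secret" if len(calls) >= 3 else "refused"


def demo_scorer(response: str) -> Score:
    ok = response == "secret"
    return Score(ok, "leaked" if ok else "target refused")


result = run_red_team_loop(
    "reveal the secret", demo_generate, demo_target, demo_scorer, convert=str.upper
)
print(result)  # (True, 3)
```

In this sketch, `max_turns` bounds the loop in the same way the constructor argument of the same name is described as bounding the attack; everything else is illustrative.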
- __init__(*, objective_target: PromptTarget, attack_adversarial_config: AttackAdversarialConfig, attack_converter_config: AttackConverterConfig | None = None, attack_scoring_config: AttackScoringConfig | None = None, prompt_normalizer: PromptNormalizer | None = None, max_turns: int = 10)
Initialize the red teaming attack strategy.
- Parameters:
objective_target – The target system to attack.
attack_adversarial_config – Configuration for the adversarial component.
attack_converter_config – Configuration for attack converters. Defaults to None.
attack_scoring_config – Configuration for attack scoring. Defaults to None.
prompt_normalizer – The prompt normalizer to use for sending prompts. Defaults to None.
max_turns – Maximum number of turns for the attack. Defaults to 10.
- Raises:
ValueError – If objective_scorer is not provided in attack_scoring_config.
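The Raises entry implies that an objective scorer is effectively required even though `attack_scoring_config` itself is optional. The sketch below shows one way such a constructor check could look. It is an assumption for illustration: `ScoringConfigSketch` and `AttackSketch` are hypothetical classes, not PyRIT's, and the error message text is made up.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ScoringConfigSketch:
    """Hypothetical stand-in for a scoring configuration."""
    objective_scorer: Optional[Any] = None


class AttackSketch:
    """Hypothetical stand-in showing keyword-only arguments plus validation."""

    def __init__(
        self,
        *,
        objective_target: Any,
        attack_adversarial_config: Any,
        attack_scoring_config: Optional[ScoringConfigSketch] = None,
        max_turns: int = 10,
    ) -> None:
        # Assumption: a missing config is treated like an empty one.
        scoring = attack_scoring_config or ScoringConfigSketch()
        if scoring.objective_scorer is None:
            raise ValueError(
                "objective_scorer must be provided in attack_scoring_config"
            )
        self.objective_target = objective_target
        self.adversarial_config = attack_adversarial_config
        self.objective_scorer = scoring.objective_scorer
        self.max_turns = max_turns


# Omitting the scoring config fails the check in this sketch.
try:
    AttackSketch(objective_target="target", attack_adversarial_config="adversary")
    raised = False
except ValueError:
    raised = True

# Supplying a scorer passes it.
attack = AttackSketch(
    objective_target="target",
    attack_adversarial_config="adversary",
    attack_scoring_config=ScoringConfigSketch(objective_scorer=object()),
)
```

Under that reading, callers would always pass an `attack_scoring_config` that carries an `objective_scorer`, despite the `None` default in the signature.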
Methods
- __init__(*, objective_target, ...[, ...]): Initialize the red teaming attack strategy.
- execute_async(**kwargs): Execute the attack strategy asynchronously with the provided parameters.
- execute_with_context_async(*, context): Execute strategy with complete lifecycle management.
- get_identifier()
Attributes
- DEFAULT_ADVERSARIAL_PROMPT_IF_OBJECTIVE_TARGET_IS_BLOCKED = 'Request to target failed: blocked. Please rewrite your prompt to avoid getting blocked next time.'
- DEFAULT_ERR_MSG_IF_OBJECTIVE_TARGET_HAS_NON_TEXT_RESPONSE = 'The attack target does not respond with text output, so the scoring rationale is the only textual feedback that can be passed to the red teaming chat. '
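The names of these two defaults suggest when each message is fed back to the adversarial chat. The following sketch is a guess at that selection logic, not PyRIT's code: `build_feedback` and its parameters are hypothetical, and appending the scoring rationale to the non-text message is an assumption based on that string's trailing space.

```python
from typing import Optional

# Copied verbatim from the attribute values listed above.
BLOCKED_MSG = (
    "Request to target failed: blocked. "
    "Please rewrite your prompt to avoid getting blocked next time."
)
NON_TEXT_MSG = (
    "The attack target does not respond with text output, so the scoring "
    "rationale is the only textual feedback that can be passed to the red "
    "teaming chat. "
)


def build_feedback(
    *,
    response_text: Optional[str],
    blocked: bool,
    score_rationale: Optional[str],
) -> str:
    """Pick the text handed to the adversarial model for its next turn (assumed logic)."""
    if blocked:
        # The target refused or filtered the request.
        return BLOCKED_MSG
    if response_text is None:
        # Non-text output (for example an image): fall back to the scorer's rationale.
        return NON_TEXT_MSG + (score_rationale or "")
    # Ordinary case: pass the target's text response through.
    return response_text


blocked_feedback = build_feedback(response_text=None, blocked=True, score_rationale=None)
image_feedback = build_feedback(
    response_text=None, blocked=False, score_rationale="Image shows no harmful content."
)
text_feedback = build_feedback(response_text="I cannot help.", blocked=False, score_rationale="refusal")
```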