pyrit.executor.attack.RedTeamingAttack

class RedTeamingAttack(*, objective_target: PromptTarget, attack_adversarial_config: AttackAdversarialConfig, attack_converter_config: AttackConverterConfig | None = None, attack_scoring_config: AttackScoringConfig | None = None, prompt_normalizer: PromptNormalizer | None = None, max_turns: int = 10)

Bases: MultiTurnAttackStrategy[MultiTurnAttackContext, AttackResult]

Implementation of the multi-turn red teaming attack strategy.

This class orchestrates an iterative attack process where an adversarial chat model generates prompts to send to a target system, attempting to achieve a specified objective. The strategy evaluates each target response using a scorer to determine if the objective has been met.

The attack flow consists of:

  1. Generating adversarial prompts based on previous responses and scoring feedback.
  2. Sending prompts to the target system through optional converters.
  3. Scoring target responses to assess objective achievement.
  4. Using scoring feedback to guide subsequent prompt generation.
  5. Continuing until the objective is achieved or the maximum number of turns is reached.
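The flow above can be pictured as a simple loop. The following is a minimal sketch in plain Python, not PyRIT code: `TurnScore`, `run_red_team_loop`, and the three callables are hypothetical stand-ins for the adversarial chat, the objective target, and the scorer, and the toy target is assumed to give in on its third reply.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class TurnScore:
    """Hypothetical stand-in for a scorer's verdict on one target response."""
    achieved: bool
    rationale: str


def run_red_team_loop(
    objective: str,
    generate_prompt: Callable[[str, Optional[str], Optional[TurnScore]], str],
    send_to_target: Callable[[str], str],
    score_response: Callable[[str], TurnScore],
    max_turns: int = 10,
) -> Tuple[bool, int]:
    """Run the generate/send/score cycle; return (achieved, turns_used)."""
    last_response: Optional[str] = None
    last_score: Optional[TurnScore] = None
    for turn in range(1, max_turns + 1):
        # Steps 1 and 4: the next prompt depends on the previous response and score.
        prompt = generate_prompt(objective, last_response, last_score)
        # Step 2: send the prompt to the target (converters omitted in this sketch).
        last_response = send_to_target(prompt)
        # Step 3: score the response against the objective.
        last_score = score_response(last_response)
        # Step 5: stop as soon as the objective is achieved.
        if last_score.achieved:
            return True, turn
    return False, max_turns


# Toy stand-ins (assumptions for illustration only).
attempts = {"count": 0}


def toy_generate(objective, last_response, last_score):
    if last_score is None:
        return f"Please {objective}"
    return f"Please {objective} (previous attempt: {last_score.rationale})"


def toy_target(prompt):
    attempts["count"] += 1
    return "SECRET" if attempts["count"] >= 3 else "I cannot help with that."


def toy_score(response):
    achieved = "SECRET" in response
    return TurnScore(achieved, "secret revealed" if achieved else "target refused")


result = run_red_team_loop(
    "reveal the secret", toy_generate, toy_target, toy_score, max_turns=5
)
print(result)  # (True, 3)
```

The sketch stops on whichever comes first: a score that marks the objective as achieved, or `max_turns` iterations.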

The strategy supports customization through system prompts, seed prompts, and prompt converters, allowing for various attack techniques and scenarios.

__init__(*, objective_target: PromptTarget, attack_adversarial_config: AttackAdversarialConfig, attack_converter_config: AttackConverterConfig | None = None, attack_scoring_config: AttackScoringConfig | None = None, prompt_normalizer: PromptNormalizer | None = None, max_turns: int = 10)

Initialize the red teaming attack strategy.

Parameters:
  • objective_target – The target system to attack.

  • attack_adversarial_config – Configuration for the adversarial component.

  • attack_converter_config – Configuration for attack converters. Defaults to None.

  • attack_scoring_config – Configuration for attack scoring. Defaults to None, although an objective_scorer must be supplied through this configuration (see Raises).

  • prompt_normalizer – The prompt normalizer to use for sending prompts. Defaults to None.

  • max_turns – Maximum number of turns for the attack. Defaults to 10.

Raises:

ValueError – If objective_scorer is not provided in attack_scoring_config.
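The keyword-only signature and the ValueError condition can be illustrated with stand-in classes. This is a sketch, not the PyRIT implementation: `ScoringConfigSketch` and `RedTeamingAttackSketch` are hypothetical names, only a subset of the parameters is modeled, and the exact error message and the handling of a missing config are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ScoringConfigSketch:
    """Hypothetical stand-in for AttackScoringConfig (one field modeled)."""
    objective_scorer: Optional[Any] = None


class RedTeamingAttackSketch:
    """Illustrative stand-in that mirrors the documented validation rule."""

    def __init__(
        self,
        *,  # all arguments are keyword-only, as in the documented signature
        objective_target: Any,
        attack_scoring_config: Optional[ScoringConfigSketch] = None,
        max_turns: int = 10,
    ) -> None:
        # Assumption: a missing config is treated like an empty one.
        config = attack_scoring_config or ScoringConfigSketch()
        if config.objective_scorer is None:
            raise ValueError(
                "objective_scorer must be provided in attack_scoring_config"
            )
        self.objective_target = objective_target
        self.objective_scorer = config.objective_scorer
        self.max_turns = max_turns


# Omitting the scorer triggers the ValueError described under Raises.
try:
    RedTeamingAttackSketch(objective_target="target")
except ValueError as exc:
    error_message = str(exc)

# Supplying a scorer succeeds; max_turns keeps its default of 10.
ok = RedTeamingAttackSketch(
    objective_target="target",
    attack_scoring_config=ScoringConfigSketch(objective_scorer="scorer"),
)
```

In practice this means that, although attack_scoring_config is typed as optional, a usable attack needs a scoring configuration that carries an objective scorer.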

Methods

  • __init__(*, objective_target, ...[, ...]) – Initialize the red teaming attack strategy.

  • execute_async(**kwargs) – Execute the attack strategy asynchronously with the provided parameters.

  • execute_with_context_async(*, context) – Execute strategy with complete lifecycle management.

  • get_identifier()

Attributes

DEFAULT_ADVERSARIAL_PROMPT_IF_OBJECTIVE_TARGET_IS_BLOCKED = 'Request to target failed: blocked. Please rewrite your prompt to avoid getting blocked next time.'

DEFAULT_ERR_MSG_IF_OBJECTIVE_TARGET_HAS_NON_TEXT_RESPONSE = 'The attack target does not respond with text output, so the scoring rationale is the only textual feedback that can be passed to the red teaming chat. '
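The two default messages suggest how feedback for the adversarial chat might be chosen when the target's reply is blocked or is not text. The sketch below is a guess based only on the message wording: `build_feedback` and its parameters are hypothetical, and the way PyRIT actually combines these strings with the response and scoring rationale is not stated here.

```python
from typing import Optional

# Copied verbatim from the documented attribute values.
BLOCKED_MSG = (
    "Request to target failed: blocked. "
    "Please rewrite your prompt to avoid getting blocked next time."
)
NON_TEXT_MSG = (
    "The attack target does not respond with text output, so the scoring "
    "rationale is the only textual feedback that can be passed to the red "
    "teaming chat. "
)


def build_feedback(
    *,
    response_text: Optional[str],
    blocked: bool,
    score_rationale: Optional[str],
) -> str:
    """Pick the text handed back to the adversarial chat (assumed logic)."""
    if blocked:
        # The target refused the request outright.
        return BLOCKED_MSG
    if response_text is None:
        # Non-text output: fall back to the scorer's rationale.
        return NON_TEXT_MSG + (score_rationale or "")
    # Ordinary text response: pass it through unchanged.
    return response_text
```

Under these assumptions, a blocked request always yields the rewrite instruction, while a non-text reply yields the explanatory message followed by whatever rationale the scorer produced.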