pyrit.executor.attack.RedTeamingAttack


class RedTeamingAttack(*, objective_target: PromptTarget = REQUIRED_VALUE, attack_adversarial_config: AttackAdversarialConfig, attack_converter_config: AttackConverterConfig | None = None, attack_scoring_config: AttackScoringConfig | None = None, prompt_normalizer: PromptNormalizer | None = None, max_turns: int = 10, score_last_turn_only: bool = False)

Bases: MultiTurnAttackStrategy[MultiTurnAttackContext, AttackResult]

Implementation of multi-turn red teaming attack strategy.

This class orchestrates an iterative attack process where an adversarial chat model generates prompts to send to a target system, attempting to achieve a specified objective. The strategy evaluates each target response using a scorer to determine if the objective has been met.

The attack flow consists of:

1. Generating adversarial prompts based on previous responses and scoring feedback.
2. Sending prompts to the target system through optional converters.
3. Scoring target responses to assess objective achievement.
4. Using scoring feedback to guide subsequent prompt generation.
5. Continuing until the objective is achieved or maximum turns are reached.
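The attack flow above can be sketched as a plain loop. This is a minimal illustration and does not use PyRIT: `run_red_team_loop`, `LoopResult`, and the three callables are hypothetical stand-ins for the adversarial chat, the objective target, and the objective scorer, and the real class may structure these steps differently.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class LoopResult:
    """Hypothetical result container (not PyRIT's AttackResult)."""
    achieved: bool
    turns_used: int
    transcript: List[Tuple[str, str, bool]] = field(default_factory=list)


def run_red_team_loop(
    generate_prompt: Callable[[Optional[str], Optional[str]], str],
    send_to_target: Callable[[str], str],
    score_response: Callable[[str], Tuple[bool, str]],
    converters: Sequence[Callable[[str], str]] = (),
    max_turns: int = 10,
) -> LoopResult:
    last_response: Optional[str] = None
    feedback: Optional[str] = None
    result = LoopResult(achieved=False, turns_used=0)

    for turn in range(1, max_turns + 1):
        # Steps 1 and 4: the adversarial side sees the previous response
        # and the scorer's feedback when writing the next prompt.
        prompt = generate_prompt(last_response, feedback)

        # Step 2: apply optional converters, then send to the target.
        for convert in converters:
            prompt = convert(prompt)
        last_response = send_to_target(prompt)

        # Step 3: score the target's response against the objective.
        achieved, feedback = score_response(last_response)
        result.transcript.append((prompt, last_response, achieved))
        result.turns_used = turn

        # Step 5: stop early once the objective is achieved.
        if achieved:
            result.achieved = True
            break

    return result
```

Passing the scorer's feedback back into `generate_prompt` is what distinguishes this loop from simply retrying the same prompt: each turn can adapt to why the previous one failed.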

The strategy supports customization through system prompts, seed prompts, and prompt converters, allowing for various attack techniques and scenarios.

__init__(*, objective_target: PromptTarget = REQUIRED_VALUE, attack_adversarial_config: AttackAdversarialConfig, attack_converter_config: AttackConverterConfig | None = None, attack_scoring_config: AttackScoringConfig | None = None, prompt_normalizer: PromptNormalizer | None = None, max_turns: int = 10, score_last_turn_only: bool = False)

Initialize the red teaming attack strategy.

Parameters:

objective_target – The target system to attack.
attack_adversarial_config – Configuration for the adversarial component.
attack_converter_config – Configuration for attack converters. Defaults to None.
attack_scoring_config – Configuration for attack scoring. Defaults to None.
prompt_normalizer – The prompt normalizer to use for sending prompts. Defaults to None.
max_turns (int) – Maximum number of turns for the attack. Defaults to 10.
score_last_turn_only (bool) – If True, only score the final turn instead of every turn. This reduces LLM calls when intermediate scores are not needed (e.g., for generating simulated conversations). The attack will run for exactly max_turns when this is enabled. Defaults to False.

Raises:

ValueError – If objective_scorer is not provided in attack_scoring_config.
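The effect of score_last_turn_only on scorer usage can be illustrated with a small simulation. This sketch assumes only the behavior documented above (scoring every turn with early stop by default, versus running exactly max_turns and scoring only the final turn); `simulate_scoring` and its callables are hypothetical and are not part of PyRIT.

```python
from typing import Callable, Tuple


def simulate_scoring(
    respond: Callable[[int], str],
    score: Callable[[str], bool],
    max_turns: int = 10,
    score_last_turn_only: bool = False,
) -> Tuple[int, int, bool]:
    """Return (turns_run, scorer_calls, achieved) for a toy conversation."""
    turns_run = 0
    scorer_calls = 0
    achieved = False

    for turn in range(1, max_turns + 1):
        response = respond(turn)
        turns_run = turn

        # With score_last_turn_only, intermediate turns are never scored,
        # so there is no early exit and the loop runs for max_turns.
        if score_last_turn_only and turn < max_turns:
            continue

        scorer_calls += 1
        achieved = score(response)
        if achieved:
            break

    return turns_run, scorer_calls, achieved
```

With a toy target that replies `"reply <turn>"` and a scorer that succeeds on replies ending in `"2"`, the default mode stops after turn 2 with two scorer calls, while `score_last_turn_only=True` with `max_turns=4` runs all four turns and calls the scorer once.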

Methods

`__init__`(*[, objective_target, ...])	Initialize the red teaming attack strategy.
`execute_async`(**kwargs)	Execute the attack strategy asynchronously with the provided parameters.
`execute_with_context_async`(*, context)	Execute strategy with complete lifecycle management.
`get_attack_scoring_config`()	Get the attack scoring configuration used by this strategy.
`get_identifier`()	Get a serializable identifier for the strategy instance.
`get_objective_target`()	Get the objective target for this attack strategy.

Attributes

`DEFAULT_ADVERSARIAL_PROMPT_IF_OBJECTIVE_TARGET_IS_BLOCKED`
`DEFAULT_ERR_MSG_IF_OBJECTIVE_TARGET_HAS_NON_TEXT_RESPONSE`
`params_type`	Get the parameters type for this attack strategy.

DEFAULT_ADVERSARIAL_PROMPT_IF_OBJECTIVE_TARGET_IS_BLOCKED = 'Request to target failed: blocked. Please rewrite your prompt to avoid getting blocked next time.'

DEFAULT_ERR_MSG_IF_OBJECTIVE_TARGET_HAS_NON_TEXT_RESPONSE = 'The attack target does not respond with text output, so the scoring rationale is the only textual feedback that can be passed to the red teaming chat. '
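This page does not describe how the two default messages are used. Their names suggest they act as fallback feedback for the adversarial chat when the target's reply cannot be forwarded as text. The sketch below shows one plausible selection rule under that assumption; `build_feedback` and its parameters are hypothetical and do not reflect the library's actual implementation.

```python
from typing import Optional

# String values copied from the class attributes documented above.
BLOCKED_PROMPT = (
    "Request to target failed: blocked. Please rewrite your prompt "
    "to avoid getting blocked next time."
)
NON_TEXT_MSG = (
    "The attack target does not respond with text output, so the scoring "
    "rationale is the only textual feedback that can be passed to the "
    "red teaming chat. "
)


def build_feedback(
    response_text: Optional[str],
    response_type: str = "text",
    blocked: bool = False,
    score_rationale: Optional[str] = None,
) -> str:
    """Pick the text handed back to the adversarial chat (assumed logic)."""
    if blocked:
        # Assumption: a blocked request yields the fixed rewrite instruction.
        return BLOCKED_PROMPT
    if response_type != "text":
        # Assumption: for non-text output, only the scorer's rationale
        # can be forwarded, prefixed by the explanatory message.
        return NON_TEXT_MSG + (score_rationale or "")
    return response_text or ""
```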

get_attack_scoring_config() → AttackScoringConfig | None

Get the attack scoring configuration used by this strategy.

Returns:

The scoring configuration with objective scorer, use_score_as_feedback, and threshold.

Return type:

Optional[AttackScoringConfig]
