pyrit.executor.attack.CrescendoAttack


class CrescendoAttack(*, objective_target: PromptChatTarget = REQUIRED_VALUE, attack_adversarial_config: AttackAdversarialConfig, attack_converter_config: AttackConverterConfig | None = None, attack_scoring_config: AttackScoringConfig | None = None, prompt_normalizer: PromptNormalizer | None = None, max_backtracks: int = 10, max_turns: int = 10)

Bases: MultiTurnAttackStrategy[CrescendoAttackContext, CrescendoAttackResult]

Implementation of the Crescendo attack strategy.

The Crescendo Attack is a multi-turn strategy that progressively guides the model to generate harmful content through small, benign steps. It leverages the model’s recency bias, pattern-following tendency, and trust in self-generated text.

The attack flow consists of:

1. Generating progressively harmful prompts using an adversarial chat model.
2. Sending prompts to the target and evaluating responses for refusal.
3. Backtracking when the target refuses to respond.
4. Scoring responses to determine if the objective has been achieved.
5. Continuing until the objective is met or maximum turns/backtracks are reached.
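The control flow described above can be illustrated with a small, self-contained sketch. This is not PyRIT's implementation: `run_escalation_loop`, `LoopResult`, and the four callables are hypothetical stand-ins for the adversarial chat, the objective target, the refusal scorer, and the objective scorer. The sketch also assumes that a backtracked exchange is discarded without consuming a turn, and that once the backtrack budget is spent a refusal is kept and counted like any other turn; the real strategy may differ on both points.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Exchange = Tuple[str, str]  # (prompt, response)


@dataclass
class LoopResult:
    achieved: bool
    turns_used: int
    backtracks_used: int
    conversation: List[Exchange] = field(default_factory=list)


def run_escalation_loop(
    next_prompt: Callable[[List[Exchange], bool], str],
    send_to_target: Callable[[List[Exchange], str], str],
    is_refusal: Callable[[str], bool],
    objective_met: Callable[[str], bool],
    max_turns: int = 10,
    max_backtracks: int = 10,
) -> LoopResult:
    """Hypothetical sketch of a multi-turn loop with backtracking."""
    conversation: List[Exchange] = []
    turns = 0
    backtracks = 0
    last_refused = False

    while turns < max_turns:
        # Step 1: ask the (stub) adversarial side for the next prompt.
        prompt = next_prompt(conversation, last_refused)
        # Step 2: send it to the (stub) target.
        response = send_to_target(conversation, prompt)

        # Step 3: on refusal, discard the exchange and retry (assumption:
        # a backtrack does not consume a turn).
        if is_refusal(response) and backtracks < max_backtracks:
            backtracks += 1
            last_refused = True
            continue

        last_refused = False
        conversation.append((prompt, response))
        turns += 1

        # Step 4: score the response against the objective.
        if objective_met(response):
            return LoopResult(True, turns, backtracks, conversation)

    # Step 5: turn budget exhausted without meeting the objective.
    return LoopResult(False, turns, backtracks, conversation)


def demo() -> LoopResult:
    """Drive the loop with deterministic placeholder stubs."""

    def next_prompt(conv: List[Exchange], last_refused: bool) -> str:
        level = len(conv) + 1
        return f"level-{level}-soft" if last_refused else f"level-{level}"

    def send_to_target(conv: List[Exchange], prompt: str) -> str:
        # The stub target refuses exactly one prompt.
        return "REFUSED" if prompt == "level-2" else f"reply to {prompt}"

    def is_refusal(response: str) -> bool:
        return response == "REFUSED"

    def objective_met(response: str) -> bool:
        return response.startswith("reply to level-3")

    return run_escalation_loop(next_prompt, send_to_target, is_refusal, objective_met)
```

In the demo, the single refused prompt triggers one backtrack and is regenerated in a softened form, so the conversation history passed to later turns never contains the refusal. That is the property the `max_backtracks` budget bounds.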

You can learn more about the Crescendo attack at: https://crescendo-the-multiturn-jailbreak.github.io/

__init__(*, objective_target: PromptChatTarget = REQUIRED_VALUE, attack_adversarial_config: AttackAdversarialConfig, attack_converter_config: AttackConverterConfig | None = None, attack_scoring_config: AttackScoringConfig | None = None, prompt_normalizer: PromptNormalizer | None = None, max_backtracks: int = 10, max_turns: int = 10) → None

Initialize the Crescendo attack strategy.

Parameters:

objective_target (PromptChatTarget) – The target system to attack. Must be a PromptChatTarget.
attack_adversarial_config (AttackAdversarialConfig) – Configuration for the adversarial component, including the adversarial chat target and optional system prompt path.
attack_converter_config (Optional[AttackConverterConfig]) – Configuration for attack converters, including request and response converters.
attack_scoring_config (Optional[AttackScoringConfig]) – Configuration for scoring responses.
prompt_normalizer (Optional[PromptNormalizer]) – Normalizer for prompts.
max_backtracks (int) – Maximum number of backtracks allowed.
max_turns (int) – Maximum number of turns allowed.

Raises:

ValueError – If objective_target is not a PromptChatTarget.

Methods

`__init__`(*[, objective_target, ...])	Initialize the Crescendo attack strategy.
`execute_async`(**kwargs)	Execute the attack strategy asynchronously with the provided parameters.
`execute_with_context_async`(*, context)	Execute strategy with complete lifecycle management.
`get_attack_scoring_config`()	Get the attack scoring configuration used by this strategy.
`get_identifier`()	Get a serializable identifier for the strategy instance.
`get_objective_target`()	Get the objective target for this attack strategy.

Attributes

`DEFAULT_ADVERSARIAL_CHAT_SYSTEM_PROMPT_TEMPLATE_PATH`
`params_type`	Get the parameters type for this attack strategy.

DEFAULT_ADVERSARIAL_CHAT_SYSTEM_PROMPT_TEMPLATE_PATH: Path = PosixPath('/home/runner/work/PyRIT/PyRIT/pyrit/datasets/executors/crescendo/crescendo_variant_1.yaml')

get_attack_scoring_config() → AttackScoringConfig | None

Get the attack scoring configuration used by this strategy.

Returns:

The scoring configuration with objective scorer, auxiliary scorers, and refusal scorer.

Return type:

Optional[AttackScoringConfig]
