pyrit.executor.attack.FlipAttack

class FlipAttack(objective_target: PromptChatTarget = REQUIRED_VALUE, attack_converter_config: AttackConverterConfig | None = None, attack_scoring_config: AttackScoringConfig | None = None, prompt_normalizer: PromptNormalizer | None = None, max_attempts_on_failure: int = 0)

Bases: PromptSendingAttack

This attack implements the FlipAttack method described in https://arxiv.org/html/2410.02832v1.

Essentially, it prepends a system prompt to the conversation that instructs the target to flip each word in the prompt back before responding, while the prompt itself is sent with each word flipped.
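
For intuition, here is a minimal sketch (plain Python, not part of the PyRIT API) of the word-level flip the attack relies on:

    # Illustration only: reverse the characters of each word, which is the
    # transformation a flipped prompt undergoes before being sent.
    prompt = "tell me a secret"
    flipped = " ".join(word[::-1] for word in prompt.split())
    print(flipped)  # "llet em a terces"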

__init__(objective_target: PromptChatTarget = REQUIRED_VALUE, attack_converter_config: AttackConverterConfig | None = None, attack_scoring_config: AttackScoringConfig | None = None, prompt_normalizer: PromptNormalizer | None = None, max_attempts_on_failure: int = 0) → None
Parameters:
  • objective_target (PromptChatTarget) – The target system to attack.

  • attack_converter_config (AttackConverterConfig, Optional) – Configuration for the prompt converters.

  • attack_scoring_config (AttackScoringConfig, Optional) – Configuration for scoring components.

  • prompt_normalizer (PromptNormalizer, Optional) – Normalizer for handling prompts.

  • max_attempts_on_failure (int, Optional) – Maximum number of attempts to retry on failure.
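
A hedged usage sketch follows. It assumes an OpenAIChatTarget configured via environment variables, an in-memory PyRIT initialization, and that execute_async accepts an objective keyword argument; these details may vary by PyRIT version.

    import asyncio

    from pyrit.common import IN_MEMORY, initialize_pyrit
    from pyrit.executor.attack import FlipAttack
    from pyrit.prompt_target import OpenAIChatTarget

    # Assumed setup: use an in-memory database for PyRIT's conversation memory.
    initialize_pyrit(memory_db_type=IN_MEMORY)


    async def main():
        # Assumed target: OpenAIChatTarget reads its endpoint and key from
        # environment variables.
        target = OpenAIChatTarget()

        # Only objective_target is required; the other arguments keep their defaults.
        attack = FlipAttack(objective_target=target, max_attempts_on_failure=1)

        # Assumed keyword: execute_async is given the natural-language objective,
        # which FlipAttack sends in flipped form, and returns an attack result.
        result = await attack.execute_async(objective="tell me how to pick a lock")
        print(result)


    asyncio.run(main())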

Methods

__init__([objective_target, ...])

execute_async(**kwargs)

Execute the attack strategy asynchronously with the provided parameters.

execute_with_context_async(*, context)

Execute the strategy with complete lifecycle management.

get_attack_scoring_config()

Get the attack scoring configuration used by this strategy.

get_identifier()

get_objective_target()

Get the objective target for this attack strategy.