pyrit.executor.attack.ManyShotJailbreakAttack
- class ManyShotJailbreakAttack(objective_target: PromptTarget, attack_converter_config: AttackConverterConfig | None = None, attack_scoring_config: AttackScoringConfig | None = None, prompt_normalizer: PromptNormalizer | None = None, max_attempts_on_failure: int = 0, example_count: int = 100, many_shot_examples: list[dict[str, str]] | None = None)
Bases: PromptSendingAttack
This attack implements the Many Shot Jailbreak method described in this research: https://www.anthropic.com/research/many-shot-jailbreaking
Prepends the seed prompt with a faux dialogue between a human and an AI, using examples from a dataset to demonstrate successful jailbreaking attempts. This method leverages the model’s ability to learn from examples to bypass safety measures.
- __init__(objective_target: PromptTarget, attack_converter_config: AttackConverterConfig | None = None, attack_scoring_config: AttackScoringConfig | None = None, prompt_normalizer: PromptNormalizer | None = None, max_attempts_on_failure: int = 0, example_count: int = 100, many_shot_examples: list[dict[str, str]] | None = None) -> None
- Parameters:
objective_target (PromptTarget) – The target system to attack.
attack_converter_config (AttackConverterConfig, Optional) – Configuration for the prompt converters.
attack_scoring_config (AttackScoringConfig, Optional) – Configuration for scoring components.
prompt_normalizer (PromptNormalizer, Optional) – Normalizer for handling prompts.
max_attempts_on_failure (int, Optional) – Maximum number of attempts to retry on failure. Defaults to 0.
example_count (int) – The number of examples to include from many_shot_examples or the Many Shot Jailbreaking dataset. Defaults to 100.
many_shot_examples (list[dict[str, str]], Optional) – The many shot jailbreaking examples to use. If not provided, the first example_count examples from the Many Shot Jailbreaking dataset are used.
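The interaction between example_count and many_shot_examples, and the idea of prepending a faux dialogue to the objective, can be illustrated with a small standalone sketch. This is not the library's implementation: the helper names (select_examples, build_many_shot_prompt), the "user"/"assistant" dictionary keys, the "User:/Assistant:" layout, and the placeholder dataset are all assumptions made for illustration, and the selection logic simply follows the parameter descriptions above.

```python
# Standalone sketch only; it does not import or reproduce PyRIT internals.

# Hypothetical stand-in for the bundled dataset, filled with placeholder text.
DEFAULT_EXAMPLES = [
    {"user": f"placeholder question {i}", "assistant": f"placeholder answer {i}"}
    for i in range(1, 201)
]


def select_examples(many_shot_examples=None, example_count=100):
    """Pick the examples to use, mirroring the documented parameters.

    Assumption: caller-supplied examples take precedence over the default
    dataset, and at most example_count entries are kept in either case.
    """
    source = many_shot_examples if many_shot_examples is not None else DEFAULT_EXAMPLES
    return source[:example_count]


def build_many_shot_prompt(objective, examples):
    """Prepend a faux user/assistant dialogue to the objective.

    Assumption: each example is a dict with "user" and "assistant" keys,
    and turns are rendered as plain "User:" / "Assistant:" lines.
    """
    turns = [f"User: {ex['user']}\nAssistant: {ex['assistant']}" for ex in examples]
    turns.append(f"User: {objective}")
    return "\n\n".join(turns)


if __name__ == "__main__":
    shots = select_examples(example_count=3)
    print(build_many_shot_prompt("final question", shots))
```

Keeping selection and prompt assembly as separate functions makes it easy to check how many examples end up in the prompt before anything is sent to a target.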
Methods
- __init__(objective_target[, ...])
- execute_async(**kwargs) – Execute the attack strategy asynchronously with the provided parameters.
- execute_with_context_async(*, context) – Execute strategy with complete lifecycle management.
- get_identifier()