pyrit.orchestrator.FlipAttackOrchestrator
- class FlipAttackOrchestrator(objective_target: PromptChatTarget, request_converter_configurations: list[PromptConverterConfiguration] | None = None, response_converter_configurations: list[PromptConverterConfiguration] | None = None, objective_scorer: Scorer | None = None, auxiliary_scorers: list[Scorer] | None = None, batch_size: int = 10, retries_on_objective_failure: int = 0, verbose: bool = False)
Bases: PromptSendingOrchestrator
This orchestrator implements the Flip Attack method described at https://arxiv.org/html/2410.02832v1.
In essence, it adds a system prompt to the beginning of the conversation that tells the target to flip each word in the prompt.
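To make the idea of flipping concrete, the following is a minimal sketch of three simple ways to flip a piece of text. The function names are hypothetical, and this is not PyRIT's flip converter; it only illustrates the kind of transformation involved.

```python
def flip_characters_in_sentence(text: str) -> str:
    """Reverse the whole string character by character."""
    return text[::-1]


def flip_characters_in_word(text: str) -> str:
    """Reverse the characters inside each word, keeping the word order."""
    return " ".join(word[::-1] for word in text.split(" "))


def flip_word_order(text: str) -> str:
    """Reverse the order of the words, keeping each word intact."""
    return " ".join(reversed(text.split(" ")))


if __name__ == "__main__":
    sample = "hello red team"
    print(flip_characters_in_sentence(sample))  # maet der olleh
    print(flip_characters_in_word(sample))      # olleh der maet
    print(flip_word_order(sample))              # team red hello
```

Each of these transformations is its own inverse, so applying the same function twice returns the original text.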
- __init__(objective_target: PromptChatTarget, request_converter_configurations: list[PromptConverterConfiguration] | None = None, response_converter_configurations: list[PromptConverterConfiguration] | None = None, objective_scorer: Scorer | None = None, auxiliary_scorers: list[Scorer] | None = None, batch_size: int = 10, retries_on_objective_failure: int = 0, verbose: bool = False) -> None
- Parameters:
objective_target (PromptChatTarget) – The target for sending prompts.
request_converter_configurations (list[PromptConverterConfiguration], Optional) – List of prompt converters. These are stacked in order on top of the flip converter.
response_converter_configurations (list[PromptConverterConfiguration], Optional) – List of response converters.
objective_scorer (Scorer, Optional) – Scorer to use for evaluating if the objective was achieved.
auxiliary_scorers (list[Scorer], Optional) – List of additional scorers to use for each prompt request response.
batch_size (int, Optional) – The maximum batch size for sending prompts. Defaults to 10. Note: if a maximum requests-per-minute limit is set on the objective_target, set this to 1 to ensure proper rate limit management.
retries_on_objective_failure (int, Optional) – Number of retries to attempt if objective fails. Defaults to 0.
verbose (bool, Optional) – Whether to log debug information. Defaults to False.
Methods
__init__(objective_target[, ...])
dispose_db_engine() – Dispose database engine to release database connections and resources.
get_identifier()
get_memory() – Retrieves the memory associated with this orchestrator.
get_score_memory() – Retrieves the scores of the PromptRequestPieces associated with this orchestrator.
run_attack_async(*, objective[, memory_labels]) – Runs the attack.
run_attacks_async(*, objectives[, memory_labels]) – Runs multiple attacks in parallel using batch_size.
set_skip_criteria(*, skip_criteria[, ...]) – Sets the skip criteria for the orchestrator.
- async run_attack_async(*, objective: str, memory_labels: dict[str, str] | None = None) -> OrchestratorResult
Runs the attack.
- Parameters:
objective (str) – The objective of the attack.
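seed_prompt (SeedPromptGroup, Optional) – The seed prompt group to start the conversation. By default the objective is used. Note: this parameter does not appear in the method signature above.
prepended_conversation (list[PromptRequestResponse], Optional) – The conversation to prepend to the attack. Sent to objective target. Note: this parameter does not appear in the method signature above.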
memory_labels (dict[str, str], Optional) – The memory labels to use for the attack.
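The leading `*` in the signature makes every argument keyword-only. The sketch below shows that calling convention with a hypothetical stand-in class, so it runs without PyRIT installed. StubOrchestrator, its dictionary return value, and the label key "run" are assumptions for illustration only; the real method returns an OrchestratorResult.

```python
import asyncio
from typing import Optional


class StubOrchestrator:
    """Hypothetical stand-in that mirrors only the documented signature."""

    async def run_attack_async(
        self, *, objective: str, memory_labels: Optional[dict[str, str]] = None
    ) -> dict:
        # A real orchestrator would send prompts to a target; this just echoes.
        return {"objective": objective, "memory_labels": memory_labels or {}}


async def main() -> dict:
    orchestrator = StubOrchestrator()
    # Both arguments must be passed by keyword.
    return await orchestrator.run_attack_async(
        objective="example objective", memory_labels={"run": "demo"}
    )


def positional_call_is_rejected() -> bool:
    """Return True if passing the objective positionally raises TypeError."""
    try:
        StubOrchestrator().run_attack_async("example objective")
    except TypeError:
        return True
    return False


if __name__ == "__main__":
    print(asyncio.run(main()))
    print(positional_call_is_rejected())
```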
- async run_attacks_async(*, objectives: list[str], memory_labels: dict[str, str] | None = None) -> list[OrchestratorResult]
Runs multiple attacks in parallel using batch_size.
- Parameters:
objectives (list[str]) – List of objectives for the attacks.
seed_prompts (list[SeedPromptGroup], Optional) – List of seed prompt groups to start the conversations. If not provided, each objective will be used as its own seed prompt. Note: this parameter does not appear in the method signature above.
prepended_conversation (list[PromptRequestResponse], Optional) – The conversation to prepend to each attack. Note: this parameter does not appear in the method signature above.
memory_labels (dict[str, str], Optional) – The memory labels to use for the attacks.
- Returns:
List of results from each attack.
- Return type: