pyrit.orchestrator.FlipAttackOrchestrator#
- class FlipAttackOrchestrator(prompt_target: PromptTarget, scorers: list[Scorer] | None = None, batch_size: int = 10, verbose: bool = False)[source]#
Bases:
PromptSendingOrchestrator
This orchestrator implements the Flip Attack method found here: https://arxiv.org/html/2410.02832v1.
Essentially, adds a system prompt to the beginning of the conversation to flip each word in the prompt.
- __init__(prompt_target: PromptTarget, scorers: list[Scorer] | None = None, batch_size: int = 10, verbose: bool = False) None [source]#
- Parameters:
prompt_target (PromptTarget) – The target for sending prompts.
scorers (list[Scorer], Optional) – List of scorers to use for each prompt request response, to be scored immediately after receiving response. Default is None.
batch_size (int, Optional) – The (max) batch size for sending prompts. Defaults to 10. Note: If providing max requests per minute on the prompt_target, this should be set to 1 to ensure proper rate limit management. verbose (bool, Optional): Whether to log debug information. Defaults to False.
Methods
__init__
(prompt_target[, scorers, ...])dispose_db_engine
()Dispose database engine to release database connections and resources.
get_identifier
()get_memory
()Retrieves the memory associated with this orchestrator.
get_score_memory
()Retrieves the scores of the PromptRequestPieces associated with this orchestrator.
print_conversations
()Prints the conversation between the prompt target and the red teaming bot.
send_normalizer_requests_async
(*, ...[, ...])Sends the normalized prompts to the prompt target.
send_prompts_async
(*, prompt_list[, ...])Sends the prompts to the prompt target using flip attack.
set_prepended_conversation
(*, ...)Prepends a conversation to the prompt target.
- async send_prompts_async(*, prompt_list: list[str], memory_labels: dict[str, str] | None = None, metadata: str | None = None) list[PromptRequestResponse] [source]#
Sends the prompts to the prompt target using flip attack.
- Parameters:
memory_labels (dict[str, str], Optional) – A free-form dictionary of additional labels to apply to the prompts. Any labels passed in will be combined with self._global_memory_labels with the passed in labels taking precedence in the case of collisions. Defaults to None.
metadata – Any additional information to be added to the memory entry corresponding to the prompts sent.
- Returns:
The responses from sending the prompts.
- Return type: