pyrit.orchestrator.FlipAttackOrchestrator

class FlipAttackOrchestrator(prompt_target: PromptTarget, scorers: list[Scorer] | None = None, batch_size: int = 10, verbose: bool = False)

Bases: PromptSendingOrchestrator

This orchestrator implements the Flip Attack method found here: https://arxiv.org/html/2410.02832v1.

In essence, it adds a system prompt to the beginning of the conversation in order to flip each word in the prompt.
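As a rough illustration of the flipping idea, the sketch below transforms a prompt in plain Python. It assumes that "flip each word" means reversing the characters inside each word, which is only one possible reading; a word-order variant is included for comparison. The function names are hypothetical and not part of PyRIT, whose own converter and system prompt may behave differently.

```python
def flip_chars_in_words(prompt: str) -> str:
    """Reverse the characters of each whitespace-separated word, keeping word order."""
    return " ".join(word[::-1] for word in prompt.split())


def flip_word_order(prompt: str) -> str:
    """Reverse the order of the words, keeping each word intact."""
    return " ".join(reversed(prompt.split()))


original = "tell me a story"
flipped = flip_chars_in_words(original)
print(flipped)

# For single-space-separated text, flipping twice recovers the original,
# so the transformation is reversible by whoever reads the flipped prompt.
assert flip_chars_in_words(flipped) == original
```

Both helpers split on whitespace, so runs of spaces or newlines in the input are collapsed to single spaces in the output.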

__init__(prompt_target: PromptTarget, scorers: list[Scorer] | None = None, batch_size: int = 10, verbose: bool = False) -> None
Parameters:
  • prompt_target (PromptTarget) – The target for sending prompts.

  • scorers (list[Scorer], Optional) – List of scorers to apply to each prompt request response immediately after it is received. Defaults to None.

  • batch_size (int, Optional) – The (max) batch size for sending prompts. Defaults to 10. Note: If providing max requests per minute on the prompt_target, this should be set to 1 to ensure proper rate limit management.

  • verbose (bool, Optional) – Whether to log debug information. Defaults to False.
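To make the effect of batch_size concrete, here is a minimal sketch of batched sending with asyncio. It is not PyRIT's implementation: `fake_send` and `send_in_batches` are hypothetical names, and the chunk-then-gather strategy is an assumption about how a maximum batch size is typically applied.

```python
import asyncio


async def fake_send(prompt: str) -> str:
    """Hypothetical stand-in for a single request to a prompt target."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"response to: {prompt}"


async def send_in_batches(prompts: list[str], batch_size: int = 10) -> list[str]:
    """Send prompts in chunks of at most batch_size, awaiting each chunk before the next."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    responses: list[str] = []
    for start in range(0, len(prompts), batch_size):
        batch = prompts[start:start + batch_size]
        # Requests inside one batch run concurrently; gather preserves input order.
        responses.extend(await asyncio.gather(*(fake_send(p) for p in batch)))
    return responses


# With batch_size=1 each request completes before the next one starts.
results = asyncio.run(send_in_batches(["a", "b", "c"], batch_size=1))
print(results)
```

Under this model, a batch size of 1 means requests never overlap, which is consistent with the note above about setting it to 1 when the target enforces a requests-per-minute limit.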

Methods

__init__(prompt_target[, scorers, ...])

dispose_db_engine()

Dispose database engine to release database connections and resources.

get_identifier()

get_memory()

Retrieves the memory associated with this orchestrator.

get_score_memory()

Retrieves the scores of the PromptRequestPieces associated with this orchestrator.

print_conversations()

Prints the conversation between the prompt target and the red teaming bot.

send_normalizer_requests_async(*, ...[, ...])

Sends the normalized prompts to the prompt target.

send_prompts_async(*, prompt_list[, ...])

Sends the prompts to the prompt target using flip attack.

set_prepended_conversation(*, ...)

Prepends a conversation to the prompt target.

async send_prompts_async(*, prompt_list: list[str], memory_labels: dict[str, str] | None = None, metadata: str | None = None) -> list[PromptRequestResponse]

Sends the prompts to the prompt target using flip attack.

Parameters:
  • prompt_list (list[str]) – The list of prompts to be sent.

  • memory_labels (dict[str, str], Optional) – A free-form dictionary of additional labels to apply to the prompts. Any labels passed in are combined with self._global_memory_labels, with the passed-in labels taking precedence in case of collisions. Defaults to None.

  • metadata (str, Optional) – Any additional information to be added to the memory entry corresponding to the prompts sent. Defaults to None.
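The precedence rule for memory_labels can be pictured as an ordinary dictionary merge. The sketch below is an illustration only: `combine_labels` is a hypothetical helper, and the label keys and values are made up for the example.

```python
from typing import Optional


def combine_labels(
    global_labels: dict[str, str],
    memory_labels: Optional[dict[str, str]] = None,
) -> dict[str, str]:
    """Merge two label dictionaries; on a key collision the per-call value wins."""
    # In a dict unpacking merge, later keys override earlier ones.
    return {**global_labels, **(memory_labels or {})}


global_labels = {"operation": "nightly-run", "user": "team-a"}
per_call = {"user": "alice", "scenario": "flip-attack"}

combined = combine_labels(global_labels, per_call)
print(combined)
```

Here the "user" key collides, so the per-call value replaces the global one, while the non-colliding keys from both dictionaries are kept.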

Returns:

The responses from sending the prompts.

Return type:

list[PromptRequestResponse]
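The calling pattern can be sketched without PyRIT installed by using a stand-in class. `StandInOrchestrator` is hypothetical and mirrors only the keyword-only signature documented above; it returns plain strings, whereas the real method returns list[PromptRequestResponse].

```python
import asyncio
from typing import Optional


class StandInOrchestrator:
    """Hypothetical stand-in that mirrors only the send_prompts_async signature; not PyRIT."""

    async def send_prompts_async(
        self,
        *,
        prompt_list: list[str],
        memory_labels: Optional[dict[str, str]] = None,
        metadata: Optional[str] = None,
    ) -> list[str]:
        # A real orchestrator would return list[PromptRequestResponse]; strings stand in here.
        return [f"echo: {prompt}" for prompt in prompt_list]


async def main() -> list[str]:
    orchestrator = StandInOrchestrator()
    # The bare * in the signature makes every argument keyword-only.
    return await orchestrator.send_prompts_async(
        prompt_list=["first prompt", "second prompt"],
        memory_labels={"scenario": "demo"},
    )


responses = asyncio.run(main())
print(responses)
```

Because the method is a coroutine, it has to be awaited inside an event loop, and passing prompt_list positionally raises a TypeError.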