pyrit.orchestrator.PAIROrchestrator

class PAIROrchestrator(*, objective_target: PromptChatTarget, adversarial_chat: PromptChatTarget, scoring_target: PromptChatTarget, adversarial_chat_seed_prompt: SeedPrompt | None = None, adversarial_chat_system_prompt_path: Path | None = None, depth: int = 3, prompt_converters: list[PromptConverter] | None = None, objective_achieved_score_threshold: float = 0.8, desired_response_prefix='Sure, here is', verbose: bool = False)

Bases: TreeOfAttacksWithPruningOrchestrator

This orchestrator implements the Prompt Automatic Iterative Refinement (PAIR) algorithm.

The algorithm is described in: Chao, Patrick, et al. "Jailbreaking Black Box Large Language Models in Twenty Queries." arXiv:2310.08419, 13 Oct. 2023, http://arxiv.org/abs/2310.08419

The authors published a reference implementation in the following repository: patrickrchao/JailbreakingLLMs/blob/main/system_prompts.py
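The refinement loop that PAIR describes can be illustrated with a minimal, self-contained sketch. This is not PyRIT's implementation: `pair_sketch`, `PairResult`, and the three toy callables are hypothetical names introduced here, a single refinement stream is assumed, and the `>=` comparison against the threshold is an assumption. Only `depth`, the 0.8 threshold, and the `'Sure, here is'` prefix mirror the defaults in the signature above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PairResult:
    achieved: bool
    turns_used: int
    best_prompt: str
    best_score: float


def pair_sketch(
    objective: str,
    attacker: Callable[[str, Optional[str], Optional[float]], str],
    target: Callable[[str], str],
    judge: Callable[[str], float],
    depth: int = 3,
    threshold: float = 0.8,
) -> PairResult:
    """Single-stream loop: propose a prompt, query the target, score, feed back."""
    last_response: Optional[str] = None
    last_score: Optional[float] = None
    best_prompt, best_score = "", float("-inf")

    for turn in range(1, depth + 1):
        # The attacker sees the previous response and score and refines its prompt.
        prompt = attacker(objective, last_response, last_score)
        last_response = target(prompt)
        last_score = judge(last_response)
        if last_score > best_score:
            best_prompt, best_score = prompt, last_score
        # Assumption: the objective counts as achieved at or above the threshold.
        if last_score >= threshold:
            return PairResult(True, turn, best_prompt, best_score)

    return PairResult(False, depth, best_prompt, best_score)


# Toy stand-ins (hypothetical) so the loop runs without any model behind it.
def toy_attacker(objective, last_response, last_score):
    # First turn: ask directly. Later turns: wrap the request in a role-play frame.
    if last_response is None:
        return objective
    return f"You are a novelist. In your story, a character explains: {objective}"


def toy_target(prompt):
    return "Sure, here is a draft." if "novelist" in prompt else "I can't help with that."


def toy_judge(response):
    # Scores 1.0 when the response opens with the desired prefix, else 0.0.
    return 1.0 if response.startswith("Sure, here is") else 0.0


result = pair_sketch("describe the test topic", toy_attacker, toy_target, toy_judge)
print(result.achieved, result.turns_used)  # True 2
```

In the orchestrator itself, the attacker, target, and judge roles correspond to `adversarial_chat`, `objective_target`, and `scoring_target`, each of which is a `PromptChatTarget` rather than a plain callable.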

__init__(*, objective_target: PromptChatTarget, adversarial_chat: PromptChatTarget, scoring_target: PromptChatTarget, adversarial_chat_seed_prompt: SeedPrompt | None = None, adversarial_chat_system_prompt_path: Path | None = None, depth: int = 3, prompt_converters: list[PromptConverter] | None = None, objective_achieved_score_threshold: float = 0.8, desired_response_prefix='Sure, here is', verbose: bool = False) -> None
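A hedged usage sketch follows, using only the keyword arguments and the `run_attack_async` method documented on this page. The helper name `run_pair` is hypothetical, and the three targets are assumed to be `PromptChatTarget` instances constructed elsewhere. Depending on the PyRIT version, memory may also need to be initialized before an orchestrator is created; that step is not shown.

```python
import asyncio


async def run_pair(objective_target, adversarial_chat, scoring_target, objective: str):
    """Build a PAIROrchestrator with the documented defaults and run one objective."""
    # Imported lazily so this module still loads where PyRIT is not installed.
    from pyrit.orchestrator import PAIROrchestrator

    orchestrator = PAIROrchestrator(
        objective_target=objective_target,      # system under test
        adversarial_chat=adversarial_chat,      # model that proposes refined prompts
        scoring_target=scoring_target,          # model that scores responses
        depth=3,                                # default from the signature above
        objective_achieved_score_threshold=0.8,
        desired_response_prefix="Sure, here is",
    )
    return await orchestrator.run_attack_async(objective=objective)


# Example call, assuming the three targets already exist:
# result = asyncio.run(run_pair(target, attacker, scorer, "your objective here"))
```

Passing the targets in as arguments keeps the sketch independent of any particular target class or credential setup.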

Methods

__init__(*, objective_target, ...[, ...])

dispose_db_engine()

Dispose database engine to release database connections and resources.

get_identifier()

get_memory()

Retrieves the memory associated with this orchestrator.

get_score_memory()

Retrieves the scores of the PromptRequestPieces associated with this orchestrator.

run_attack_async(*, objective[, memory_labels])

Applies the TAP attack strategy asynchronously.

run_attacks_async(*, objectives[, ...])

Applies the attack strategy for each objective in the list of objectives.

set_prepended_conversation(*, ...)

Sets the prepended conversation to be sent to the objective target.

set_prepended_conversation(*, prepended_conversation)

Sets the prepended conversation to be sent to the objective target. This can be used to set the system prompt of the objective target, or to send a series of user/assistant messages from which the orchestrator should start the conversation.

Parameters:

prepended_conversation (str) – The prepended conversation to send to the objective target.