pyrit.orchestrator.PAIROrchestrator

pyrit.orchestrator.PAIROrchestrator#

class PAIROrchestrator(*, objective_target: PromptChatTarget, adversarial_chat: PromptChatTarget, scoring_target: PromptChatTarget, adversarial_chat_seed_prompt: SeedPrompt | None = None, adversarial_chat_system_prompt_path: Path | None = None, depth: int = 3, prompt_converters: list[PromptConverter] | None = None, objective_achieved_score_threshold: float = 0.8, desired_response_prefix='Sure, here is', batch_size: int = 1, verbose: bool = False)[source]#

Bases: TreeOfAttacksWithPruningOrchestrator

This orchestrator implements the Prompt Automatic Iterative Refinement (PAIR) algorithm

This algorithm was published and described in the paper: Chao, Patrick, et al. Jailbreaking Black Box Large Language Models in Twenty Queries. arXiv:2310.08419, arXiv, 13 Oct. 2023. arXiv.org, http://arxiv.org/abs/2310.08419

The authors published a reference implementation in the following repository: patrickrchao/ JailbreakingLLMs/blob/main/system_prompts.py

__init__(*, objective_target: PromptChatTarget, adversarial_chat: PromptChatTarget, scoring_target: PromptChatTarget, adversarial_chat_seed_prompt: SeedPrompt | None = None, adversarial_chat_system_prompt_path: Path | None = None, depth: int = 3, prompt_converters: list[PromptConverter] | None = None, objective_achieved_score_threshold: float = 0.8, desired_response_prefix='Sure, here is', batch_size: int = 1, verbose: bool = False) → None[source]#

Initialize the TreeOfAttacksWithPruningOrchestrator.

Parameters:

objective_target (PromptChatTarget) – The target for the objective prompt. Must be a PromptChatTarget.
adversarial_chat (PromptChatTarget) – The target for the adversarial chat prompt.
scoring_target (PromptChatTarget) – The target for scoring the responses.
adversarial_chat_seed_prompt (Optional[SeedPrompt], optional) – The seed prompt for adversarial chat. Defaults to None.
adversarial_chat_system_prompt_path (Optional[Path], optional) – The path to the system prompt for adversarial chat. Defaults to None.
width (int, optional) – The width of the attack tree. Defaults to 3.
depth (int, optional) – The depth of the attack tree. Defaults to 5.
branching_factor (int, optional) – The branching factor of the attack tree. Defaults to 2.
on_topic_checking_enabled (bool, optional) – Flag to enable on-topic checking. Defaults to True.
prompt_converters (Optional[list[PromptConverter]], optional) – List of prompt converters. Defaults to None.
objective_achieved_score_threshold (float, optional) – The score threshold to determine if the objective is achieved. Defaults to 0.8.
desired_response_prefix (str, optional) – The desired prefix for responses. Defaults to “Sure, here is”.
verbose (bool, optional) – Flag to enable verbose logging. Defaults to False.
batch_size (int, optional) – The batch size. Defaults to 1.

Raises:

ValueError – If the adversarial seed prompt does not have a desired_prefix.
ValueError – If the width of the tree is less than 1.
ValueError – If the depth of the tree is less than 1.
ValueError – If the branching factor of the tree is less than 1.
ValueError – If the objective achieved score threshold is not between 0 and 1.

Methods

`__init__`(*, objective_target, ...[, ...])	Initialize the TreeOfAttacksWithPruningOrchestrator.
`dispose_db_engine`()	Dispose database engine to release database connections and resources.
`get_identifier`()
`get_memory`()	Retrieves the memory associated with this orchestrator.
`get_score_memory`()	Retrieves the scores of the PromptRequestPieces associated with this orchestrator.
`run_attack_async`(*, objective[, memory_labels])	Applies the TAP attack strategy asynchronously.
`run_attacks_async`(*, objectives[, memory_labels])	Applies the attack strategy for each objective in the list of objectives.
`set_prepended_conversation`(*, ...)	Sets the prepended conversation to be sent to the objective target.

set_prepended_conversation(*, prepended_conversation)[source]#

Sets the prepended conversation to be sent to the objective target. This can be used to set the system prompt of the objective target, or send a series of user/assistant messages from which the orchestrator should start the conversation from.

Parameters:: prepended_conversation (str) – The prepended conversation to send to the objective target.

pyrit.orchestrator.PAIROrchestrator

Contents

pyrit.orchestrator.PAIROrchestrator#