pyrit.orchestrator.ManyShotJailbreakOrchestrator

class ManyShotJailbreakOrchestrator(**kwargs)

Bases: PromptSendingOrchestrator

Warning

ManyShotJailbreakOrchestrator is deprecated and will be removed in v0.12.0; use pyrit.executor.attack.ManyShotJailbreakAttack instead.

This orchestrator implements the many-shot jailbreak method described in Anthropic's research: https://www.anthropic.com/research/many-shot-jailbreaking

It prepends the seed prompt with a faux dialogue between a human and an AI, built from dataset examples that demonstrate successful jailbreak attempts. The method exploits the model’s ability to learn from examples in the prompt to bypass safety measures.
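The prepending step can be pictured with a minimal sketch in plain Python. It is not PyRIT's actual template: the function name build_many_shot_prompt, the "User:"/"Assistant:" turn labels, and the "user"/"assistant" dictionary keys are assumptions made for illustration, and the examples are benign placeholders.

```python
def build_many_shot_prompt(objective: str, examples: list[dict[str, str]]) -> str:
    """Prepend a faux user/assistant dialogue to the objective (illustrative only)."""
    lines: list[str] = []
    for example in examples:
        # Assumed keys: "user" holds the question, "assistant" holds the reply.
        lines.append(f"User: {example['user']}")
        lines.append(f"Assistant: {example['assistant']}")
    # The real objective comes last, after all demonstration turns.
    lines.append(f"User: {objective}")
    return "\n".join(lines)


# Benign placeholder examples, standing in for dataset entries.
demo_examples = [
    {"user": "<example question 1>", "assistant": "<example answer 1>"},
    {"user": "<example question 2>", "assistant": "<example answer 2>"},
]
prompt = build_many_shot_prompt("<objective under test>", demo_examples)
```

The resulting string holds one user/assistant pair per example, followed by the objective as the final user turn.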

__init__(objective_target: PromptTarget, request_converter_configurations: list[PromptConverterConfiguration] | None = None, response_converter_configurations: list[PromptConverterConfiguration] | None = None, objective_scorer: Scorer | None = None, auxiliary_scorers: list[Scorer] | None = None, batch_size: int = 10, retries_on_objective_failure: int = 0, verbose: bool = False, example_count: int = 100, many_shot_examples: list[dict[str, str]] | None = None) -> None
Parameters:
  • objective_target (PromptTarget) – The target for sending prompts.

  • request_converter_configurations (list[PromptConverterConfiguration], Optional) – List of prompt converters.

  • response_converter_configurations (list[PromptConverterConfiguration], Optional) – List of response converters.

  • objective_scorer (Scorer, Optional) – Scorer to use for evaluating if the objective was achieved.

  • auxiliary_scorers (list[Scorer], Optional) – List of additional scorers to use for each prompt request response.

  • batch_size (int, Optional) – The maximum batch size for sending prompts. Defaults to 10. Note: If a max requests per minute limit is set on the target, set this to 1 to ensure proper rate limit management.

  • retries_on_objective_failure (int, Optional) – Number of retries to attempt if objective fails. Defaults to 0.

  • verbose (bool, Optional) – Whether to log debug information. Defaults to False.

  • example_count (int) – The number of examples to include from the examples dataset. Defaults to 100.

  • many_shot_examples (list[dict[str, str]], Optional) – The many-shot jailbreaking examples to use. If not provided, the first example_count examples from the Many Shot Jailbreaking dataset are used.
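How example_count and many_shot_examples interact can be sketched as follows. This is an illustration of the rule stated in the parameter descriptions, not PyRIT's code: the helper select_examples and the stand-in dataset are hypothetical, and treating any non-None list as "provided" is an assumption.

```python
from typing import Optional


def select_examples(
    dataset: list[dict[str, str]],
    example_count: int = 100,
    many_shot_examples: Optional[list[dict[str, str]]] = None,
) -> list[dict[str, str]]:
    """Pick the examples to prepend (hypothetical helper, not part of PyRIT)."""
    if many_shot_examples is not None:
        # Caller-supplied examples are used as given (assumption).
        return many_shot_examples
    # Otherwise take the first example_count entries of the dataset.
    return dataset[:example_count]


# Stand-in dataset with placeholder entries.
dataset = [{"user": f"q{i}", "assistant": f"a{i}"} for i in range(250)]
custom = [{"user": "q", "assistant": "a"}]

default_selection = select_examples(dataset)
small_selection = select_examples(dataset, example_count=3)
custom_selection = select_examples(dataset, many_shot_examples=custom)
```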

Methods

  • __init__(objective_target[, ...])

  • dispose_db_engine() – Dispose database engine to release database connections and resources.

  • get_identifier()

  • get_memory() – Retrieves the memory associated with this orchestrator.

  • get_score_memory() – Retrieves the scores of the PromptRequestPieces associated with this orchestrator.

  • run_attack_async(*, objective[, memory_labels]) – Runs the attack.

  • run_attacks_async(*, objectives[, memory_labels]) – Runs multiple attacks in parallel using batch_size.

  • set_skip_criteria(*, skip_criteria[, ...]) – Sets the skip criteria for the orchestrator.

async run_attack_async(*, objective: str, memory_labels: dict[str, str] | None = None) -> OrchestratorResult

Runs the attack.

Parameters:
  • objective (str) – The objective of the attack.

  • seed_prompt (SeedPromptGroup, Optional) – The seed prompt group to start the conversation. By default the objective is used.

  • prepended_conversation (list[PromptRequestResponse], Optional) – The conversation to prepend to the attack. Sent to objective target.

  • memory_labels (dict[str, str], Optional) – The memory labels to use for the attack.
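A single attack might be run along the following lines. This is a hedged sketch rather than an official example: it assumes a PyRIT release that still ships this deprecated orchestrator, that initialize_pyrit, IN_MEMORY, and OpenAIChatTarget are importable from the modules shown, and that the target reads its endpoint and key from environment variables. The function name run_single_attack and the label values are hypothetical.

```python
import asyncio


async def run_single_attack(objective: str, example_count: int = 5):
    # Imports are local so this module loads even without PyRIT installed.
    from pyrit.common import IN_MEMORY, initialize_pyrit
    from pyrit.orchestrator import ManyShotJailbreakOrchestrator
    from pyrit.prompt_target import OpenAIChatTarget

    # Assumption: in-memory storage is enough for a short experiment.
    initialize_pyrit(memory_db_type=IN_MEMORY)

    # Assumption: the target picks up its configuration from the environment.
    target = OpenAIChatTarget()

    orchestrator = ManyShotJailbreakOrchestrator(
        objective_target=target,
        example_count=example_count,
    )
    # Both arguments are keyword-only, matching the signature above.
    return await orchestrator.run_attack_async(
        objective=objective,
        memory_labels={"experiment": "many-shot-demo"},
    )


# To run: asyncio.run(run_single_attack("<objective under test>"))
```

Since the class is deprecated, new code should prefer pyrit.executor.attack.ManyShotJailbreakAttack.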

async run_attacks_async(*, objectives: list[str], memory_labels: dict[str, str] | None = None) -> list[OrchestratorResult]

Runs multiple attacks in parallel using batch_size.

Parameters:
  • objectives (list[str]) – List of objectives for the attacks.

  • seed_prompts (list[SeedPromptGroup], Optional) – List of seed prompt groups to start the conversations. If not provided, each objective will be used as its own seed prompt.

  • prepended_conversation (list[PromptRequestResponse], Optional) – The conversation to prepend to each attack.

  • memory_labels (dict[str, str], Optional) – The memory labels to use for the attacks.

Returns:

List of results from each attack.

Return type:

list[OrchestratorResult]
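The idea of running attacks in parallel, bounded by batch_size, can be illustrated with plain asyncio. This is a sketch of the general batching pattern, not PyRIT's implementation: fake_attack is a hypothetical stand-in for a single attack, and run_in_batches is an illustrative helper.

```python
import asyncio


async def fake_attack(objective: str) -> str:
    # Hypothetical stand-in for one attack; yields control once, then returns.
    await asyncio.sleep(0)
    return f"result for {objective}"


async def run_in_batches(objectives: list[str], batch_size: int = 10) -> list[str]:
    """Run objectives concurrently, at most batch_size at a time."""
    results: list[str] = []
    for start in range(0, len(objectives), batch_size):
        batch = objectives[start:start + batch_size]
        # gather preserves input order, so results line up with objectives.
        results.extend(await asyncio.gather(*(fake_attack(o) for o in batch)))
    return results


objectives = [f"objective-{i}" for i in range(25)]
results = asyncio.run(run_in_batches(objectives, batch_size=10))
```

With batch_size set to 1 the loop degenerates to sequential execution, which is why that setting suits targets with a requests-per-minute limit.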