pyrit.orchestrator.CrescendoOrchestrator

class CrescendoOrchestrator(objective_target: PromptChatTarget, adversarial_chat: PromptChatTarget, scoring_target: PromptChatTarget, adversarial_chat_system_prompt_path: Path | None = None, objective_achieved_score_threshhold: float = 0.7, max_turns: int = 10, prompt_converters: list[PromptConverter] | None = None, max_backtracks: int = 10, batch_size: int = 1, verbose: bool = False)

Bases: MultiTurnOrchestrator

An orchestrator that executes the Crescendo attack.

The Crescendo Attack is a multi-turn strategy that progressively guides the model to generate harmful content through small, benign steps. It leverages the model’s recency bias, pattern-following tendency, and trust in self-generated text.

Learn more about the Crescendo attack at https://crescendo-the-multiturn-jailbreak.github.io/

Parameters:

objective_target (PromptChatTarget) – The target that prompts are sent to - must be a PromptChatTarget.
adversarial_chat (PromptChatTarget) – The chat target for red teaming.
scoring_target (PromptChatTarget) – The chat target for scoring.
adversarial_chat_system_prompt_path (Optional[Path], Optional) – The path to the red teaming chat’s system prompt. Defaults to ../crescendo_variant_1_with_examples.yaml.
objective_achieved_score_threshhold (float, Optional) – The score threshold for achieving the objective. Defaults to 0.7.
max_turns (int, Optional) – The maximum number of turns to perform the attack. Defaults to 10.
prompt_converters (Optional[list[PromptConverter]], Optional) – List of converters to apply to prompts. Defaults to None.
max_backtracks (int, Optional) – The maximum number of times to backtrack during the attack. Must be a positive integer. Defaults to 10.
verbose (bool, Optional) – Flag indicating whether to enable verbose logging. Defaults to False.
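
The parameters above might be wired together as in the following minimal sketch. It rests on several assumptions that are not stated on this page: that the installed PyRIT release exports `initialize_pyrit` and `IN_MEMORY` from `pyrit.common` and `OpenAIChatTarget` from `pyrit.prompt_target`, and that the target's endpoint and key are supplied through environment variables. The helper name `run_crescendo` and the chosen limits are illustrative only.

```python
async def run_crescendo(objective: str):
    """Hypothetical helper: build a CrescendoOrchestrator and run one attack."""
    # Imports are local so that this module still loads where PyRIT is absent.
    # Assumption: these names exist in the installed PyRIT release.
    from pyrit.common import IN_MEMORY, initialize_pyrit
    from pyrit.orchestrator import CrescendoOrchestrator
    from pyrit.prompt_target import OpenAIChatTarget

    # Assumption: an in-memory database is sufficient for a trial run.
    initialize_pyrit(memory_db_type=IN_MEMORY)

    orchestrator = CrescendoOrchestrator(
        objective_target=OpenAIChatTarget(),   # system under test
        adversarial_chat=OpenAIChatTarget(),   # generates the escalating prompts
        scoring_target=OpenAIChatTarget(),     # judges responses
        max_turns=10,
        max_backtracks=5,
    )

    # Both arguments of run_attack_async are keyword-only.
    return await orchestrator.run_attack_async(objective=objective)
```

From a script, the coroutine could be driven with `asyncio.run(run_crescendo(...))`, after which the returned result's `status` field can be inspected.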

__init__(objective_target: PromptChatTarget, adversarial_chat: PromptChatTarget, scoring_target: PromptChatTarget, adversarial_chat_system_prompt_path: Path | None = None, objective_achieved_score_threshhold: float = 0.7, max_turns: int = 10, prompt_converters: list[PromptConverter] | None = None, max_backtracks: int = 10, batch_size: int = 1, verbose: bool = False) → None

Methods

`__init__`(objective_target, adversarial_chat, ...)
`dispose_db_engine`()	Dispose database engine to release database connections and resources.
`get_identifier`()
`get_memory`()	Retrieves the memory associated with this orchestrator.
`get_score_memory`()	Retrieves the scores of the PromptRequestPieces associated with this orchestrator.
`run_attack_async`(*, objective[, memory_labels])	Executes the Crescendo Attack asynchronously.
`run_attacks_async`(*, objectives[, memory_labels])	Applies the attack strategy for each objective in the list of objectives.
`set_prepended_conversation`(*, ...)	Sets the prepended conversation to be sent to the objective target.

async run_attack_async(*, objective: str, memory_labels: dict[str, str] | None = None) → OrchestratorResult

Executes the Crescendo Attack asynchronously.

This method orchestrates a multi-turn attack. On each turn it generates a prompt, sends it to the target, assesses the response, and adapts its approach depending on whether the response was a rejection or a success. When a response is rejected, the orchestrator backtracks and tries again. The attack continues until the objective is achieved or the maximum number of turns or backtracks is reached.
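
The turn and backtrack accounting described above can be illustrated with a simplified, self-contained simulation. This is a sketch and not PyRIT's implementation: the names `crescendo_loop` and `LoopOutcome` and the four callables are hypothetical stand-ins. It also makes two assumptions: a score at or above the threshold counts as success, and once the backtrack budget is exhausted a refused response simply consumes a turn.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoopOutcome:
    achieved: bool
    turns_used: int
    backtracks_used: int


def crescendo_loop(
    next_prompt: Callable[[int], str],
    send: Callable[[str], str],
    is_refusal: Callable[[str], bool],
    score: Callable[[str], float],
    threshold: float = 0.7,
    max_turns: int = 10,
    max_backtracks: int = 10,
) -> LoopOutcome:
    """Hypothetical sketch of a turn loop with a bounded backtrack budget."""
    if max_turns <= 0:
        raise ValueError("max_turns must be a positive integer")

    turns = 0
    backtracks = 0
    while turns < max_turns:
        response = send(next_prompt(turns))

        if is_refusal(response) and backtracks < max_backtracks:
            # Backtrack: drop this exchange and retry without using a turn.
            backtracks += 1
            continue

        turns += 1
        if score(response) >= threshold:  # assumption: inclusive threshold
            return LoopOutcome(True, turns, backtracks)

    return LoopOutcome(False, turns, backtracks)


# Stubbed run: one refusal, one weak answer, then an answer scored above 0.7.
replies = iter(["I cannot help", "step one", "step two (goal)"])
outcome = crescendo_loop(
    next_prompt=lambda turn: f"prompt for turn {turn + 1}",
    send=lambda prompt: next(replies),
    is_refusal=lambda reply: reply.startswith("I cannot"),
    score=lambda reply: 0.9 if "goal" in reply else 0.2,
)
```

In the stubbed run, the refusal consumes one backtrack but no turn, so the objective is reached on the second counted turn.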

Parameters:

objective (str) – The ultimate goal or purpose of the attack, which the orchestrator attempts to achieve through multiple turns of interaction with the target.
memory_labels (dict[str, str], Optional) – A free-form dictionary of additional labels to apply to the prompts throughout the attack. Any labels passed in will be combined with self._global_memory_labels (from the GLOBAL_MEMORY_LABELS environment variable) into one dictionary. In the case of collisions, the passed-in labels take precedence. Defaults to None.
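
The precedence rule for memory_labels can be shown with plain dictionaries. The helper `merge_memory_labels` below is a hypothetical illustration of the documented behavior, not the library's own code, and the label keys are made up for the example.

```python
from typing import Optional


def merge_memory_labels(
    global_labels: dict[str, str],
    passed_labels: Optional[dict[str, str]] = None,
) -> dict[str, str]:
    """Combine both label sets; passed-in labels win on key collisions."""
    merged = dict(global_labels)        # copy so the globals stay untouched
    merged.update(passed_labels or {})  # later update overrides shared keys
    return merged


# Illustrative values only: "op_name" collides, "username" does not.
global_labels = {"op_name": "baseline", "username": "tester"}
labels = merge_memory_labels(global_labels, {"op_name": "crescendo-run"})
```

Here `labels["op_name"]` takes the passed-in value, while `username` is carried over from the global labels unchanged.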

Returns:

An object containing details about the attack outcome, including:

conversation_id (str): The ID of the conversation where the objective was ultimately achieved or failed.
objective (str): The initial objective of the attack.
status (OrchestratorResultStatus): The status of the attack (“success”, “failure”, “pruned”, etc.)
score (Score): The score evaluating the attack outcome.
confidence (float): The confidence level of the result.

Return type:

OrchestratorResult

Raises:

ValueError – If max_turns is set to a non-positive integer.
