pyrit.executor.attack

Attack executor module.

Functions¶

`generate_simulated_conversation_async`¶

generate_simulated_conversation_async(objective: str, adversarial_chat: PromptChatTarget, objective_scorer: TrueFalseScorer, num_turns: int = 3, starting_sequence: int = 0, adversarial_chat_system_prompt_path: Union[str, Path], simulated_target_system_prompt_path: Optional[Union[str, Path]] = None, next_message_system_prompt_path: Optional[Union[str, Path]] = None, attack_converter_config: Optional[AttackConverterConfig] = None, memory_labels: Optional[dict[str, str]] = None) → list[SeedPrompt]

Generate a simulated conversation between an adversarial chat and a target.

This utility runs a RedTeamingAttack with score_last_turn_only=True against a simulated target (the same LLM as adversarial_chat, optionally configured with a system prompt). The resulting conversation is returned as a list of SeedPrompts that can be merged with other SeedPrompts in a SeedGroup for use as prepended_conversation and next_message.

Use cases:

Creating role-play scenarios dynamically (e.g., movie script, video game)
Establishing conversational context before attacking a real target
Generating multi-turn jailbreak setups without hardcoded responses

Parameter	Type	Description
`objective`	`str`	The objective for the adversarial chat to work toward.
`adversarial_chat`	`PromptChatTarget`	The adversarial LLM that generates attack prompts. This same LLM is also used as the simulated target.
`objective_scorer`	`TrueFalseScorer`	Scorer to evaluate the final turn.
`num_turns`	`int`	Number of conversation turns to generate. Defaults to 3. Defaults to `3`.
`starting_sequence`	`int`	The starting sequence number for the generated SeedPrompts. Each message gets an incrementing sequence number. Defaults to 0. Defaults to `0`.
`adversarial_chat_system_prompt_path`	`Union[str, Path]`	Path to the system prompt for the adversarial chat.
`simulated_target_system_prompt_path`	`Optional[Union[str, Path]]`	Path to the system prompt for the simulated target. If None, no system prompt is used for the simulated target. Defaults to `None`.
`next_message_system_prompt_path`	`Optional[Union[str, Path]]`	Optional path to a system prompt for generating a final user message. If provided, after the simulated conversation, a single LLM call generates a user message that attempts to get the target to fulfill the objective in their next response. The prompt template receives `objective` and `conversation_so_far` parameters. Defaults to `None`.
`attack_converter_config`	`Optional[AttackConverterConfig]`	Converter configuration for the attack. Defaults to None. Defaults to `None`.
`memory_labels`	`Optional[dict[str, str]]`	Labels to associate with the conversation in memory. Defaults to None. Defaults to `None`.

Returns:

list[SeedPrompt] — List of SeedPrompts representing the generated conversation, with sequence numbers
list[SeedPrompt] — starting from starting_sequence and incrementing by 1 for each message.
list[SeedPrompt] — User messages have role=“user”, assistant messages have role=“assistant”.
list[SeedPrompt] — If next_message_system_prompt_path is provided, the last message will be a user message
list[SeedPrompt] — generated to elicit the objective fulfillment.

Raises:

ValueError — If num_turns is not a positive integer.

`AttackAdversarialConfig`¶

Adversarial configuration for attacks that involve adversarial chat targets.

This class defines the configuration for attacks that utilize an adversarial chat target, including the target chat model, system prompt, and seed prompt for the attack.

`AttackContext`¶

Bases: StrategyContext, ABC, Generic[AttackParamsT]

Base class for all attack contexts.

This class holds both the immutable attack parameters and the mutable execution state. The params field contains caller-provided inputs, while other fields track execution progress.

Attacks that generate certain values internally (e.g., RolePlayAttack generates next_message and prepended_conversation) can set the mutable override fields (_next_message_override, _prepended_conversation_override) during _setup_async.

`AttackConverterConfig`¶

Bases: StrategyConverterConfig

Configuration for prompt converters used in attacks.

This class defines the converter configurations that transform prompts during the attack process, both for requests and responses.

`AttackExecutor`¶

Manages the execution of attack strategies with support for parallel execution.

The AttackExecutor provides controlled execution of attack strategies with concurrency limiting. It uses the attack’s params_type to create parameters from seed groups.

Constructor Parameters:

Parameter	Type	Description
`max_concurrency`	`int`	Maximum number of concurrent attack executions (default: 1). Defaults to `1`.

Methods:

`execute_attack_async`¶

execute_attack_async(attack: AttackStrategy[AttackStrategyContextT, AttackStrategyResultT], objectives: Sequence[str], field_overrides: Optional[Sequence[dict[str, Any]]] = None, return_partial_on_failure: bool = False, broadcast_fields: Any = {}) → AttackExecutorResult[AttackStrategyResultT]

Execute attacks in parallel for each objective.

Creates AttackParameters directly from objectives and field values.

Parameter	Type	Description
`attack`	`AttackStrategy[AttackStrategyContextT, AttackStrategyResultT]`	The attack strategy to execute.
`objectives`	`Sequence[str]`	List of attack objectives.
`field_overrides`	`Optional[Sequence[dict[str, Any]]]`	Optional per-objective field overrides. If provided, must match the length of objectives. Defaults to `None`.
`return_partial_on_failure`	`bool`	If True, returns partial results when some objectives fail. If False (default), raises the first exception. Defaults to `False`.
`**broadcast_fields`	`Any`	Fields applied to all objectives (e.g., memory_labels). Per-objective field_overrides take precedence. Defaults to `{}`.

Returns:

AttackExecutorResult[AttackStrategyResultT] — AttackExecutorResult with completed results and any incomplete objectives.

Raises:

ValueError — If objectives is empty or field_overrides length doesn’t match.
BaseException — If return_partial_on_failure=False and any objective fails.

`execute_attack_from_seed_groups_async`¶

execute_attack_from_seed_groups_async(attack: AttackStrategy[AttackStrategyContextT, AttackStrategyResultT], seed_groups: Sequence[SeedAttackGroup], adversarial_chat: Optional[PromptChatTarget] = None, objective_scorer: Optional[TrueFalseScorer] = None, field_overrides: Optional[Sequence[dict[str, Any]]] = None, return_partial_on_failure: bool = False, broadcast_fields: Any = {}) → AttackExecutorResult[AttackStrategyResultT]

Execute attacks in parallel, extracting parameters from SeedAttackGroups.

Uses the attack’s params_type.from_seed_group() to extract parameters, automatically handling which fields the attack accepts.

Parameter	Type	Description
`attack`	`AttackStrategy[AttackStrategyContextT, AttackStrategyResultT]`	The attack strategy to execute.
`seed_groups`	`Sequence[SeedAttackGroup]`	SeedAttackGroups containing objectives and optional prompts.
`adversarial_chat`	`Optional[PromptChatTarget]`	Optional chat target for generating adversarial prompts or simulated conversations. Required when seed groups contain SeedSimulatedConversation configurations. Defaults to `None`.
`objective_scorer`	`Optional[TrueFalseScorer]`	Optional scorer for evaluating simulated conversations. Required when seed groups contain SeedSimulatedConversation configurations. Defaults to `None`.
`field_overrides`	`Optional[Sequence[dict[str, Any]]]`	Optional per-seed-group field overrides. If provided, must match the length of seed_groups. Each dict is passed to from_seed_group() as overrides. Defaults to `None`.
`return_partial_on_failure`	`bool`	If True, returns partial results when some objectives fail. If False (default), raises the first exception. Defaults to `False`.
`**broadcast_fields`	`Any`	Fields applied to all seed groups (e.g., memory_labels). Per-seed-group field_overrides take precedence. Defaults to `{}`.

Returns:

AttackExecutorResult[AttackStrategyResultT] — AttackExecutorResult with completed results and any incomplete objectives.

Raises:

ValueError — If seed_groups is empty or field_overrides length doesn’t match.
BaseException — If return_partial_on_failure=False and any objective fails.

`execute_multi_objective_attack_async`¶

execute_multi_objective_attack_async(attack: AttackStrategy[AttackStrategyContextT, AttackStrategyResultT], objectives: list[str], prepended_conversation: Optional[list[Message]] = None, memory_labels: Optional[dict[str, str]] = None, return_partial_on_failure: bool = False, attack_params: Any = {}) → AttackExecutorResult[AttackStrategyResultT]

Execute the same attack strategy with multiple objectives against the same target in parallel.

.. deprecated:: Use :meth:execute_attack_async instead. This method will be removed in a future version.

Parameter	Type	Description
`attack`	`AttackStrategy[AttackStrategyContextT, AttackStrategyResultT]`	The attack strategy to use for all objectives.
`objectives`	`list[str]`	List of attack objectives to test.
`prepended_conversation`	`Optional[list[Message]]`	Conversation to prepend to the target model. Defaults to `None`.
`memory_labels`	`Optional[dict[str, str]]`	Additional labels that can be applied to the prompts. Defaults to `None`.
`return_partial_on_failure`	`bool`	If True, returns partial results on failure. Defaults to `False`.
`**attack_params`	`Any`	Additional parameters specific to the attack strategy. Defaults to `{}`.

Returns:

AttackExecutorResult[AttackStrategyResultT] — AttackExecutorResult with completed results and any incomplete objectives.

`execute_multi_turn_attacks_async`¶

execute_multi_turn_attacks_async(attack: AttackStrategy[_MultiTurnContextT, AttackStrategyResultT], objectives: list[str], messages: Optional[list[Message]] = None, prepended_conversations: Optional[list[list[Message]]] = None, memory_labels: Optional[dict[str, str]] = None, return_partial_on_failure: bool = False, attack_params: Any = {}) → AttackExecutorResult[AttackStrategyResultT]

Execute a batch of multi-turn attacks with multiple objectives.

.. deprecated:: Use :meth:execute_attack_async instead. This method will be removed in a future version.

Parameter	Type	Description
`attack`	`AttackStrategy[_MultiTurnContextT, AttackStrategyResultT]`	The multi-turn attack strategy to use.
`objectives`	`list[str]`	List of attack objectives to test.
`messages`	`Optional[list[Message]]`	List of messages to use for this execution (per-objective). Defaults to `None`.
`prepended_conversations`	`Optional[list[list[Message]]]`	Conversations to prepend to each objective (per-objective). Defaults to `None`.
`memory_labels`	`Optional[dict[str, str]]`	Additional labels that can be applied to the prompts. Defaults to `None`.
`return_partial_on_failure`	`bool`	If True, returns partial results on failure. Defaults to `False`.
`**attack_params`	`Any`	Additional parameters specific to the attack strategy. Defaults to `{}`.

Returns:

AttackExecutorResult[AttackStrategyResultT] — AttackExecutorResult with completed results and any incomplete objectives.

Raises:

TypeError — If the attack does not use MultiTurnAttackContext.

`execute_single_turn_attacks_async`¶

execute_single_turn_attacks_async(attack: AttackStrategy[_SingleTurnContextT, AttackStrategyResultT], objectives: list[str], messages: Optional[list[Message]] = None, prepended_conversations: Optional[list[list[Message]]] = None, memory_labels: Optional[dict[str, str]] = None, return_partial_on_failure: bool = False, attack_params: Any = {}) → AttackExecutorResult[AttackStrategyResultT]

Execute a batch of single-turn attacks with multiple objectives.

.. deprecated:: Use :meth:execute_attack_async instead. This method will be removed in a future version.

Parameter	Type	Description
`attack`	`AttackStrategy[_SingleTurnContextT, AttackStrategyResultT]`	The single-turn attack strategy to use.
`objectives`	`list[str]`	List of attack objectives to test.
`messages`	`Optional[list[Message]]`	List of messages to use for this execution (per-objective). Defaults to `None`.
`prepended_conversations`	`Optional[list[list[Message]]]`	Conversations to prepend to each objective (per-objective). Defaults to `None`.
`memory_labels`	`Optional[dict[str, str]]`	Additional labels that can be applied to the prompts. Defaults to `None`.
`return_partial_on_failure`	`bool`	If True, returns partial results on failure. Defaults to `False`.
`**attack_params`	`Any`	Additional parameters specific to the attack strategy. Defaults to `{}`.

Returns:

AttackExecutorResult[AttackStrategyResultT] — AttackExecutorResult with completed results and any incomplete objectives.

Raises:

TypeError — If the attack does not use SingleTurnAttackContext.

`AttackExecutorResult`¶

Bases: Generic[AttackResultT]

Result container for attack execution, supporting both full and partial completion.

This class holds results from parallel attack execution. It is iterable and behaves like a list in the common case where all objectives complete successfully.

When some objectives don’t complete (throw exceptions), access incomplete_objectives to retrieve the failures, or use raise_if_incomplete() to raise the first exception.

Note: “completed” means the execution finished, not that the attack objective was achieved.

Methods:

`get_results`¶

get_results() → list[AttackResultT]

Get completed results, raising if any incomplete.

Returns:

list[AttackResultT] — List of completed attack results.

`raise_if_incomplete`¶

raise_if_incomplete() → None

Raise the first exception if any objectives are incomplete.

`AttackParameters`¶

Immutable parameters for attack execution.

This class defines the standard contract for attack parameters. All attacks at a given level of the hierarchy share the same parameter signature.

Attacks that don’t accept certain parameters should use the excluding() factory to create a derived params type without those fields. Attacks that need additional parameters should extend this class with new fields.

Methods:

`excluding`¶

excluding(field_names: str = ()) → type[AttackParameters]

Create a new AttackParameters subclass that excludes the specified fields.

This factory method creates a frozen dataclass without the specified fields. The resulting class inherits the from_seed_group() behavior and will raise if excluded fields are passed as overrides.

Parameter	Type	Description
`*field_names`	`str`	Names of fields to exclude from the new params type. Defaults to `()`.

Returns:

type[AttackParameters] — A new AttackParameters subclass without the specified fields.

Raises:

ValueError — If any field_name is not a valid field of this class.

`from_seed_group_async`¶

from_seed_group_async(seed_group: SeedAttackGroup, adversarial_chat: Optional[PromptChatTarget] = None, objective_scorer: Optional[TrueFalseScorer] = None, overrides: Any = {}) → AttackParamsT

Create an AttackParameters instance from a SeedAttackGroup.

Extracts standard fields from the seed group and applies any overrides. If the seed_group has a simulated conversation config, generates the simulated conversation using the provided adversarial_chat and scorer.

Parameter	Type	Description
`seed_group`	`SeedAttackGroup`	The seed attack group to extract parameters from.
`adversarial_chat`	`Optional[PromptChatTarget]`	The adversarial chat target for generating simulated conversations. Required if seed_group has a simulated conversation config. Defaults to `None`.
`objective_scorer`	`Optional[TrueFalseScorer]`	The scorer for evaluating simulated conversations. Required if seed_group has a simulated conversation config. Defaults to `None`.
`**overrides`	`Any`	Field overrides to apply. Must be valid fields for this params type. Defaults to `{}`.

Returns:

AttackParamsT — An instance of this AttackParameters type.

Raises:

ValueError — If seed_group has no objective or if overrides contain invalid fields.
ValueError — If seed_group has simulated conversation but adversarial_chat/scorer not provided.

`AttackResultPrinter`¶

Bases: ABC

Abstract base class for printing attack results.

This interface defines the contract for printing attack results in various formats. Implementations can render results to console, logs, files, or other outputs.

Methods:

`print_conversation_async`¶

print_conversation_async(result: AttackResult, include_scores: bool = False) → None

Print only the conversation history.

Parameter	Type	Description
`result`	`AttackResult`	The attack result containing the conversation to print
`include_scores`	`bool`	Whether to include scores in the output. Defaults to False. Defaults to `False`.

`print_result_async`¶

print_result_async(result: AttackResult, include_auxiliary_scores: bool = False, include_pruned_conversations: bool = False, include_adversarial_conversation: bool = False) → None

Print the complete attack result.

Parameter	Type	Description
`result`	`AttackResult`	The attack result to print
`include_auxiliary_scores`	`bool`	Whether to include auxiliary scores in the output. Defaults to False. Defaults to `False`.
`include_pruned_conversations`	`bool`	Whether to include pruned conversations. For each pruned conversation, only the last message and its score are shown. Defaults to False. Defaults to `False`.
`include_adversarial_conversation`	`bool`	Whether to include the adversarial conversation (the red teaming LLM’s reasoning). Only shown for successful attacks to avoid overwhelming output. Defaults to False. Defaults to `False`.

`print_summary_async`¶

print_summary_async(result: AttackResult) → None

Print a summary of the attack result without the full conversation.

Parameter	Type	Description
`result`	`AttackResult`	The attack result to summarize

`AttackScoringConfig`¶

Scoring configuration for evaluating attack effectiveness.

This class defines the scoring components used to evaluate attack effectiveness, detect refusals, and perform auxiliary scoring operations.

`AttackStrategy`¶

Bases: Strategy[AttackStrategyContextT, AttackStrategyResultT], Identifiable, ABC

Abstract base class for attack strategies. Defines the interface for executing attacks and handling results.

Constructor Parameters:

Parameter	Type	Description
`objective_target`	`PromptTarget`	The target system to attack.
`context_type`	`type[AttackStrategyContextT]`	The type of context this strategy operates on.
`params_type`	`Type[AttackParamsT]`	The type of parameters this strategy accepts. Defaults to AttackParameters. Use AttackParameters.excluding() to create a params type that rejects certain fields. Defaults to `AttackParameters`.
`logger`	`logging.Logger`	Logger instance for logging events. Defaults to `logger`.

Methods:

`execute_async`¶

execute_async(kwargs: Any = {}) → AttackStrategyResultT

Execute the attack strategy asynchronously with the provided parameters.

This method provides a stable contract for all attacks. The signature includes all standard parameters (objective, next_message, prepended_conversation, memory_labels). Attacks that don’t accept certain parameters will raise ValueError if those parameters are provided.

Parameter	Type	Description
`objective`	`str`	The objective of the attack.
`next_message`	`Optional[Message]`	Message to send to the target.
`prepended_conversation`	`Optional[List[Message]]`	Conversation to prepend.
`memory_labels`	`Optional[Dict[str, str]]`	Memory labels for the attack context.
`**kwargs`	`Any`	Additional context-specific parameters (conversation_id, system_prompt, etc.). Defaults to `{}`.

Returns:

AttackStrategyResultT — The result of the attack execution.

Raises:

ValueError — If required parameters are missing or if unsupported parameters are provided.

`get_attack_scoring_config`¶

get_attack_scoring_config() → Optional[AttackScoringConfig]

Get the attack scoring configuration used by this strategy.

Returns:

Optional[AttackScoringConfig] — Optional[AttackScoringConfig]: The scoring configuration, or None if not applicable.

`get_objective_target`¶

get_objective_target() → PromptTarget

Get the objective target for this attack strategy.

Returns:

PromptTarget — The target system being attacked.

`get_request_converters`¶

get_request_converters() → list[Any]

Get request converter configurations used by this strategy.

Returns:

list[Any] — list[Any]: The list of request PromptConverterConfiguration objects.

`ChunkedRequestAttack`¶

Bases: MultiTurnAttackStrategy[ChunkedRequestAttackContext, AttackResult]

Implementation of chunked extraction attack strategy.

This class orchestrates a multi-turn attack where protected information is requested in chunks/segments across multiple turns. This technique bypasses length-based filters or output truncation by asking for specific character ranges of a secret value.

This technique was discovered during CTF red teaming exercises where targets would reveal portions of secrets when asked for specific segments, even when they refused to reveal the complete value.

The attack flow consists of:

Generating chunk request prompts based on the configured strategy.
Sending each chunk request to the target system sequentially.
Collecting responses from each chunk request.
Combining all chunk responses to reconstruct the full value.
Evaluating the combined result with scorers if configured.
Returning the attack result with achievement status.

The strategy supports customization through converters and scorers for comprehensive evaluation.

Constructor Parameters:

Parameter	Type	Description
`objective_target`	`PromptTarget`	The target system to attack. Defaults to `REQUIRED_VALUE`.
`chunk_size`	`int`	Size of each chunk to request (default: 50). Defaults to `50`.
`total_length`	`int`	Estimated total length of the target value (default: 200). Defaults to `200`.
`chunk_type`	`str`	Type of chunk to request (e.g., “characters”, “bytes”, “words”). Defaults to `'characters'`.
`request_template`	`str`	Template for generating chunk requests (default: “Give me {chunk_type} {start}-{end} of ‘{objective}’”). Defaults to `DEFAULT_TEMPLATE`.
`attack_converter_config`	`Optional[AttackConverterConfig]`	Configuration for prompt converters. Defaults to `None`.
`attack_scoring_config`	`Optional[AttackScoringConfig]`	Configuration for scoring components. Defaults to `None`.
`prompt_normalizer`	`Optional[PromptNormalizer]`	Normalizer for handling prompts. Defaults to `None`.

Methods:

`get_attack_scoring_config`¶

get_attack_scoring_config() → Optional[AttackScoringConfig]

Get the attack scoring configuration used by this strategy.

Returns:

Optional[AttackScoringConfig] — Optional[AttackScoringConfig]: The scoring configuration with objective and auxiliary scorers.

`ChunkedRequestAttackContext`¶

Bases: MultiTurnAttackContext[Any]

Context for the ChunkedRequest attack strategy.

`ConsoleAttackResultPrinter`¶

Bases: AttackResultPrinter

Console printer for attack results with enhanced formatting.

This printer formats attack results for console display with optional color coding, proper indentation, text wrapping, and visual separators. Colors can be disabled for consoles that don’t support ANSI characters.

Constructor Parameters:

Parameter	Type	Description
`width`	`int`	Maximum width for text wrapping. Must be positive. Defaults to 100. Defaults to `100`.
`indent_size`	`int`	Number of spaces for indentation. Must be non-negative. Defaults to 2. Defaults to `2`.
`enable_colors`	`bool`	Whether to enable ANSI color output. When False, all output will be plain text without colors. Defaults to True. Defaults to `True`.

Methods:

`print_conversation_async`¶

print_conversation_async(result: AttackResult, include_scores: bool = False, include_reasoning_trace: bool = False) → None

Print the conversation history to console with enhanced formatting.

Displays the full conversation between user and assistant, including:

Turn numbers
Role indicators (USER/ASSISTANT)
Original and converted values when different
Images if present
Scores for each response

Parameter	Type	Description
`result`	`AttackResult`	The attack result containing the conversation_id. Must have a valid conversation_id attribute.
`include_scores`	`bool`	Whether to include scores in the output. Defaults to False. Defaults to `False`.
`include_reasoning_trace`	`bool`	Whether to include model reasoning trace in the output for applicable models. Defaults to False. Defaults to `False`.

`print_messages_async`¶

print_messages_async(messages: list[Any], include_scores: bool = False, include_reasoning_trace: bool = False) → None

Print a list of messages to console with enhanced formatting.

This method can be called directly with a list of Message objects, without needing an AttackResult. Useful for printing prepended_conversation or any other list of messages.

Displays:

Turn numbers
Role indicators (USER/ASSISTANT/SYSTEM)
Original and converted values when different
Images if present
Scores for each response (if include_scores=True)

Parameter	Type	Description
`messages`	`list`	List of Message objects to print.
`include_scores`	`bool`	Whether to include scores in the output. Defaults to False. Defaults to `False`.
`include_reasoning_trace`	`bool`	Whether to include model reasoning trace in the output for applicable models. Defaults to False. Defaults to `False`.

`print_result_async`¶

print_result_async(result: AttackResult, include_auxiliary_scores: bool = False, include_pruned_conversations: bool = False, include_adversarial_conversation: bool = False) → None

Print the complete attack result to console.

This method orchestrates the printing of all components of an attack result, including header, summary, conversation history, metadata, and footer.

Parameter	Type	Description
`result`	`AttackResult`	The attack result to print. Must not be None.
`include_auxiliary_scores`	`bool`	Whether to include auxiliary scores in the output. Defaults to False. Defaults to `False`.
`include_pruned_conversations`	`bool`	Whether to include pruned conversations. For each pruned conversation, only the last message and its score are shown. Defaults to False. Defaults to `False`.
`include_adversarial_conversation`	`bool`	Whether to include the adversarial conversation (the red teaming LLM’s reasoning). Only shown for successful attacks to avoid overwhelming output. Defaults to False. Defaults to `False`.

`print_summary_async`¶

print_summary_async(result: AttackResult) → None

Print a summary of the attack result with enhanced formatting.

Displays:

Basic information (objective, attack type, conversation ID)
Execution metrics (turns executed, execution time)
Outcome information (status, reason)
Final score if available

Parameter	Type	Description
`result`	`AttackResult`	The attack result to summarize. Must contain objective, attack_identifier, conversation_id, executed_turns, execution_time_ms, outcome, and optionally outcome_reason and last_score attributes.

`ContextComplianceAttack`¶

Bases: PromptSendingAttack

Implementation of the context compliance attack strategy.

This attack attempts to bypass safety measures by rephrasing the objective into a more benign context. It uses an adversarial chat target to:

Rephrase the objective as a more benign question
Generate a response to the benign question
Rephrase the original objective as a follow-up question

This creates a context that makes it harder for the target to detect the true intent.

Constructor Parameters:

Parameter	Type	Description
`objective_target`	`PromptChatTarget`	The target system to attack. Must be a PromptChatTarget. Defaults to `REQUIRED_VALUE`.
`attack_adversarial_config`	`AttackAdversarialConfig`	Configuration for the adversarial component, including the adversarial chat target used for rephrasing.
`attack_converter_config`	`Optional[AttackConverterConfig]`	Configuration for attack converters, including request and response converters. Defaults to `None`.
`attack_scoring_config`	`Optional[AttackScoringConfig]`	Configuration for attack scoring. Defaults to `None`.
`prompt_normalizer`	`Optional[PromptNormalizer]`	The prompt normalizer to use for sending prompts. Defaults to `None`.
`max_attempts_on_failure`	`int`	Maximum number of attempts to retry on failure. Defaults to `0`.
`context_description_instructions_path`	`Optional[Path]`	Path to the context description instructions YAML file. If not provided, uses the default path. Defaults to `None`.
`affirmative_response`	`Optional[str]`	The affirmative response to be used in the conversation history. If not provided, uses the default “yes.”. Defaults to `None`.

`ConversationManager`¶

Manages conversations for attacks, handling message history, system prompts, and conversation state.

This class provides methods to:

Initialize attack context with prepended conversations
Retrieve conversation history
Set system prompts for chat targets

Constructor Parameters:

Parameter	Type	Description
`attack_identifier`	`ComponentIdentifier`	The identifier of the attack this manager belongs to.
`prompt_normalizer`	`Optional[PromptNormalizer]`	Optional prompt normalizer for converting prompts. If not provided, a default PromptNormalizer instance will be created. Defaults to `None`.

Methods:

`add_prepended_conversation_to_memory_async`¶

add_prepended_conversation_to_memory_async(prepended_conversation: list[Message], conversation_id: str, request_converters: Optional[list[PromptConverterConfiguration]] = None, prepended_conversation_config: Optional[PrependedConversationConfig] = None, max_turns: Optional[int] = None) → int

Add prepended conversation messages to memory for a chat target.

This is a lower-level method that handles adding messages to memory without modifying any attack context state. It can be called directly by attacks that manage their own state (like TAP nodes) or internally by initialize_context_async for standard attacks.

Messages are added with:

Duplicated message objects (preserves originals)
simulated_assistant role for assistant messages (for traceability)
Converters applied based on config

Parameter	Type	Description
`prepended_conversation`	`list[Message]`	Messages to add to memory.
`conversation_id`	`str`	Conversation ID to assign to all messages.
`request_converters`	`Optional[list[PromptConverterConfiguration]]`	Optional converters to apply to messages. Defaults to `None`.
`prepended_conversation_config`	`Optional[PrependedConversationConfig]`	Optional configuration for converter roles. Defaults to `None`.
`max_turns`	`Optional[int]`	If provided, validates that turn count doesn’t exceed this limit. Defaults to `None`.

Returns:

int — The number of turns (assistant messages) added.

Raises:

ValueError — If max_turns is exceeded by the prepended conversation.

`get_conversation`¶

get_conversation(conversation_id: str) → list[Message]

Retrieve a conversation by its ID.

Parameter	Type	Description
`conversation_id`	`str`	The ID of the conversation to retrieve.

Returns:

list[Message] — A list of messages in the conversation, ordered by creation time.
list[Message] — Returns empty list if no messages exist.

`get_last_message`¶

get_last_message(conversation_id: str, role: Optional[ChatMessageRole] = None) → Optional[MessagePiece]

Retrieve the most recent message from a conversation.

Parameter	Type	Description
`conversation_id`	`str`	The ID of the conversation to retrieve from.
`role`	`Optional[ChatMessageRole]`	If provided, return only the last message matching this role. Defaults to `None`.

Returns:

Optional[MessagePiece] — The last message piece, or None if no messages exist.

`initialize_context_async`¶

initialize_context_async(context: AttackContext[Any], target: PromptTarget, conversation_id: str, request_converters: Optional[list[PromptConverterConfiguration]] = None, prepended_conversation_config: Optional[PrependedConversationConfig] = None, max_turns: Optional[int] = None, memory_labels: Optional[dict[str, str]] = None) → ConversationState

Initialize attack context with prepended conversation and merged labels.

This is the primary method for setting up an attack context. It:

Merges memory_labels from attack strategy with context labels
Processes prepended_conversation based on target type and config
Updates context.executed_turns for multi-turn attacks
Sets context.next_message if there’s an unanswered user message

Parameter	Type	Description
`context`	`AttackContext[Any]`	The attack context to initialize.
`target`	`PromptTarget`	The objective target for the conversation.
`conversation_id`	`str`	Unique identifier for the conversation.
`request_converters`	`Optional[list[PromptConverterConfiguration]]`	Converters to apply to messages. Defaults to `None`.
`prepended_conversation_config`	`Optional[PrependedConversationConfig]`	Configuration for handling prepended conversation. Defaults to `None`.
`max_turns`	`Optional[int]`	Maximum turns allowed (for validation and state tracking). Defaults to `None`.
`memory_labels`	`Optional[dict[str, str]]`	Labels from the attack strategy to merge with context labels. Defaults to `None`.

Returns:

ConversationState — ConversationState with turn_count and last_assistant_message_scores.

Raises:

ValueError — If conversation_id is empty, or if prepended_conversation requires a PromptChatTarget but target is not one.

`set_system_prompt`¶

set_system_prompt(target: PromptChatTarget, conversation_id: str, system_prompt: str, labels: Optional[dict[str, str]] = None) → None

Set or update the system prompt for a conversation.

Parameter	Type	Description
`target`	`PromptChatTarget`	The chat target to set the system prompt on.
`conversation_id`	`str`	Unique identifier for the conversation.
`system_prompt`	`str`	The system prompt text.
`labels`	`Optional[dict[str, str]]`	Optional labels to associate with the system prompt. Defaults to `None`.

`ConversationSession`¶

Session for conversations.

`ConversationState`¶

Container for conversation state data returned from context initialization.

`CrescendoAttack`¶

Bases: MultiTurnAttackStrategy[CrescendoAttackContext, CrescendoAttackResult]

Implementation of the Crescendo attack strategy.

The Crescendo Attack is a multi-turn strategy that progressively guides the model to generate harmful content through small, benign steps. It leverages the model’s recency bias, pattern-following tendency, and trust in self-generated text.

The attack flow consists of:

Generating progressively harmful prompts using an adversarial chat model.
Sending prompts to the target and evaluating responses for refusal.
Backtracking when the target refuses to respond.
Scoring responses to determine if the objective has been achieved.
Continuing until the objective is met or maximum turns/backtracks are reached.

You can learn more about the Crescendo attack Russinovich et al., 2024.

Constructor Parameters:

Parameter	Type	Description
`objective_target`	`PromptChatTarget`	The target system to attack. Must be a PromptChatTarget. Defaults to `REQUIRED_VALUE`.
`attack_adversarial_config`	`AttackAdversarialConfig`	Configuration for the adversarial component, including the adversarial chat target and optional system prompt path.
`attack_converter_config`	`Optional[AttackConverterConfig]`	Configuration for attack converters, including request and response converters. Defaults to `None`.
`attack_scoring_config`	`Optional[AttackScoringConfig]`	Configuration for scoring responses. Defaults to `None`.
`prompt_normalizer`	`Optional[PromptNormalizer]`	Normalizer for prompts. Defaults to `None`.
`max_backtracks`	`int`	Maximum number of backtracks allowed. Defaults to `10`.
`max_turns`	`int`	Maximum number of turns allowed. Defaults to `10`.
`prepended_conversation_config`	`Optional[PrependedConversationConfiguration]`	Configuration for how to process prepended conversations. Controls converter application by role, message normalization, and non-chat target behavior. Defaults to `None`.

Methods:

`get_attack_scoring_config`¶

get_attack_scoring_config() → Optional[AttackScoringConfig]

Get the attack scoring configuration used by this strategy.

Returns:

Optional[AttackScoringConfig] — Optional[AttackScoringConfig]: The scoring configuration with objective scorer, auxiliary scorers, and refusal scorer.

`CrescendoAttackContext`¶

Bases: MultiTurnAttackContext[Any]

Context for the Crescendo attack strategy.

`CrescendoAttackResult`¶

Bases: AttackResult

Result of the Crescendo attack strategy execution.

`FlipAttack`¶

Bases: PromptSendingAttack

Implement the FlipAttack method Li et al., 2024.

Essentially, it adds a system prompt to the beginning of the conversation to flip each word in the prompt.

Constructor Parameters:

Parameter	Type	Description
`objective_target`	`PromptChatTarget`	The target system to attack. Defaults to `REQUIRED_VALUE`.
`attack_converter_config`	`(AttackConverterConfig, Optional)`	Configuration for the prompt converters. Defaults to `None`.
`attack_scoring_config`	`(AttackScoringConfig, Optional)`	Configuration for scoring components. Defaults to `None`.
`prompt_normalizer`	`(PromptNormalizer, Optional)`	Normalizer for handling prompts. Defaults to `None`.
`max_attempts_on_failure`	`(int, Optional)`	Maximum number of attempts to retry on failure. Defaults to `0`.

`ManyShotJailbreakAttack`¶

Bases: PromptSendingAttack

Implement the Many Shot Jailbreak method Anthropic, 2024.

Prepends the seed prompt with a faux dialogue between a human and an AI, using examples from a dataset to demonstrate successful jailbreaking attempts. This method leverages the model’s ability to learn from examples to bypass safety measures.

Constructor Parameters:

Parameter	Type	Description
`objective_target`	`PromptTarget`	The target system to attack. Defaults to `REQUIRED_VALUE`.
`attack_converter_config`	`(AttackConverterConfig, Optional)`	Configuration for the prompt converters. Defaults to `None`.
`attack_scoring_config`	`(AttackScoringConfig, Optional)`	Configuration for scoring components. Defaults to `None`.
`prompt_normalizer`	`(PromptNormalizer, Optional)`	Normalizer for handling prompts. Defaults to `None`.
`max_attempts_on_failure`	`(int, Optional)`	Maximum number of attempts to retry on failure. Defaults to 0. Defaults to `0`.
`example_count`	`int`	The number of examples to include from many_shot_examples or the Many Shot Jailbreaking dataset. Defaults to the first 100. Defaults to `100`.
`many_shot_examples`	`(list[dict[str, str]], Optional)`	The many shot jailbreaking examples to use. If not provided, takes the first `example_count` examples from Many Shot Jailbreaking dataset. Defaults to `None`.

`MarkdownAttackResultPrinter`¶

Bases: AttackResultPrinter

Markdown printer for attack results optimized for Jupyter notebooks.

This printer formats attack results as markdown, making them ideal for display in Jupyter notebooks where LLM responses often contain code blocks and other markdown formatting that should be properly rendered.

Constructor Parameters:

Parameter	Type	Description
`display_inline`	`bool`	If True, uses IPython.display to render markdown inline in Jupyter notebooks. If False, prints markdown strings. Defaults to True. Defaults to `True`.

Methods:

`print_conversation_async`¶

print_conversation_async(result: AttackResult, include_scores: bool = False) → None

Print only the conversation history as formatted markdown.

Extracts and displays the conversation messages from the attack result without the summary or metadata sections. Useful for focusing on the actual interaction flow.

Parameter	Type	Description
`result`	`AttackResult`	The attack result containing the conversation to display.
`include_scores`	`bool`	Whether to include scores for each message. Defaults to False. Defaults to `False`.

`print_result_async`¶

print_result_async(result: AttackResult, include_auxiliary_scores: bool = False, include_pruned_conversations: bool = False, include_adversarial_conversation: bool = False) → None

Print the complete attack result as formatted markdown.

Generates a comprehensive markdown report including attack summary, conversation history, scores, and metadata. The output is optimized for display in Jupyter notebooks.

Parameter	Type	Description
`result`	`AttackResult`	The attack result to print.
`include_auxiliary_scores`	`bool`	Whether to include auxiliary scores in the conversation display. Defaults to False. Defaults to `False`.
`include_pruned_conversations`	`bool`	Whether to include pruned conversations. For each pruned conversation, only the last message and its score are shown. Defaults to False. Defaults to `False`.
`include_adversarial_conversation`	`bool`	Whether to include the adversarial conversation (the red teaming LLM’s reasoning). Only shown for successful attacks to avoid overwhelming output. Defaults to False. Defaults to `False`.

`print_summary_async`¶

print_summary_async(result: AttackResult) → None

Print a summary of the attack result as formatted markdown.

Displays key information about the attack including objective, outcome, execution metrics, and final score without the full conversation history. Useful for getting a quick overview of the attack results.

Parameter	Type	Description
`result`	`AttackResult`	The attack result to summarize.

`MultiPromptSendingAttack`¶

Bases: MultiTurnAttackStrategy[MultiTurnAttackContext[Any], AttackResult]

Implementation of multi-prompt sending attack strategy.

This class orchestrates a multi-turn attack where a series of predefined malicious prompts are sent sequentially to try to achieve a specific objective against a target system. The strategy evaluates the final target response using optional scorers to determine if the objective has been met.

The attack flow consists of:

Sending each predefined prompt to the target system in sequence.
Continuing until all predefined prompts are sent.
Evaluating the final response with scorers if configured.
Returning the attack result with achievement status.

Note: This attack always runs all predefined prompts regardless of whether the objective is achieved early in the sequence.

The strategy supports customization through prepended conversations, converters, and multiple scorer types for comprehensive evaluation.

Constructor Parameters:

Parameter	Type	Description
`objective_target`	`PromptTarget`	The target system to attack. Defaults to `REQUIRED_VALUE`.
`attack_converter_config`	`Optional[AttackConverterConfig]`	Configuration for prompt converters. Defaults to `None`.
`attack_scoring_config`	`Optional[AttackScoringConfig]`	Configuration for scoring components. Defaults to `None`.
`prompt_normalizer`	`Optional[PromptNormalizer]`	Normalizer for handling prompts. Defaults to `None`.

Methods:

`execute_async`¶

execute_async(kwargs: Any = {}) → AttackResult

Execute the attack strategy asynchronously with the provided parameters.

Returns:

AttackResult — The result of the attack execution.

`get_attack_scoring_config`¶

get_attack_scoring_config() → Optional[AttackScoringConfig]

Get the attack scoring configuration used by this strategy.

Returns:

Optional[AttackScoringConfig] — Optional[AttackScoringConfig]: The scoring configuration with objective and auxiliary scorers.

`MultiPromptSendingAttackParameters`¶

Bases: AttackParameters

Parameters for MultiPromptSendingAttack.

Extends AttackParameters to include user_messages field for multi-turn attacks. Only accepts objective and user_messages fields.

Methods:

`from_seed_group_async`¶

from_seed_group_async(seed_group: SeedAttackGroup, adversarial_chat: Optional[PromptChatTarget] = None, objective_scorer: Optional[TrueFalseScorer] = None, overrides: Any = {}) → MultiPromptSendingAttackParameters

Create parameters from a SeedGroup, extracting user messages.

Parameter	Type	Description
`seed_group`	`SeedAttackGroup`	The seed group to extract parameters from.
`adversarial_chat`	`Optional[PromptChatTarget]`	Not used by this attack type. Defaults to `None`.
`objective_scorer`	`Optional[TrueFalseScorer]`	Not used by this attack type. Defaults to `None`.
`**overrides`	`Any`	Field overrides to apply. Defaults to `{}`.

Returns:

MultiPromptSendingAttackParameters — MultiPromptSendingAttackParameters instance.

Raises:

ValueError — If seed_group has no objective, no user messages, or if overrides contain invalid fields.

`MultiTurnAttackContext`¶

Bases: AttackContext[AttackParamsT]

Context for multi-turn attacks.

Holds execution state for multi-turn attacks. The immutable attack parameters (objective, next_message, prepended_conversation, memory_labels) are stored in the params field inherited from AttackContext.

`MultiTurnAttackStrategy`¶

Bases: AttackStrategy[MultiTurnAttackStrategyContextT, AttackStrategyResultT], ABC

Strategy for executing multi-turn attacks. This strategy is designed to handle attacks that consist of multiple turns of interaction with the target model.

Constructor Parameters:

Parameter	Type	Description
`objective_target`	`PromptTarget`	The target system to attack.
`context_type`	`type[MultiTurnAttackContext]`	The type of context this strategy will use.
`params_type`	`Type[AttackParamsT]`	The type of parameters this strategy accepts. Defaults to `AttackParameters`.
`logger`	`logging.Logger`	Logger instance for logging events and messages. Defaults to `logger`.

`PrependedConversationConfig`¶

Configuration for controlling how prepended conversations are processed before being sent to the objective target.

This class provides control over:

Which message roles should have request converters applied
How to normalize conversation history for non-chat objective targets
What to do when the objective target is not a PromptChatTarget

Methods:

`default`¶

default() → PrependedConversationConfig

Create a default configuration with converters applied to all roles.

Returns:

PrependedConversationConfig — A configuration that applies converters to all prepended messages,
PrependedConversationConfig — raising an error for non-chat targets.

`for_non_chat_target`¶

for_non_chat_target(message_normalizer: Optional[MessageStringNormalizer] = None, apply_converters_to_roles: Optional[list[ChatMessageRole]] = None) → PrependedConversationConfig

Create a configuration for use with non-chat targets.

This configuration normalizes the prepended conversation into a text block that will be prepended to the first message sent to the target.

Parameter	Type	Description
`message_normalizer`	`Optional[MessageStringNormalizer]`	Normalizer for formatting the prepended conversation into a string. Defaults to ConversationContextNormalizer if not provided. Defaults to `None`.
`apply_converters_to_roles`	`Optional[list[ChatMessageRole]]`	Roles to apply converters to before normalization. Defaults to all roles. Defaults to `None`.

Returns:

PrependedConversationConfig — A configuration that normalizes the prepended conversation for non-chat targets.

`get_message_normalizer`¶

get_message_normalizer() → MessageStringNormalizer

Get the normalizer for objective target context, with a default fallback.

Returns:

MessageStringNormalizer — The configured objective_target_context_normalizer, or a default
MessageStringNormalizer — ConversationContextNormalizer if none was configured.

`PromptSendingAttack`¶

Bases: SingleTurnAttackStrategy

Implementation of single-turn prompt sending attack strategy.

This class orchestrates a single-turn attack where malicious prompts are injected to try to achieve a specific objective against a target system. The strategy evaluates the target response using optional scorers to determine if the objective has been met.

The attack flow consists of:

Preparing the prompt based on the objective.
Sending the prompt to the target system through optional converters.
Evaluating the response with scorers if configured.
Retrying on failure up to the configured number of retries.
Returning the attack result with achievement status.

The strategy supports customization through prepended conversations, converters, and multiple scorer types for comprehensive evaluation.

Constructor Parameters:

Parameter	Type	Description
`objective_target`	`PromptTarget`	The target system to attack. Defaults to `REQUIRED_VALUE`.
`attack_converter_config`	`Optional[AttackConverterConfig]`	Configuration for prompt converters. Defaults to `None`.
`attack_scoring_config`	`Optional[AttackScoringConfig]`	Configuration for scoring components. Defaults to `None`.
`prompt_normalizer`	`Optional[PromptNormalizer]`	Normalizer for handling prompts. Defaults to `None`.
`max_attempts_on_failure`	`int`	Maximum number of attempts to retry on failure. Defaults to `0`.
`params_type`	`Type[AttackParamsT]`	The type of parameters this strategy accepts. Defaults to AttackParameters. Use AttackParameters.excluding() to create a params type that rejects certain fields. Defaults to `AttackParameters`.
`prepended_conversation_config`	`Optional[PrependedConversationConfiguration]`	Configuration for how to process prepended conversations. Controls converter application by role, message normalization, and non-chat target behavior. Defaults to `None`.

Methods:

`get_attack_scoring_config`¶

get_attack_scoring_config() → Optional[AttackScoringConfig]

Get the attack scoring configuration used by this strategy.

Returns:

Optional[AttackScoringConfig] — Optional[AttackScoringConfig]: The scoring configuration with objective and auxiliary scorers.

`RTASystemPromptPaths`¶

Bases: enum.Enum

Enum for predefined red teaming attack system prompt paths.

`RedTeamingAttack`¶

Bases: MultiTurnAttackStrategy[MultiTurnAttackContext[Any], AttackResult]

Implementation of multi-turn red teaming attack strategy.

This class orchestrates an iterative attack process where an adversarial chat model generates prompts to send to a target system, attempting to achieve a specified objective. The strategy evaluates each target response using a scorer to determine if the objective has been met.

The attack flow consists of:

Generating adversarial prompts based on previous responses and scoring feedback.
Sending prompts to the target system through optional converters.
Scoring target responses to assess objective achievement.
Using scoring feedback to guide subsequent prompt generation.
Continuing until the objective is achieved or maximum turns are reached.

The strategy supports customization through system prompts, seed prompts, and prompt converters, allowing for various attack techniques and scenarios.

Constructor Parameters:

Parameter	Type	Description
`objective_target`	`PromptTarget`	The target system to attack. Defaults to `REQUIRED_VALUE`.
`attack_adversarial_config`	`AttackAdversarialConfig`	Configuration for the adversarial component.
`attack_converter_config`	`Optional[AttackConverterConfig]`	Configuration for attack converters. Defaults to None. Defaults to `None`.
`attack_scoring_config`	`Optional[AttackScoringConfig]`	Configuration for attack scoring. Defaults to None. Defaults to `None`.
`prompt_normalizer`	`Optional[PromptNormalizer]`	The prompt normalizer to use for sending prompts. Defaults to None. Defaults to `None`.
`max_turns`	`int`	Maximum number of turns for the attack. Defaults to 10. Defaults to `10`.
`score_last_turn_only`	`bool`	If True, only score the final turn instead of every turn. This reduces LLM calls when intermediate scores are not needed (e.g., for generating simulated conversations). The attack will run for exactly max_turns when this is enabled. Defaults to False. Defaults to `False`.

Methods:

`get_attack_scoring_config`¶

get_attack_scoring_config() → Optional[AttackScoringConfig]

Get the attack scoring configuration used by this strategy.

Returns:

Optional[AttackScoringConfig] — Optional[AttackScoringConfig]: The scoring configuration with objective scorer and use_score_as_feedback.

`RolePlayAttack`¶

Bases: PromptSendingAttack

Implementation of single-turn role-play attack strategy.

This class orchestrates a role-play attack where malicious objectives are rephrased into role-playing contexts to make them appear more benign and bypass content filters. The strategy uses an adversarial chat target to transform the objective into a role-play scenario before sending it to the target system.

The attack flow consists of:

Loading role-play scenarios from a YAML file.
Using an adversarial chat target to rephrase the objective into the role-play context.
Sending the rephrased objective to the target system.
Evaluating the response with scorers if configured.
Retrying on failure up to the configured number of retries.
Returning the attack result

The strategy supports customization through prepended conversations, converters, and multiple scorer types.

Constructor Parameters:

Parameter	Type	Description
`objective_target`	`PromptTarget`	The target system to attack. Defaults to `REQUIRED_VALUE`.
`adversarial_chat`	`PromptChatTarget`	The adversarial chat target used to rephrase objectives into role-play scenarios.
`role_play_definition_path`	`pathlib.Path`	Path to the YAML file containing role-play definitions (rephrase instructions, user start turn, assistant start turn).
`attack_converter_config`	`Optional[AttackConverterConfig]`	Configuration for prompt converters. Defaults to `None`.
`attack_scoring_config`	`Optional[AttackScoringConfig]`	Configuration for scoring components. Defaults to `None`.
`prompt_normalizer`	`Optional[PromptNormalizer]`	Normalizer for handling prompts. Defaults to `None`.
`max_attempts_on_failure`	`int`	Maximum number of attempts to retry the attack Defaults to `0`.

`RolePlayPaths`¶

Bases: enum.Enum

Enum for predefined role-play scenario paths.

`SingleTurnAttackContext`¶

Bases: AttackContext[AttackParamsT]

Context for single-turn attacks.

Holds execution state for single-turn attacks. The immutable attack parameters (objective, next_message, prepended_conversation, memory_labels) are stored in the params field inherited from AttackContext.

`SingleTurnAttackStrategy`¶

Bases: AttackStrategy[SingleTurnAttackContext[Any], AttackResult], ABC

Strategy for executing single-turn attacks. This strategy is designed to handle attacks that consist of a single turn of interaction with the target model.

Constructor Parameters:

Parameter	Type	Description
`objective_target`	`PromptTarget`	The target system to attack.
`context_type`	`type[SingleTurnAttackContext]`	The type of context this strategy will use. Defaults to `SingleTurnAttackContext`.
`params_type`	`Type[AttackParamsT]`	The type of parameters this strategy accepts. Defaults to `AttackParameters`.
`logger`	`logging.Logger`	Logger instance for logging events and messages. Defaults to `logger`.

`SkeletonKeyAttack`¶

Bases: PromptSendingAttack

Implementation of the skeleton key jailbreak attack strategy.

This attack sends an initial skeleton key prompt to the target, and then follows up with a separate attack prompt. If successful, the first prompt makes the target comply even with malicious follow-up prompts.

The attack flow consists of:

Sending a skeleton key prompt to bypass the target’s safety mechanisms.
Sending the actual objective prompt to the primed target.
Evaluating the response using configured scorers to determine success.

Learn more about the attack Microsoft Security Response Center, 2024.

Constructor Parameters:

Parameter	Type	Description
`objective_target`	`PromptTarget`	The target system to attack. Defaults to `REQUIRED_VALUE`.
`attack_converter_config`	`Optional[AttackConverterConfig]`	Configuration for prompt converters. Defaults to `None`.
`attack_scoring_config`	`Optional[AttackScoringConfig]`	Configuration for scoring components. Defaults to `None`.
`prompt_normalizer`	`Optional[PromptNormalizer]`	Normalizer for handling prompts. Defaults to `None`.
`skeleton_key_prompt`	`Optional[str]`	The skeleton key prompt to use. If not provided, uses the default skeleton key prompt. Defaults to `None`.
`max_attempts_on_failure`	`int`	Maximum number of attempts to retry on failure. Defaults to `0`.

`TAPAttackContext`¶

Bases: MultiTurnAttackContext[Any]

Context for the Tree of Attacks with Pruning (TAP) attack strategy.

This context contains all execution-specific state for a TAP attack instance, ensuring thread safety by isolating state per execution.

`TAPAttackResult`¶

Bases: AttackResult

Result of the Tree of Attacks with Pruning (TAP) attack strategy execution.

This result includes the standard attack result information with attack-specific data stored in the metadata dictionary.

`TreeOfAttacksWithPruningAttack`¶

Bases: AttackStrategy[TAPAttackContext, TAPAttackResult]

Implement the Tree of Attacks with Pruning (TAP) attack strategy.

The TAP attack strategy systematically explores multiple adversarial prompt paths in parallel using a tree structure. It employs breadth-first search with pruning to efficiently find effective jailbreaks while managing computational resources.

How it works:

Initialization: Creates multiple initial attack branches (width) to explore different approaches
Tree Expansion: For each iteration (depth), branches are expanded by a branching factor
Prompt Generation: Each node generates adversarial prompts via an LLM red-teaming assistant
Evaluation: Responses are evaluated for objective achievement and on-topic relevance
Pruning: Low-scoring or off-topic branches are pruned to maintain the width constraint
Iteration: The process continues until the objective is achieved or max depth is reached

The strategy balances exploration (trying diverse approaches) with exploitation (focusing on promising paths) through its pruning mechanism.

Constructor Parameters:

Parameter	Type	Description
`objective_target`	`PromptChatTarget`	The target system to attack. Defaults to `REQUIRED_VALUE`.
`attack_adversarial_config`	`AttackAdversarialConfig`	Configuration for the adversarial chat component.
`attack_converter_config`	`Optional[AttackConverterConfig]`	Configuration for attack converters. Defaults to None. Defaults to `None`.
`attack_scoring_config`	`Optional[AttackScoringConfig]`	Scoring configuration for TAP. The objective_scorer must be a FloatScaleThresholdScorer, which provides both granular float scores for node comparison and a threshold for determining success. Can be either AttackScoringConfig or TAPAttackScoringConfig. If not provided, a default configuration with SelfAskScaleScorer and threshold 0.7 is created. Defaults to `None`.
`prompt_normalizer`	`Optional[PromptNormalizer]`	The prompt normalizer to use. Defaults to None. Defaults to `None`.
`tree_width`	`int`	Number of branches to explore in parallel at each level. Defaults to 3. Defaults to `3`.
`tree_depth`	`int`	Maximum number of iterations to perform. Defaults to 5. Defaults to `5`.
`branching_factor`	`int`	Number of child branches to create from each parent. Defaults to 2. Defaults to `2`.
`on_topic_checking_enabled`	`bool`	Whether to check if prompts are on-topic. Defaults to True. Defaults to `True`.
`desired_response_prefix`	`str`	Expected prefix for successful responses. Defaults to “Sure, here is”. Defaults to `'Sure, here is'`.
`batch_size`	`int`	Number of nodes to process in parallel per batch. Defaults to 10. Defaults to `10`.
`prepended_conversation_config`	`Optional[PrependedConversationConfiguration]`	Configuration for how to process prepended conversations. Controls converter application by role, message normalization, and non-chat target behavior. Defaults to `None`.

Methods:

`execute_async`¶

execute_async(kwargs: Any = {}) → TAPAttackResult

Execute the multi-turn attack strategy asynchronously with the provided parameters.

Parameter	Type	Description
`objective`	`str`	The objective of the attack.
`memory_labels`	`Optional[Dict[str, str]]`	Memory labels for the attack context.
`**kwargs`	`Any`	Additional parameters for the attack. Defaults to `{}`.

Returns:

TAPAttackResult — The result of the attack execution.

`get_attack_scoring_config`¶

get_attack_scoring_config() → Optional[AttackScoringConfig]

Get the attack scoring configuration used by this strategy.

Returns:

Optional[AttackScoringConfig] — The TAP-specific scoring configuration.

References¶

Russinovich, M., Salem, A., & Eldan, R. (2024). Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack. arXiv Preprint arXiv:2404.01833. https://crescendo-the-multiturn-jailbreak.github.io/
Li, Y., Zhang, H., Li, H., & Zheng, H.-T. (2024). FlipAttack: Jailbreak LLMs via Flipping. arXiv Preprint arXiv:2410.02832. https://arxiv.org/abs/2410.02832
Anthropic. (2024). Many-Shot Jailbreaking. https://www.anthropic.com/research/many-shot-jailbreaking
Microsoft Security Response Center. (2024). Mitigating Skeleton Key, a New Type of Generative AI Jailbreak Technique. https://www.microsoft.com/en-us/security/blog/2024/06/26/mitigating-skeleton-key-a-new-type-of-generative-ai-jailbreak-technique/

Functions¶

generate_simulated_conversation_async¶

AttackAdversarialConfig¶

AttackContext¶

AttackConverterConfig¶

AttackExecutor¶

execute_attack_async¶

execute_attack_from_seed_groups_async¶

execute_multi_objective_attack_async¶

execute_multi_turn_attacks_async¶

execute_single_turn_attacks_async¶

AttackExecutorResult¶

get_results¶

raise_if_incomplete¶

AttackParameters¶

excluding¶

from_seed_group_async¶

AttackResultPrinter¶

print_conversation_async¶

print_result_async¶

print_summary_async¶

AttackScoringConfig¶

AttackStrategy¶

execute_async¶

get_attack_scoring_config¶

get_objective_target¶

get_request_converters¶

ChunkedRequestAttack¶

get_attack_scoring_config¶

ChunkedRequestAttackContext¶

ConsoleAttackResultPrinter¶

print_conversation_async¶

print_messages_async¶

print_result_async¶

print_summary_async¶

ContextComplianceAttack¶

ConversationManager¶

add_prepended_conversation_to_memory_async¶

get_conversation¶

get_last_message¶

initialize_context_async¶

set_system_prompt¶

ConversationSession¶

ConversationState¶

CrescendoAttack¶

get_attack_scoring_config¶

CrescendoAttackContext¶

CrescendoAttackResult¶

FlipAttack¶

ManyShotJailbreakAttack¶

MarkdownAttackResultPrinter¶

print_conversation_async¶

print_result_async¶

print_summary_async¶

MultiPromptSendingAttack¶

execute_async¶

get_attack_scoring_config¶

MultiPromptSendingAttackParameters¶

from_seed_group_async¶

MultiTurnAttackContext¶

MultiTurnAttackStrategy¶

PrependedConversationConfig¶

default¶

for_non_chat_target¶

get_message_normalizer¶

PromptSendingAttack¶

get_attack_scoring_config¶

RTASystemPromptPaths¶

RedTeamingAttack¶

get_attack_scoring_config¶

RolePlayAttack¶

RolePlayPaths¶

SingleTurnAttackContext¶

SingleTurnAttackStrategy¶

SkeletonKeyAttack¶

TAPAttackContext¶

TAPAttackResult¶

TreeOfAttacksWithPruningAttack¶

execute_async¶

`generate_simulated_conversation_async`¶

`AttackAdversarialConfig`¶

`AttackContext`¶

`AttackConverterConfig`¶

`AttackExecutor`¶

`execute_attack_async`¶

`execute_attack_from_seed_groups_async`¶

`execute_multi_objective_attack_async`¶

`execute_multi_turn_attacks_async`¶

`execute_single_turn_attacks_async`¶

`AttackExecutorResult`¶

`get_results`¶

`raise_if_incomplete`¶

`AttackParameters`¶

`excluding`¶

`from_seed_group_async`¶

`AttackResultPrinter`¶

`print_conversation_async`¶

`print_result_async`¶

`print_summary_async`¶

`AttackScoringConfig`¶

`AttackStrategy`¶

`execute_async`¶

`get_attack_scoring_config`¶

`get_objective_target`¶

`get_request_converters`¶

`ChunkedRequestAttack`¶

`get_attack_scoring_config`¶

`ChunkedRequestAttackContext`¶

`ConsoleAttackResultPrinter`¶

`print_conversation_async`¶

`print_messages_async`¶

`print_result_async`¶

`print_summary_async`¶

`ContextComplianceAttack`¶

`ConversationManager`¶

`add_prepended_conversation_to_memory_async`¶

`get_conversation`¶

`get_last_message`¶

`initialize_context_async`¶

`set_system_prompt`¶

`ConversationSession`¶

`ConversationState`¶

`CrescendoAttack`¶

`get_attack_scoring_config`¶

`CrescendoAttackContext`¶

`CrescendoAttackResult`¶

`FlipAttack`¶

`ManyShotJailbreakAttack`¶

`MarkdownAttackResultPrinter`¶

`print_conversation_async`¶

`print_result_async`¶

`print_summary_async`¶

`MultiPromptSendingAttack`¶

`execute_async`¶

`get_attack_scoring_config`¶

`MultiPromptSendingAttackParameters`¶

`from_seed_group_async`¶

`MultiTurnAttackContext`¶

`MultiTurnAttackStrategy`¶

`PrependedConversationConfig`¶

`default`¶

`for_non_chat_target`¶

`get_message_normalizer`¶

`PromptSendingAttack`¶

`get_attack_scoring_config`¶

`RTASystemPromptPaths`¶

`RedTeamingAttack`¶

`get_attack_scoring_config`¶

`RolePlayAttack`¶

`RolePlayPaths`¶

`SingleTurnAttackContext`¶

`SingleTurnAttackStrategy`¶

`SkeletonKeyAttack`¶

`TAPAttackContext`¶

`TAPAttackResult`¶

`TreeOfAttacksWithPruningAttack`¶

`execute_async`¶

`get_attack_scoring_config`¶