pyrit.executor.attack.generate_simulated_conversation_async

async generate_simulated_conversation_async(*, objective: str, adversarial_chat: PromptChatTarget, objective_scorer: TrueFalseScorer, num_turns: int = 3, adversarial_chat_system_prompt_path: str | Path, simulated_target_system_prompt_path: str | Path | None = None, attack_converter_config: AttackConverterConfig | None = None, memory_labels: dict[str, str] | None = None) → SimulatedConversationResult

Generate a simulated conversation between an adversarial chat and a compliant target.

This utility runs a RedTeamingAttack with score_last_turn_only=True against a simulated target (the same LLM as adversarial_chat, but configured with a compliant system prompt). The resulting conversation can be used as prepended_conversation for subsequent attacks against real targets.

Use cases:
  • Creating role-play scenarios dynamically (e.g., movie script, video game)

  • Establishing conversational context before attacking a real target

  • Generating multi-turn jailbreak setups without hardcoded responses

Parameters:
  • objective (str) – The objective for the adversarial chat to work toward.

  • adversarial_chat (PromptChatTarget) – The adversarial LLM that generates attack prompts. This same LLM is also used as the simulated target with a compliant system prompt.

  • objective_scorer (TrueFalseScorer) – Scorer to evaluate the final turn.

  • num_turns (int) – Number of conversation turns to generate. Defaults to 3.

  • adversarial_chat_system_prompt_path (Union[str, Path]) – Path to the system prompt for the adversarial chat. This is required.

  • simulated_target_system_prompt_path (Optional[Union[str, Path]]) – Path to the system prompt for the simulated target. If not provided, uses the default compliant prompt. The template should accept objective and num_turns parameters.

  • attack_converter_config (Optional[AttackConverterConfig]) – Converter configuration for the attack. Defaults to None.

  • memory_labels (Optional[dict[str, str]]) – Labels to associate with the conversation in memory. Defaults to None.

Returns:

The result containing the generated conversation and score.

Use prepended_messages to get completed turns before the selected turn, next_message to get the user message at the selected turn for use as an attack’s initial prompt, or access conversation directly for all messages. Set turn_index to select an earlier turn if the final turn wasn’t successful.
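The relationship between turn_index, prepended_messages, and next_message described above can be illustrated with a toy model. This is a minimal sketch, not PyRIT's implementation: ToyMessage and ToySimulatedResult are hypothetical stand-ins, and the sketch assumes that turn_index is 1-based, that it defaults to the final turn, and that each turn is exactly one user message followed by one assistant message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyMessage:
    role: str  # "user" or "assistant"
    text: str


@dataclass
class ToySimulatedResult:
    """Hypothetical stand-in, NOT PyRIT's SimulatedConversationResult."""

    conversation: list[ToyMessage]
    turn_index: Optional[int] = None  # assumed 1-based; None selects the last turn

    def _selected_turn(self) -> int:
        # One turn is assumed to be one user message plus one assistant reply.
        total_turns = sum(1 for m in self.conversation if m.role == "user")
        wanted = total_turns if self.turn_index is None else self.turn_index
        # Clamp into the valid range so the slices below stay well defined.
        return max(1, min(wanted, max(total_turns, 1)))

    @property
    def prepended_messages(self) -> list[ToyMessage]:
        # All completed user/assistant pairs before the selected turn.
        return self.conversation[: 2 * (self._selected_turn() - 1)]

    @property
    def next_message(self) -> Optional[ToyMessage]:
        # The user message that opens the selected turn, if there is one.
        start = 2 * (self._selected_turn() - 1)
        return self.conversation[start] if start < len(self.conversation) else None


conversation = [
    ToyMessage("user", "u1"), ToyMessage("assistant", "a1"),
    ToyMessage("user", "u2"), ToyMessage("assistant", "a2"),
    ToyMessage("user", "u3"), ToyMessage("assistant", "a3"),
]

result = ToySimulatedResult(conversation)
last_turn_prefix = [m.text for m in result.prepended_messages]
last_turn_next = result.next_message.text

# Fall back to an earlier turn, as one might if the final turn did not succeed.
result.turn_index = 2
earlier_prefix = [m.text for m in result.prepended_messages]
earlier_next = result.next_message.text
```

In this toy model, selecting an earlier turn shortens the prefix and moves next_message to the user message of that turn, which is the pattern the real result object is described as supporting.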

Return type:

SimulatedConversationResult

Raises:

ValueError – If num_turns is not a positive integer.
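A call might look like the following sketch, which uses only the keyword arguments listed in the signature above. The wrapper name build_prepended_context, the prompt file path, and the memory label are hypothetical; the adversarial chat target and scorer are passed in by the caller because their construction is not covered here. It also assumes prepended_messages and next_message can be read as plain attributes of the result.

```python
from pathlib import Path


async def build_prepended_context(adversarial_chat, objective_scorer, objective: str):
    """Hypothetical wrapper: simulate a conversation and return its reusable parts."""
    # Imported lazily so this sketch can be loaded without PyRIT installed.
    from pyrit.executor.attack import generate_simulated_conversation_async

    result = await generate_simulated_conversation_async(
        objective=objective,
        adversarial_chat=adversarial_chat,
        objective_scorer=objective_scorer,
        num_turns=3,  # must be a positive integer, otherwise ValueError is raised
        # Hypothetical path; point this at your own adversarial system prompt.
        adversarial_chat_system_prompt_path=Path("prompts/adversarial_system_prompt.yaml"),
        # Hypothetical label, used only to tag the conversation in memory.
        memory_labels={"source": "simulated_conversation"},
    )

    # Completed turns to prepend, plus the user message to send to the real target.
    return result.prepended_messages, result.next_message
```

The returned pair would typically be awaited from an existing event loop and then handed to a later attack as its prepended conversation and initial prompt.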