pyrit.executor.attack.RolePlayAttack

class RolePlayAttack(*, objective_target: PromptTarget, adversarial_chat: PromptChatTarget, role_play_definition_path: Path, attack_converter_config: AttackConverterConfig | None = None, attack_scoring_config: AttackScoringConfig | None = None, prompt_normalizer: PromptNormalizer | None = None, max_attempts_on_failure: int = 0)

Bases: PromptSendingAttack

Implementation of a single-turn role-play attack strategy.

This class orchestrates a role-play attack where malicious objectives are rephrased into role-playing contexts to make them appear more benign and bypass content filters. The strategy uses an adversarial chat target to transform the objective into a role-play scenario before sending it to the target system.

The attack flow consists of:

  1. Loading role-play scenarios from a YAML file.
  2. Using an adversarial chat target to rephrase the objective into the role-play context.
  3. Sending the rephrased objective to the target system.
  4. Evaluating the response with scorers if configured.
  5. Retrying on failure up to the configured number of retries.
  6. Returning the attack result.
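The flow above can be sketched in plain Python. This is a simplified illustration, not PyRIT's implementation: `SketchResult`, `run_role_play_sketch`, and the `rephrase`, `send`, and `score` callables are hypothetical stand-ins for the adversarial chat target, the objective target, and a true/false scorer. It also assumes the rephrase instructions have already been loaded from the YAML file (step 1), that they contain an `{objective}` placeholder, and that the attack stops after one attempt when no scorer is configured.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class SketchResult:
    """Hypothetical result record; not PyRIT's AttackResult."""

    objective: str
    sent_prompt: str
    response: str
    success: Optional[bool]  # None when no scorer is configured
    attempts: int


async def run_role_play_sketch(
    objective: str,
    rephrase_instructions: str,  # assumed to be loaded from the YAML file already
    rephrase: Callable[[str], Awaitable[str]],  # stands in for adversarial_chat
    send: Callable[[str], Awaitable[str]],  # stands in for objective_target
    score: Optional[Callable[[str], bool]] = None,  # stands in for a true/false scorer
    max_attempts_on_failure: int = 0,
) -> SketchResult:
    # Step 2: ask the adversarial model to wrap the objective in a role-play scenario.
    role_play_prompt = await rephrase(rephrase_instructions.format(objective=objective))

    response = ""
    success: Optional[bool] = None
    attempts = 0
    # Step 5: one initial attempt plus up to max_attempts_on_failure retries.
    for attempts in range(1, max_attempts_on_failure + 2):
        # Step 3: send the rephrased objective to the target.
        response = await send(role_play_prompt)
        if score is None:
            break  # assumption: nothing to retry on without a scorer
        # Step 4: evaluate the response.
        success = score(response)
        if success:
            break

    # Step 6: return the outcome.
    return SketchResult(objective, role_play_prompt, response, success, attempts)
```

With `max_attempts_on_failure=0` the loop body runs exactly once, which matches the default in the signature above.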

The strategy supports customization through prepended conversations, converters, and multiple scorer types.

__init__(*, objective_target: PromptTarget, adversarial_chat: PromptChatTarget, role_play_definition_path: Path, attack_converter_config: AttackConverterConfig | None = None, attack_scoring_config: AttackScoringConfig | None = None, prompt_normalizer: PromptNormalizer | None = None, max_attempts_on_failure: int = 0) -> None

Initializes the role-play attack strategy.

Parameters:
  • objective_target (PromptTarget) – The target system to attack.

  • adversarial_chat (PromptChatTarget) – The adversarial chat target used to rephrase objectives into role-play scenarios.

  • role_play_definition_path (pathlib.Path) – Path to the YAML file containing role-play definitions (rephrase instructions, user start turn, assistant start turn).

  • attack_converter_config (Optional[AttackConverterConfig]) – Configuration for prompt converters.

  • attack_scoring_config (Optional[AttackScoringConfig]) – Configuration for scoring components.

  • prompt_normalizer (Optional[PromptNormalizer]) – Normalizer for handling prompts.

  • max_attempts_on_failure (int) – Maximum number of times to retry the attack after a failed attempt. Defaults to 0 (no retries).

Raises:
  • ValueError – If the objective scorer is not a true/false scorer.

  • FileNotFoundError – If the role_play_definition_path does not exist.
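The two documented exceptions can be reproduced with a small pre-flight check before constructing the attack. This is a minimal sketch, not the library's own validation: `check_role_play_args` is a hypothetical helper, and representing the scorer kind as the string `"true_false"` is an assumption made for illustration.

```python
from pathlib import Path
from typing import Optional


def check_role_play_args(
    role_play_definition_path: Path,
    objective_scorer_type: Optional[str] = None,  # hypothetical: kind of scorer, if any
) -> None:
    """Illustrative checks mirroring the documented ValueError and FileNotFoundError."""
    # The role-play definition file must exist.
    if not role_play_definition_path.exists():
        raise FileNotFoundError(
            f"Role-play definition not found: {role_play_definition_path}"
        )
    # If an objective scorer is supplied, it must be a true/false scorer.
    if objective_scorer_type is not None and objective_scorer_type != "true_false":
        raise ValueError("The objective scorer must be a true/false scorer.")
```

Running a check like this up front surfaces a bad path or an incompatible scorer before any prompt is sent to the adversarial chat or the objective target.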

Methods

  • __init__(*, objective_target, ...[, ...]) – Initializes the role-play attack strategy.

  • execute_async(**kwargs) – Execute the attack strategy asynchronously with the provided parameters.

  • execute_with_context_async(*, context) – Execute strategy with complete lifecycle management.

  • get_identifier()