pyrit.executor.promptgen.FuzzerGenerator


class FuzzerGenerator(*, objective_target: PromptTarget, template_converters: List[FuzzerConverter], converter_config: StrategyConverterConfig | None = None, scorer: Scorer | None = None, scoring_success_threshold: float = 0.8, prompt_normalizer: PromptNormalizer | None = None, frequency_weight: float = 0.5, reward_penalty: float = 0.1, minimum_reward: float = 0.2, non_leaf_node_probability: float = 0.1, batch_size: int = 10, target_jailbreak_goal_count: int = 1)

Bases: PromptGeneratorStrategy[FuzzerContext, FuzzerResult]

Implementation of the Fuzzer prompt generation strategy using Monte Carlo Tree Search (MCTS).

The Fuzzer generates diverse jailbreak prompts by systematically exploring and generating prompt templates. It uses MCTS to balance exploration of new templates with exploitation of promising ones, efficiently searching for effective prompt variations.

The generation flow consists of:

1. Selecting a template using the MCTS-explore algorithm
2. Applying template converters to generate variations
3. Generating prompts from the selected/converted template
4. Testing prompts with the target and scoring responses
5. Updating rewards in the MCTS tree based on scores
6. Continuing until the target jailbreak count or the query limit is reached
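The selection step (1) can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `TemplateNode`, `uct_score`, and `select_path` are hypothetical names, and the scoring formula is an assumption modelled on the UCT-style rule described in the GPTFUZZER paper. Only `frequency_weight` and `non_leaf_node_probability` correspond to constructor parameters documented on this page.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class TemplateNode:
    """Hypothetical tree node holding one prompt template."""
    template: str
    parent: Optional["TemplateNode"] = field(default=None, repr=False)
    children: List["TemplateNode"] = field(default_factory=list)
    visits: int = 0
    rewards: float = 0.0


def uct_score(node: TemplateNode, step: int, frequency_weight: float) -> float:
    """Assumed UCT-style score: average reward plus an exploration bonus."""
    exploit = node.rewards / (node.visits + 1)
    explore = frequency_weight * math.sqrt(2 * math.log(step) / (node.visits + 0.01))
    return exploit + explore


def select_path(
    roots: List[TemplateNode],
    step: int,
    frequency_weight: float = 0.5,
    non_leaf_node_probability: float = 0.1,
    rng: Optional[random.Random] = None,
) -> List[TemplateNode]:
    """Walk down the tree by best score, stopping early with a fixed probability."""
    rng = rng or random.Random()
    current = max(roots, key=lambda n: uct_score(n, step, frequency_weight))
    path = [current]
    while current.children:
        # Occasionally stop at a non-leaf node so older templates get reused.
        if rng.random() < non_leaf_node_probability:
            break
        current = max(current.children, key=lambda n: uct_score(n, step, frequency_weight))
        path.append(current)
    # Record the visit on every node along the selected path.
    for node in path:
        node.visits += 1
    return path
```

The last node of the returned path is the template that would be handed to a converter in step 2. A larger `frequency_weight` favours rarely visited templates; a larger `non_leaf_node_probability` makes the search stop higher in the tree more often.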

Note: While this is a prompt generator, it still requires scoring functionality to provide feedback to the MCTS algorithm for effective template selection.

Paper Reference:

GPTFUZZER - Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Link: https://arxiv.org/pdf/2309.10253
Authors: Jiahao Yu, Xingwei Lin, Zheng Yu, Xinyu Xing
GitHub: sherdencooper/GPTFuzz

__init__(*, objective_target: PromptTarget, template_converters: List[FuzzerConverter], converter_config: StrategyConverterConfig | None = None, scorer: Scorer | None = None, scoring_success_threshold: float = 0.8, prompt_normalizer: PromptNormalizer | None = None, frequency_weight: float = 0.5, reward_penalty: float = 0.1, minimum_reward: float = 0.2, non_leaf_node_probability: float = 0.1, batch_size: int = 10, target_jailbreak_goal_count: int = 1) → None

Initialize the Fuzzer prompt generation strategy.

Parameters:

objective_target (PromptTarget) – The target to send the prompts to.
template_converters (List[FuzzerConverter]) – The converters to apply on the selected jailbreak template. In each iteration, one converter is chosen at random.
converter_config (Optional[StrategyConverterConfig]) – Configuration for prompt converters. Defaults to None.
scorer (Optional[Scorer]) – The scorer used to evaluate target responses. Defaults to None.
scoring_success_threshold (float) – The score threshold at which a jailbreak is considered successful. Defaults to 0.8.
prompt_normalizer (Optional[PromptNormalizer]) – The prompt normalizer to use. Defaults to None.
frequency_weight (float) – Constant that balances between high reward and selection frequency. Defaults to 0.5.
reward_penalty (float) – Penalty that diminishes reward as path length increases. Defaults to 0.1.
minimum_reward (float) – Minimal reward to prevent rewards from being too small. Defaults to 0.2.
non_leaf_node_probability (float) – Probability of selecting a non-leaf node. Defaults to 0.1.
batch_size (int) – The (max) batch size for sending prompts. Defaults to 10.
target_jailbreak_goal_count (int) – Target number of jailbreaks to find. Defaults to 1.

Raises:

ValueError – If required parameters are invalid or missing.
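How `reward_penalty` and `minimum_reward` interact can be illustrated with a small sketch. The formulas below are assumptions inferred from the parameter descriptions above, not confirmed library behaviour, and `batch_reward`, `discounted_reward`, and `leaf_level` are hypothetical names. Here `leaf_level` is taken to mean the depth of the selected template in the tree.

```python
def batch_reward(jailbreak_count: int, prompt_count: int) -> float:
    """Assumed raw reward: the fraction of prompts in the batch that jailbroke the target."""
    if prompt_count <= 0:
        raise ValueError("prompt_count must be positive")
    return jailbreak_count / prompt_count


def discounted_reward(
    reward: float,
    leaf_level: int,
    reward_penalty: float = 0.1,
    minimum_reward: float = 0.2,
) -> float:
    """Shrink the reward as the path gets longer, but never below the floor factor."""
    factor = max(minimum_reward, 1 - reward_penalty * leaf_level)
    return reward * factor
```

Under these assumptions, a template at depth 3 keeps 70% of its raw reward with the default settings, while any template at depth 8 or deeper keeps exactly the 20% floor set by `minimum_reward`.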

Methods

`__init__`(*, objective_target, ...[, ...])	Initialize the Fuzzer prompt generation strategy.
`execute_async`()	Execute the Fuzzer generation strategy asynchronously.
`execute_with_context_async`(*, context)	Execute strategy with complete lifecycle management.
`get_identifier`()	Get a serializable identifier for the strategy instance.
`with_default_scorer`(*, objective_target, ...)	Create a FuzzerGenerator instance with default scoring configuration.

async execute_async(*, prompts: List[str], prompt_templates: List[str], max_query_limit: int | None = None, memory_labels: dict[str, str] | None = None, **kwargs) → FuzzerResult

async execute_async(**kwargs) → FuzzerResult

Execute the Fuzzer generation strategy asynchronously.

Parameters:

prompts (List[str]) – The list of prompts to use for generation.
prompt_templates (List[str]) – The list of prompt templates to use.
max_query_limit (Optional[int]) – The maximum number of queries to execute.
memory_labels (Optional[dict[str, str]]) – Optional labels to apply to the prompts.
**kwargs – Additional keyword arguments.

Returns:

The result of the asynchronous execution.

Return type:

FuzzerResult
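The relationship between `prompts`, `prompt_templates`, and `max_query_limit` can be sketched without the library. This is an illustrative stand-in only: the `{{ prompt }}` placeholder, the helper names, and the round-robin template choice (used here in place of MCTS selection) are assumptions, and `is_jailbreak` is a hypothetical callback standing in for sending the prompt to the target and scoring the response. Unlike the real method, the sketch requires an explicit `max_query_limit`.

```python
from typing import Callable, List, Tuple

PLACEHOLDER = "{{ prompt }}"  # assumed placeholder syntax, for illustration only


def render_prompts(template: str, prompts: List[str]) -> List[str]:
    """Substitute every prompt into one template."""
    if PLACEHOLDER not in template:
        raise ValueError("template is missing the prompt placeholder")
    return [template.replace(PLACEHOLDER, prompt) for prompt in prompts]


def run_until_done(
    prompts: List[str],
    prompt_templates: List[str],
    is_jailbreak: Callable[[str], bool],
    max_query_limit: int,
    target_jailbreak_goal_count: int = 1,
) -> Tuple[List[str], int]:
    """Loop until enough jailbreaks are found or the query budget would be exceeded."""
    jailbreaks: List[str] = []
    queries = 0
    iteration = 0
    while (
        len(jailbreaks) < target_jailbreak_goal_count
        and queries + len(prompts) <= max_query_limit
    ):
        # Round-robin stands in for the MCTS-based template selection.
        template = prompt_templates[iteration % len(prompt_templates)]
        for rendered in render_prompts(template, prompts):
            queries += 1
            if is_jailbreak(rendered):
                jailbreaks.append(rendered)
        iteration += 1
    return jailbreaks, queries
```

Each iteration costs one query per prompt, so in this sketch the budget check runs before a template is tried rather than in the middle of a batch.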

static with_default_scorer(*, objective_target: PromptTarget, template_converters: List[FuzzerConverter], scoring_target: PromptChatTarget, converter_config: StrategyConverterConfig | None = None, prompt_normalizer: PromptNormalizer | None = None, frequency_weight: float = 0.5, reward_penalty: float = 0.1, minimum_reward: float = 0.2, non_leaf_node_probability: float = 0.1, batch_size: int = 10, target_jailbreak_goal_count: int = 1) → FuzzerGenerator

Create a FuzzerGenerator instance with default scoring configuration.

This factory method creates a FuzzerGenerator with a default scoring setup using SelfAskScaleScorer with the Tree of Attacks scale, wrapped in a FloatScaleThresholdScorer with a threshold of 0.8.

To use the returned generator, create a FuzzerContext with prompts and prompt_templates, then pass it to execute_with_context_async().

Parameters:

objective_target (PromptTarget) – The target to send the prompts to.
template_converters (List[FuzzerConverter]) – The converters to apply on the selected jailbreak template.
scoring_target (PromptChatTarget) – The chat target to use for scoring responses.
converter_config (Optional[StrategyConverterConfig]) – Configuration for prompt converters.
prompt_normalizer (Optional[PromptNormalizer]) – The prompt normalizer to use.
frequency_weight (float) – Constant that balances between high reward and selection frequency.
reward_penalty (float) – Penalty that diminishes reward as path length increases.
minimum_reward (float) – Minimal reward to prevent rewards from being too small.
non_leaf_node_probability (float) – Probability of selecting a non-leaf node.
batch_size (int) – The (max) batch size for sending prompts.
target_jailbreak_goal_count (int) – Target number of jailbreaks to find.

Returns:

A configured FuzzerGenerator instance with default scoring.

Return type:

FuzzerGenerator
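The effect of wrapping a float-scale scorer in a threshold can be shown with a short sketch. It assumes a raw rating on a 1 to 10 scale that is normalized linearly to the range 0 to 1 and then compared against the threshold with `>=`. The scale bounds, the normalization, the comparison operator, and the function names are all assumptions made for illustration; they are not taken from the library.

```python
def normalize_rating(raw: int, minimum: int = 1, maximum: int = 10) -> float:
    """Map a raw rating onto the range 0.0 to 1.0 (assumed linear scaling)."""
    if not minimum <= raw <= maximum:
        raise ValueError("rating outside the expected scale")
    return (raw - minimum) / (maximum - minimum)


def passes_threshold(score: float, threshold: float = 0.8) -> bool:
    """Turn a float score into a true/false jailbreak verdict."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0.0 and 1.0")
    return score >= threshold
```

Under these assumptions a raw rating of 8 normalizes to about 0.78 and does not count as a jailbreak, while a rating of 9 normalizes to about 0.89 and does.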
