pyrit.orchestrator.FuzzerOrchestrator

class FuzzerOrchestrator(*, prompts: list[str], prompt_target: PromptTarget, prompt_templates: list[str], prompt_converters: list[PromptConverter] | None = None, template_converters: list[FuzzerConverter], scoring_target: PromptChatTarget, verbose: bool = False, frequency_weight: float = 0.5, reward_penalty: float = 0.1, minimum_reward: float = 0.2, non_leaf_node_probability: float = 0.1, batch_size: int = 10, target_jailbreak_goal_count: int = 1, max_query_limit: int | None = None)

Bases: Orchestrator

__init__(*, prompts: list[str], prompt_target: PromptTarget, prompt_templates: list[str], prompt_converters: list[PromptConverter] | None = None, template_converters: list[FuzzerConverter], scoring_target: PromptChatTarget, verbose: bool = False, frequency_weight: float = 0.5, reward_penalty: float = 0.1, minimum_reward: float = 0.2, non_leaf_node_probability: float = 0.1, batch_size: int = 10, target_jailbreak_goal_count: int = 1, max_query_limit: int | None = None) → None

Creates an orchestrator that explores a variety of jailbreak options via fuzzing.

Paper: GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts.

Link: https://arxiv.org/pdf/2309.10253

Authors: Jiahao Yu, Xingwei Lin, Zheng Yu, Xinyu Xing

GitHub: sherdencooper/GPTFuzz

Parameters:
  • prompts – The questions to send to the target.

  • prompt_target – The target to send the prompts to.

  • prompt_templates – List of jailbreak templates that act as the seed pool. In each iteration, the MCTS-explore algorithm selects a seed, which is then passed to a template converter (for example, shorten or expand). The converted template, combined with each prompt, is sent to the target.

  • prompt_converters – The prompt_converters to use to convert the prompts before sending them to the prompt target.

  • template_converters – The converters applied to the jailbreak template selected by MCTS-explore. They are not applied to the prompts. In each iteration of the algorithm, one converter is chosen at random.

  • verbose – Whether to print debug information.

  • frequency_weight – Constant that balances selecting seeds with a high reward against selecting seeds that have been chosen fewer times.

  • reward_penalty – Reward penalty diminishes the reward for the current node and its ancestors when the path lengthens.

  • minimum_reward – Minimal reward prevents the reward of the current node and its ancestors from being too small or negative.

  • non_leaf_node_probability – Probability that determines how likely a non-leaf node is to be selected.

  • batch_size (int, Optional) – The (max) batch size for sending prompts. Defaults to 10.

  • target_jailbreak_goal_count – Number of jailbreaks after which the fuzzer stops.

  • max_query_limit – Maximum number of queries the fuzzer may send. By default, this is the number of prompts multiplied by the number of prompt templates, multiplied by 10. Each iteration makes as many calls as there are prompts.
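The default for max_query_limit follows directly from the description above. The sketch below only illustrates that arithmetic; the helper names are hypothetical and are not part of the library, and how the orchestrator handles a budget that is not a multiple of the prompt count is not stated here.

```python
def default_max_query_limit(prompts: list[str], prompt_templates: list[str]) -> int:
    """Default budget as described above: prompts x templates x 10."""
    return len(prompts) * len(prompt_templates) * 10


def full_iterations(max_query_limit: int, prompts: list[str]) -> int:
    """Each iteration sends one query per prompt, so this many
    complete iterations fit inside the budget."""
    return max_query_limit // len(prompts)


prompts = ["question 1", "question 2", "question 3"]
templates = ["template A", "template B"]

limit = default_max_query_limit(prompts, templates)
print(limit)                            # 60
print(full_iterations(limit, prompts))  # 20
```

With three prompts and two templates, the default budget is 60 queries, which covers 20 complete iterations.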

Methods

__init__(*, prompts, prompt_target, ...[, ...])

Creates an orchestrator that explores a variety of jailbreak options via fuzzing.

dispose_db_engine()

Dispose database engine to release database connections and resources.

execute_fuzzer()

Generates new templates by applying transformations to existing templates and returns successful ones.

get_identifier()

get_memory()

Retrieves the memory associated with this orchestrator.

get_score_memory()

Retrieves the scores of the PromptRequestPieces associated with this orchestrator.

async execute_fuzzer() → FuzzerResult

Generates new templates by applying transformations to existing templates and returns successful ones.

This method uses the MCTS-explore algorithm to select a template in each iteration and applies a randomly chosen template converter to generate a new template.
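A minimal sketch of the selection step, loosely following the MCTS-explore description in the GPTFuzzer paper. The TemplateNode class, the uct_score formula, and the select_path helper are assumptions made for illustration; the orchestrator's internal classes and exact formula may differ.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class TemplateNode:
    """Hypothetical tree node; not the library's own class."""
    template: str
    children: list["TemplateNode"] = field(default_factory=list)
    visited: int = 0
    reward: float = 0.0


def uct_score(node: TemplateNode, step: int, frequency_weight: float) -> float:
    """Average reward plus a bonus for nodes that were rarely visited."""
    exploit = node.reward / (node.visited + 1)
    explore = math.sqrt(2 * math.log(step) / (node.visited + 0.01))
    return exploit + frequency_weight * explore


def select_path(roots, step, frequency_weight=0.5,
                non_leaf_node_probability=0.1, rng=random):
    """Walk down the tree by best score; at each non-leaf node, stop early
    with probability non_leaf_node_probability."""
    def best(nodes):
        return max(nodes, key=lambda n: uct_score(n, step, frequency_weight))

    current = best(roots)
    path = [current]
    while current.children:
        if rng.random() < non_leaf_node_probability:
            break  # select this non-leaf node instead of descending further
        current = best(current.children)
        path.append(current)
    for node in path:
        node.visited += 1
    return path


# Usage: the last node on the path is the template to mutate, and one
# converter is picked at random (plain strings stand in for converters).
seed = TemplateNode("seed template", children=[TemplateNode("mutated template")])
path = select_path([seed], step=1)
chosen_template = path[-1].template
chosen_converter = random.choice(["shorten", "expand"])
```

Returning the whole path rather than only the chosen node keeps the ancestors available for the later reward update.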

Subsequently, it creates a set of prompts by populating instances of the new template with all the prompts.
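As a rough illustration of this step, the sketch below fills a template once per prompt. The placeholder text "{{ prompt }}" and the populate helper are assumptions for this example; the placeholder the library actually expects should be checked against its template documentation.

```python
PLACEHOLDER = "{{ prompt }}"  # assumed placeholder, used here for illustration only


def populate(template: str, prompts: list[str]) -> list[str]:
    """Return one filled-in jailbreak prompt per question."""
    if PLACEHOLDER not in template:
        raise ValueError("template has no placeholder for the prompt")
    return [template.replace(PLACEHOLDER, prompt) for prompt in prompts]


new_template = "You are a helpful storyteller. Answer in character: {{ prompt }}"
filled = populate(new_template, ["question 1", "question 2"])
print(len(filled))  # 2
```

One template and N prompts yield N requests, which is why each iteration makes as many calls as there are prompts.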

These prompts are sent to the target and the responses scored.

A new template is considered successful if it produces at least one successful jailbreak, that is, at least one response with a high enough score.
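The success check can be sketched as a simple threshold test. The threshold value of 0.8, the inclusive comparison, and both helper names are hypothetical; the text above says only that the score must be high enough.

```python
def count_jailbreaks(scores: list[float], threshold: float = 0.8) -> int:
    """Number of responses whose score reaches the (assumed) threshold."""
    return sum(score >= threshold for score in scores)


def template_is_successful(scores: list[float], threshold: float = 0.8) -> bool:
    """A template succeeds if at least one response counts as a jailbreak."""
    return count_jailbreaks(scores, threshold) >= 1


scores = [0.1, 0.9, 0.4]               # one score per prompt sent with the template
print(count_jailbreaks(scores))        # 1
print(template_is_successful(scores))  # True
```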

Successful templates are added to the initial list of templates and may be selected again in subsequent iterations.

Finally, rewards for all nodes are updated.
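One way to picture the reward update, loosely based on the GPTFuzzer paper: the share of prompts that were jailbroken is scaled down as the selected path gets longer, but never below minimum_reward. The formula and the names path_reward and update_rewards are assumptions for illustration, not the orchestrator's confirmed implementation.

```python
def path_reward(jailbreak_count: int, prompt_count: int, path_length: int,
                reward_penalty: float = 0.1, minimum_reward: float = 0.2) -> float:
    """Success rate scaled by a factor that shrinks with path length
    and is floored at minimum_reward."""
    success_rate = jailbreak_count / prompt_count
    scale = max(minimum_reward, 1 - reward_penalty * path_length)
    return success_rate * scale


def update_rewards(rewards: dict[str, float], path: list[str], reward: float) -> None:
    """Add the same reward to the selected node and all its ancestors."""
    for node_id in path:
        rewards[node_id] = rewards.get(node_id, 0.0) + reward


rewards: dict[str, float] = {}
path = ["seed", "child", "grandchild"]  # hypothetical node ids on the selected path
update_rewards(rewards, path, path_reward(2, 4, len(path)))
```

With the defaults, a path of eight or more nodes hits the floor, so a long chain of mutations still earns 20% of its success rate.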

The algorithm stops when a sufficient number of jailbreaks are found with new templates or when the query limit is reached.