
pyrit.executor.benchmark

Benchmark modules.

FairnessBiasBenchmark

Bases: Strategy[FairnessBiasBenchmarkContext, AttackResult]

Strategy for executing fairness bias benchmarks with required scorer.

This strategy evaluates target models for fairness and bias by:

  1. Generating stories about different subjects (professions, roles, characters, etc.)

  2. Extracting names from the generated content

  3. Scoring responses for bias using a required scorer instance

  4. Tracking results for bias analysis and reporting
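Step 2 above (name extraction) can be illustrated with a simple stand-in. The benchmark's actual extraction logic is not documented here, so treat this purely as a sketch of the idea:

```python
def extract_names(story: str) -> list[str]:
    # Naive stand-in for name extraction: collect capitalized tokens
    # that are not sentence-initial, preserving order without duplicates.
    names = []
    tokens = story.split()
    for prev, tok in zip([""] + tokens, tokens):
        word = tok.strip(".,!?\"'")
        if word.istitle() and prev and not prev.endswith((".", "!", "?")):
            names.append(word)
    return list(dict.fromkeys(names))

names = extract_names(
    "The nurse, Maria Lopez, greeted Daniel. Daniel smiled at Maria."
)
# names == ["Maria", "Lopez", "Daniel"]
```

The extracted names feed step 3, where the scorer evaluates the response for bias.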

The benchmark requires a scorer instance (supplied via `attack_scoring_config`) to evaluate responses for bias.

Constructor Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| `objective_target` | `PromptTarget` | The target model to test for fairness and bias. |
| `attack_converter_config` | `Optional[AttackConverterConfig]` | Optional configuration for attack converters. Defaults to `None`. |
| `attack_scoring_config` | `Optional[AttackScoringConfig]` | Optional configuration for attack scoring. Defaults to `None`. |
| `prompt_normalizer` | `Optional[PromptNormalizer]` | Optional normalizer for prompt processing. Defaults to `None`. |
| `max_attempts_on_failure` | `int` | Maximum number of retry attempts on failure. Defaults to `0`. |

Methods:

execute_async

execute_async(**kwargs: Any) → AttackResult

Execute the benchmark strategy asynchronously with the provided parameters.

| Parameter | Type | Description |
| --- | --- | --- |
| `**kwargs` | `Any` | Keyword arguments containing: `subject` (`str`), the subject to test (profession, role, character, etc.); `story_type` (`str`), the type of story to generate; `num_experiments` (`int`, optional), number of experiments to run (default: 1); `objective` (`str`, optional), custom objective prompt (default: auto-generated); `prepended_conversation` (`List[Message]`, optional), context conversation; `memory_labels` (`Dict[str, str]`, optional), labels for memory tracking. |

Returns:

`AttackResult`: The result of the benchmark execution.
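A hypothetical call sketch follows. The import path for `FairnessBiasBenchmark` and a real `PromptTarget` are not shown; a stub stands in for the strategy so the snippet is self-contained, but the keyword arguments mirror the table above:

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any

# Stub standing in for pyrit's AttackResult; the real class has more fields.
@dataclass
class AttackResult:
    outcome: str
    metadata: dict[str, Any] = field(default_factory=dict)

# Stub mirroring the execute_async surface described above. In real use you
# would construct FairnessBiasBenchmark(objective_target=...) instead.
class FairnessBiasBenchmarkStub:
    async def execute_async(self, **kwargs: Any) -> AttackResult:
        subject = kwargs["subject"]           # required
        story_type = kwargs["story_type"]     # required
        n = kwargs.get("num_experiments", 1)  # optional, defaults to 1
        return AttackResult(
            outcome="success",
            metadata={"subject": subject,
                      "story_type": story_type,
                      "num_experiments": n},
        )

result = asyncio.run(
    FairnessBiasBenchmarkStub().execute_async(
        subject="nurse", story_type="short biography"
    )
)
```

Note that `subject` and `story_type` are required, while the remaining keyword arguments fall back to their documented defaults.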

get_experiment_summary

get_experiment_summary(context: FairnessBiasBenchmarkContext) → dict[str, Any]

Get a summary of the experiment results.

ParameterTypeDescription
contextFairnessBiasBenchmarkContextThe benchmark context containing experiment results

Returns:

`dict[str, Any]`: A summary of the experiment results.
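The exact shape of the summary dictionary is not documented here. As a guess at what the per-experiment tracking might aggregate, one could tally categorical bias scores like this (purely illustrative; field names are assumptions):

```python
from collections import Counter
from typing import Any

def summarize_experiments(records: list[dict[str, Any]]) -> dict[str, Any]:
    # Each record represents one experiment: a subject, the names the
    # generated story used, and a categorical bias score from the scorer.
    scores = Counter(r["bias_score"] for r in records)
    return {
        "total_experiments": len(records),
        "score_distribution": dict(scores),
        "subjects": sorted({r["subject"] for r in records}),
    }

summary = summarize_experiments([
    {"subject": "nurse", "names": ["Maria"], "bias_score": "biased"},
    {"subject": "nurse", "names": ["Daniel"], "bias_score": "unbiased"},
])
```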

get_last_context

get_last_context() → Optional[FairnessBiasBenchmarkContext]

Get the context from the last execution.

Returns:

`Optional[FairnessBiasBenchmarkContext]`: The context from the last execution, or `None` if no execution has occurred.

FairnessBiasBenchmarkContext

Bases: StrategyContext

Context for fairness bias benchmark execution.

QuestionAnsweringBenchmark

Bases: Strategy[QuestionAnsweringBenchmarkContext, AttackResult]

Strategy for executing question answering benchmarks.

This strategy evaluates target models on multiple choice questions by:

  1. Formatting questions with their choices into prompts

  2. Sending prompts to the target model via PromptSendingAttack

  3. Evaluating responses using configured scorers

  4. Tracking success/failure for benchmark reporting

Constructor Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| `objective_target` | `PromptTarget` | The target system to evaluate. |
| `attack_converter_config` | `Optional[AttackConverterConfig]` | Configuration for prompt converters. Defaults to `None`. |
| `attack_scoring_config` | `Optional[AttackScoringConfig]` | Configuration for scoring components. Defaults to `None`. |
| `prompt_normalizer` | `Optional[PromptNormalizer]` | Normalizer for handling prompts. Defaults to `None`. |
| `objective_format_string` | `str` | Format string for objectives sent to scorers. Defaults to `_DEFAULT_OBJECTIVE_FORMAT`. |
| `question_asking_format_string` | `str` | Format string for questions sent to target. Defaults to `_DEFAULT_QUESTION_FORMAT`. |
| `options_format_string` | `str` | Format string for formatting answer choices. Defaults to `_DEFAULT_OPTIONS_FORMAT`. |
| `max_attempts_on_failure` | `int` | Maximum number of attempts on failure. Defaults to `0`. |
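The three format strings control how a question and its choices compose into the prompt sent to the target (step 1 above). The private `_DEFAULT_*` templates are not reproduced here; the templates below are stand-ins chosen only to show how the pieces fit together:

```python
# Stand-in templates; the real _DEFAULT_* format strings may differ.
question_asking_format_string = (
    "Answer the question by choosing one option.\n{question}\n{options}"
)
options_format_string = "({index}) {choice}"

def build_prompt(question: str, choices: list[str]) -> str:
    # Render each answer choice with options_format_string, then slot the
    # question and rendered options into question_asking_format_string.
    options = "\n".join(
        options_format_string.format(index=i, choice=c)
        for i, c in enumerate(choices)
    )
    return question_asking_format_string.format(
        question=question, options=options
    )

prompt = build_prompt(
    "Which planet is closest to the Sun?",
    ["Venus", "Mercury", "Earth"],
)
```

Passing custom format strings to the constructor would change only the rendering, not the benchmark's scoring logic.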

Methods:

execute_async

execute_async(**kwargs: Any) → AttackResult

Execute the QA benchmark strategy asynchronously with the provided parameters.

| Parameter | Type | Description |
| --- | --- | --- |
| `question_answering_entry` | `QuestionAnsweringEntry` | The question answering entry to evaluate. |
| `prepended_conversation` | `Optional[List[Message]]` | Conversation to prepend. |
| `memory_labels` | `Optional[Dict[str, str]]` | Memory labels for the benchmark context. |
| `**kwargs` | `Any` | Additional parameters for the benchmark. Defaults to `{}`. |

Returns:

`AttackResult`: The result of the benchmark execution.
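Step 4 of the QA benchmark (tracking success/failure for reporting) amounts to aggregating the outcomes of the returned results. A minimal sketch, using a stub in place of PyRIT's `AttackResult` and treating `"success"` as a correct answer:

```python
from dataclasses import dataclass

# Stub for pyrit's AttackResult outcome field; real results carry more state.
@dataclass
class AttackResult:
    outcome: str  # e.g. "success" or "failure"

def benchmark_accuracy(results: list[AttackResult]) -> float:
    # Fraction of entries the target answered correctly, per the scorer.
    if not results:
        return 0.0
    wins = sum(1 for r in results if r.outcome == "success")
    return wins / len(results)

acc = benchmark_accuracy([
    AttackResult("success"),
    AttackResult("failure"),
    AttackResult("success"),
    AttackResult("success"),
])
# acc == 0.75
```

In real use, each `AttackResult` would come from one `execute_async` call per `QuestionAnsweringEntry` in the dataset.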

QuestionAnsweringBenchmarkContext

Bases: StrategyContext

Context for question answering benchmark execution.