pyrit.score.Scorer#

class Scorer[source]#

Bases: ABC

Abstract base class for scorers.

__init__()#

Methods

__init__()

get_identifier()

Returns an identifier dictionary for the scorer.

get_scorer_metrics(dataset_name[, metrics_type])

Returns evaluation statistics for the scorer using the dataset_name of the human-labeled dataset that this scorer was run against.

scale_value_float(value, min_value, max_value)

Scales a value from 0 to 1 based on the given min and max values.

score_async(request_response, *[, task])

Score the request_response, add the results to the database and return a list of Score objects.

score_image_async(image_path, *[, task])

Scores the given image using the chat target.

score_prompts_with_tasks_batch_async(*, ...)

score_response_async(*, response, scorers[, ...])

Score a response using multiple scorers in parallel.

score_response_select_first_success_async(*, ...)

Score response pieces sequentially until finding a successful score.

score_response_with_objective_async(*, response)

Score a response using both auxiliary and objective scorers.

score_responses_inferring_tasks_batch_async(*, ...)

Scores a batch of responses (ignores non-assistant messages).

score_text_async(text, *[, task])

Scores the given text based on the task using the chat target.

score_text_batch_async(*, texts[, tasks, ...])

validate(request_response, *[, task])

Validates the request_response piece to score.

Attributes

scorer_type

get_identifier()[source]#

Returns an identifier dictionary for the scorer.

Returns:

The identifier dictionary.

Return type:

dict

get_scorer_metrics(dataset_name: str, metrics_type: MetricsType | None = None)[source]#

Returns evaluation statistics for the scorer using the dataset_name of the human-labeled dataset that this scorer was run against. If you did not evaluate the scorer against your own human-labeled dataset, you can use this method to retrieve metrics based on a pre-existing dataset name, which is often a ‘harm_category’ or an abbreviated version of the ‘objective’. For example, to retrieve metrics for the ‘hate_speech’ harm, pass ‘hate_speech’ as the dataset_name.

The existing metrics can be found in the ‘dataset/score/scorer_evals’ directory within either the ‘harm’ or ‘objective’ subdirectory.

Parameters:
  • dataset_name (str) – The name of the dataset on which the scorer evaluation was run. This is used to inform the name of the metrics file to read in the scorer_evals directory.

  • metrics_type (MetricsType, optional) – The type of metrics to retrieve, either HARM or OBJECTIVE. If not provided, it will default to OBJECTIVE for true/false scorers and HARM for all other scorers.

Returns:

A ScorerMetrics object containing the saved evaluation statistics for the scorer.

Return type:

ScorerMetrics
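
A minimal usage sketch, not an authoritative recipe: my_scorer stands for any concrete Scorer instance, ‘hate_speech’ is assumed to be one of the pre-existing harm dataset names, and the MetricsType import path may differ in your PyRIT version.

# Sketch: retrieve saved evaluation metrics for a pre-existing harm dataset.
# my_scorer, the dataset name, and the MetricsType import path are assumptions.
from pyrit.score import MetricsType

metrics = my_scorer.get_scorer_metrics(
    dataset_name="hate_speech",
    metrics_type=MetricsType.HARM,
)
print(metrics)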

scale_value_float(value: float, min_value: float, max_value: float) → float[source]#

Scales a value to the range 0 to 1 based on the given min and max values. For example, 3 stars on a 1-to-5 star scale would be scaled to 0.5.

Parameters:
  • value (float) – The value to be scaled.

  • min_value (float) – The minimum value of the range.

  • max_value (float) – The maximum value of the range.

Returns:

The scaled value.

Return type:

float
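
A brief sketch of the arithmetic: the scaling shown below is the standard linear normalization implied by the star example, and my_scorer is a placeholder for any concrete Scorer instance.

# Linear scaling: (value - min_value) / (max_value - min_value).
# my_scorer is a placeholder concrete Scorer instance.
scaled = my_scorer.scale_value_float(value=3, min_value=1, max_value=5)
print(scaled)  # 0.5, i.e. 3 stars on a 1-to-5 scale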

abstract async score_async(request_response: PromptRequestPiece, *, task: str | None = None) → list[Score][source]#

Score the request_response, add the results to the database and return a list of Score objects.

Parameters:
  • request_response (PromptRequestPiece) – The request response to be scored.

  • task (str) – The task based on which the text should be scored (the original attacker model’s objective).

Returns:

A list of Score objects representing the results.

Return type:

list[Score]
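
A minimal sketch of calling this method on a concrete scorer; my_scorer and the PromptRequestPiece constructor arguments shown are assumptions for illustration.

# Sketch: score a single assistant piece against an objective.
# my_scorer is a placeholder concrete Scorer; the PromptRequestPiece
# constructor arguments are assumptions for illustration.
from pyrit.models import PromptRequestPiece

async def score_one_piece(my_scorer):
    piece = PromptRequestPiece(role="assistant", original_value="model output to evaluate")
    scores = await my_scorer.score_async(piece, task="the original attack objective")
    for score in scores:
        print(score.get_value())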

async score_image_async(image_path: str, *, task: str | None = None) → list[Score][source]#

Scores the given image using the chat target.

Parameters:
  • image_path (str) – The path to the image to be scored.

  • task (str) – The task based on which the image should be scored (the original attacker model’s objective).

Returns:

A list of Score objects representing the results.

Return type:

list[Score]
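
A short sketch, assuming a concrete scorer whose underlying chat target accepts images; the file path and task string are placeholders.

# Sketch: score an image produced during an attack.
async def score_image(my_scorer):
    scores = await my_scorer.score_image_async(
        "outputs/generated_image.png",  # placeholder path
        task="the original attack objective",
    )
    return scores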

async score_prompts_with_tasks_batch_async(*, request_responses: Sequence[PromptRequestPiece], tasks: Sequence[str], batch_size: int = 10) → list[Score][source]#

async static score_response_async(*, response: PromptRequestResponse, scorers: List[Scorer], role_filter: Literal['system', 'user', 'assistant'] = 'assistant', task: str | None = None, skip_on_error: bool = True) → List[Score][source]#

Score a response using multiple scorers in parallel.

This method runs all scorers on all filtered response pieces concurrently for maximum performance. It is typically used for auxiliary scoring, where every result is collected for metrics rather than used to determine success.

Parameters:
  • response – PromptRequestResponse containing pieces to score

  • scorers – List of scorers to apply

  • role_filter – Only score pieces with this role (default: “assistant”)

  • task – Optional task description for scoring context

  • skip_on_error – If True, skip scoring pieces that have errors (default: True)

Returns:

List of all scores from all scorers
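
A hedged sketch of running several scorers in parallel over one response; response, scorer_a, and scorer_b are placeholders supplied by the caller.

# Sketch: apply multiple scorers to all assistant pieces of a response in parallel.
from pyrit.score import Scorer

async def run_auxiliary_scoring(response, scorer_a, scorer_b):
    all_scores = await Scorer.score_response_async(
        response=response,            # a PromptRequestResponse from a prompt target
        scorers=[scorer_a, scorer_b],
        task="the original attack objective",
    )
    return all_scores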

async static score_response_select_first_success_async(*, response: PromptRequestResponse, scorers: List[Scorer], role_filter: Literal['system', 'user', 'assistant'] = 'assistant', task: str | None = None, skip_on_error: bool = True) → Score | None[source]#

Score response pieces sequentially until finding a successful score.

This method processes filtered response pieces one by one. For each piece, it runs all scorers in parallel, then checks the results for a successful score (where score.get_value() is truthy). If no successful score is found, it returns the first score as a failure indicator.

Parameters:
  • response – PromptRequestResponse containing pieces to score

  • scorers – List of scorers to use for evaluation

  • role_filter – Only score pieces with this role (default: “assistant”)

  • task – Optional task description for scoring context

  • skip_on_error – If True, skip scoring pieces that have errors (default: True)

Returns:

The first successful score, the first score if no success was found, or None if there are no scores.
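
A sketch of the short-circuiting pattern; response and objective_scorer are placeholders supplied by the caller.

# Sketch: stop at the first response piece that any scorer marks as successful.
from pyrit.score import Scorer

async def check_objective(response, objective_scorer):
    score = await Scorer.score_response_select_first_success_async(
        response=response,
        scorers=[objective_scorer],
        task="the original attack objective",
    )
    return score is not None and bool(score.get_value())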

async static score_response_with_objective_async(*, response: PromptRequestResponse, auxiliary_scorers: List[Scorer] | None = None, objective_scorers: List[Scorer] | None = None, role_filter: Literal['system', 'user', 'assistant'] = 'assistant', task: str | None = None, skip_on_error: bool = True) → Dict[str, List[Score]][source]#

Score a response using both auxiliary and objective scorers.

This method runs auxiliary scorers for collecting metrics and objective scorers for determining success. All scorers are run asynchronously for performance.

Parameters:
  • response (PromptRequestResponse) – Response containing pieces to score

  • auxiliary_scorers (Optional[List[Scorer]]) – List of auxiliary scorers to apply

  • objective_scorers (Optional[List[Scorer]]) – List of objective scorers to apply

  • role_filter (ChatMessageRole) – Only score pieces with this role (default: assistant)

  • task (Optional[str]) – Optional task description for scoring context

  • skip_on_error (bool) – If True, skip scoring pieces that have errors (default: True)

Returns:

Dictionary with keys auxiliary_scores and objective_scores, containing lists of scores from each type of scorer.

Return type:

Dict[str, List[Score]]
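
A sketch of combined auxiliary and objective scoring; the scorer instances are placeholders, and the dictionary keys follow the Returns description above.

# Sketch: auxiliary scorers collect metrics, objective scorers decide success.
from pyrit.score import Scorer

async def score_with_objective(response, harm_scorer, objective_scorer):
    result = await Scorer.score_response_with_objective_async(
        response=response,
        auxiliary_scorers=[harm_scorer],
        objective_scorers=[objective_scorer],
        task="the original attack objective",
    )
    return result["auxiliary_scores"], result["objective_scores"]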

async score_responses_inferring_tasks_batch_async(*, request_responses: Sequence[PromptRequestPiece], batch_size: int = 10) → list[Score][source]#

Scores a batch of responses (ignores non-assistant messages).

Where possible, the request preceding each response is sent as the task; if that is not straightforward (e.g. the request is non-text), None is sent instead.

For more control, use score_prompts_with_tasks_batch_async.
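
A brief sketch; response_pieces is assumed to be a Sequence[PromptRequestPiece] retrieved elsewhere (for example, from memory), and my_scorer is a placeholder concrete Scorer.

# Sketch: batch-score assistant pieces, letting the scorer infer each task
# from the preceding request where possible.
async def batch_score(my_scorer, response_pieces):
    return await my_scorer.score_responses_inferring_tasks_batch_async(
        request_responses=response_pieces,
        batch_size=10,
    )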

async score_text_async(text: str, *, task: str | None = None) → list[Score][source]#

Scores the given text based on the task using the chat target.

Parameters:
  • text (str) – The text to be scored.

  • task (str) – The task based on which the text should be scored (the original attacker model’s objective).

Returns:

A list of Score objects representing the results.

Return type:

list[Score]
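
A minimal sketch; my_scorer is a placeholder concrete scorer and the strings are illustrative.

# Sketch: score raw text against a task.
async def score_text(my_scorer):
    scores = await my_scorer.score_text_async(
        "model output to evaluate",
        task="the original attack objective",
    )
    return scores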

async score_text_batch_async(*, texts: Sequence[str], tasks: Sequence[str] | None = None, batch_size: int = 10) → list[Score][source]#

scorer_type: Literal['true_false', 'float_scale']#

abstract validate(request_response: PromptRequestPiece, *, task: str | None = None)[source]#

Validates the request_response piece to score, since some scorers may require specific PromptRequestPiece types or values.

Parameters:
  • request_response (PromptRequestPiece) – The request response to be validated.

  • task (str) – The task based on which the text should be scored (the original attacker model’s objective).
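
A hedged sketch of how a subclass might implement the two abstract members; the text-only check and the converted_value_data_type attribute are assumptions for illustration, not a prescription from this page.

# Sketch: a concrete scorer restricting what it accepts in validate().
# The converted_value_data_type check is an assumption for illustration.
from pyrit.models import PromptRequestPiece
from pyrit.score import Scorer

class TextOnlyScorer(Scorer):
    scorer_type = "true_false"

    async def score_async(self, request_response: PromptRequestPiece, *, task: str | None = None):
        self.validate(request_response, task=task)
        return []  # scoring logic omitted in this sketch

    def validate(self, request_response: PromptRequestPiece, *, task: str | None = None):
        if request_response.converted_value_data_type != "text":
            raise ValueError("TextOnlyScorer only supports text request pieces.")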