pyrit.score.ScorerEvaluator#
- class ScorerEvaluator(scorer: Scorer)[source]#
Bases: ABC
A class that evaluates an LLM scorer against HumanLabeledDatasets, calculating appropriate metrics and saving them to a file.
- __init__(scorer: Scorer)[source]#
Initialize the ScorerEvaluator with a scorer.
- Parameters:
scorer (Scorer) – The scorer to evaluate.
Methods

- __init__(scorer): Initialize the ScorerEvaluator with a scorer.
- from_scorer(scorer[, metrics_type]): Factory method to create a ScorerEvaluator based on the type of scoring.
- get_scorer_metrics(dataset_name): Get the metrics for the scorer in the 'dataset/score/scorer_evals' directory based on the dataset name.
- run_evaluation_async(labeled_dataset[, ...]): Run the evaluation for the scorer/policy combination on the passed-in HumanLabeledDataset.
- run_evaluation_from_csv_async(csv_path, ...): Run the evaluation for the scorer/policy combination on the passed-in CSV file.
- classmethod from_scorer(scorer: Scorer, metrics_type: MetricsType | None = None) ScorerEvaluator [source]#
Factory method to create a ScorerEvaluator based on the type of scoring.
- Parameters:
scorer (Scorer) – The scorer to evaluate.
metrics_type (MetricsType, optional) – The type of scoring, either HARM or OBJECTIVE. If not provided, defaults to OBJECTIVE for true/false scorers and HARM for all other scorers.
- Returns:
An instance of HarmScorerEvaluator or ObjectiveScorerEvaluator.
- Return type:
ScorerEvaluator
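Example. Because ScorerEvaluator is abstract, the factory is the typical entry point. A minimal sketch, assuming my_scorer is an already-configured Scorer instance and that MetricsType is importable from pyrit.score:

    from pyrit.score import MetricsType, ScorerEvaluator

    # my_scorer: a configured Scorer instance, assumed to exist.
    # For a true/false scorer this defaults to an ObjectiveScorerEvaluator.
    evaluator = ScorerEvaluator.from_scorer(my_scorer)

    # Request HARM metrics explicitly (e.g., for a float-scale harm scorer).
    harm_evaluator = ScorerEvaluator.from_scorer(my_scorer, metrics_type=MetricsType.HARM)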
- abstract get_scorer_metrics(dataset_name: str) ScorerMetrics [source]#
Get the metrics for the scorer in the ‘dataset/score/scorer_evals’ directory based on the dataset name.
- Parameters:
dataset_name (str) – The name of the HumanLabeledDataset on which evaluation was run.
- Returns:
The metrics for the scorer.
- Return type:
ScorerMetrics
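Example. A sketch of retrieving previously saved metrics; the dataset name below is a hypothetical placeholder and must match a dataset on which evaluation was already run:

    # Looks up metrics saved under the 'dataset/score/scorer_evals' directory.
    metrics = evaluator.get_scorer_metrics(dataset_name="sample_harm_dataset")
    print(metrics)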
- abstract async run_evaluation_async(labeled_dataset: HumanLabeledDataset, num_scorer_trials: int = 1, save_results: bool = True) ScorerMetrics [source]#
Run the evaluation for the scorer/policy combination on the passed-in HumanLabeledDataset.
- Parameters:
labeled_dataset (HumanLabeledDataset) – The HumanLabeledDataset to evaluate the scorer against.
num_scorer_trials (int) – The number of trials to run the scorer on all responses. Defaults to 1.
save_results (bool) – Whether to save the metrics in a JSON file and the model score(s) for each response in a CSV file. Defaults to True.
- Returns:
The metrics for the scorer. This will be either HarmScorerMetrics or ObjectiveScorerMetrics, depending on the type of the HumanLabeledDataset (HARM or OBJECTIVE).
- Return type:
ScorerMetrics
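Example. A sketch of running the evaluation, assuming evaluator was created via from_scorer and labeled_dataset is a HumanLabeledDataset built elsewhere:

    import asyncio

    async def evaluate():
        # Score each response three times to smooth out scorer variance.
        metrics = await evaluator.run_evaluation_async(
            labeled_dataset=labeled_dataset,
            num_scorer_trials=3,
            save_results=True,  # writes the metrics JSON and per-response scores CSV
        )
        return metrics

    metrics = asyncio.run(evaluate())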
- abstract async run_evaluation_from_csv_async(csv_path: str | Path, assistant_response_col_name: str, human_label_col_names: List[str], objective_or_harm_col_name: str, assistant_response_data_type_col_name: str | None = None, num_scorer_trials: int = 1, save_results: bool = True, dataset_name: str | None = None) ScorerMetrics [source]#
Run the evaluation for the scorer/policy combination on the passed-in CSV file.
- Parameters:
csv_path (str or Path) – The path to the CSV file, which will be used to construct the HumanLabeledDataset object.
assistant_response_col_name (str) – The name of the column in the CSV file that contains the assistant responses.
human_label_col_names (List[str]) – The names of the columns in the CSV file that contain the human labels.
objective_or_harm_col_name (str) – The name of the column in the CSV file that contains the objective or harm category associated with each response.
assistant_response_data_type_col_name (str, optional) – The name of the column containing the data type of the assistant responses. If not specified, the responses are assumed to be text.
num_scorer_trials (int) – The number of trials to run the scorer on all responses. Defaults to 1.
save_results (bool) – Whether to save the metrics in a JSON file and the model score(s) for each response in a CSV file. Defaults to True.
dataset_name (str, optional) – The name of the dataset. If not provided, it will be inferred from the CSV file name. This is used to inform the name of the metrics file and model scoring results CSV to save in the 'scorer_evals' directory.
- Returns:
The metrics for the scorer.
- Return type:
ScorerMetrics
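Example. A sketch of evaluating directly from a CSV file; the file path and column names below are hypothetical placeholders for a dataset with two human-label columns:

    async def evaluate_from_csv():
        return await evaluator.run_evaluation_from_csv_async(
            csv_path="scorer_evals/sample_eval.csv",
            assistant_response_col_name="assistant_response",
            human_label_col_names=["human_label_1", "human_label_2"],
            objective_or_harm_col_name="harm_category",
            num_scorer_trials=1,
            save_results=True,
            dataset_name="sample_eval",  # otherwise inferred from the CSV file name
        )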