pyrit.score.ObjectiveScorerEvaluator

class ObjectiveScorerEvaluator(scorer: Scorer)

Bases: ScorerEvaluator

A class that evaluates an objective scorer against HumanLabeledDatasets of type OBJECTIVE.

__init__(scorer: Scorer)

Initialize the ScorerEvaluator with a scorer.

Parameters:

scorer (Scorer) – The scorer to evaluate.
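
Example (a minimal construction sketch; SelfAskRefusalScorer and OpenAIChatTarget are assumptions used for illustration, and any objective-style Scorer with a configured chat target could be substituted):

    from pyrit.prompt_target import OpenAIChatTarget
    from pyrit.score import ObjectiveScorerEvaluator, SelfAskRefusalScorer

    # Any Scorer that produces objective (true/false) scores can be evaluated;
    # the refusal scorer and OpenAI target here are illustrative assumptions,
    # and credentials/memory setup for the target is omitted.
    objective_scorer = SelfAskRefusalScorer(chat_target=OpenAIChatTarget())
    evaluator = ObjectiveScorerEvaluator(scorer=objective_scorer)

    # Alternatively, the from_scorer factory listed in the methods table below
    # creates the appropriate evaluator based on the type of scoring.
    # evaluator = ObjectiveScorerEvaluator.from_scorer(objective_scorer)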

Methods

__init__(scorer)

Initialize the ScorerEvaluator with a scorer.

from_scorer(scorer[, metrics_type])

Create a ScorerEvaluator based on the type of scoring.

get_scorer_metrics(dataset_name)

Retrieve the scorer metrics for a given dataset name.

run_evaluation_async(labeled_dataset[, ...])

Evaluate the scorer against a HumanLabeledDataset of type OBJECTIVE.

run_evaluation_from_csv_async(csv_path, ...)

Run evaluation from a CSV file containing objective-labeled data.

get_scorer_metrics(dataset_name: str) → ObjectiveScorerMetrics

Retrieve the scorer metrics for a given dataset name.

Parameters:

dataset_name (str) – The name of the dataset to retrieve metrics for.

Returns:

The metrics for the specified dataset.

Return type:

ObjectiveScorerMetrics

Raises:

FileNotFoundError – If metrics for the dataset are not found in any expected location.
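
Example (retrieving previously saved metrics; the dataset name is illustrative and assumes an evaluation has already been run and saved under that name):

    # 'evaluator' is the ObjectiveScorerEvaluator constructed earlier.
    try:
        metrics = evaluator.get_scorer_metrics(dataset_name="my_objective_dataset")
        print(metrics)
    except FileNotFoundError:
        # No metrics saved yet for this dataset name; run an evaluation first.
        print("No saved metrics found for 'my_objective_dataset'.")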

async run_evaluation_async(labeled_dataset: HumanLabeledDataset, num_scorer_trials: int = 1, save_results: bool = True, csv_path: str | Path | None = None) → ObjectiveScorerMetrics

Evaluate the scorer against a HumanLabeledDataset of type OBJECTIVE. If save_results is True, the evaluation metrics and a CSV file containing the LLM-produced scores across all trials are saved in the 'dataset/score/scorer_evals/objective' directory, with file names based on the name of the HumanLabeledDataset.

Parameters:
  • labeled_dataset (HumanLabeledDataset) – The HumanLabeledDataset to evaluate against.

  • num_scorer_trials (int) – The number of trials to run the scorer on all responses. Defaults to 1.

  • save_results (bool) – Whether to save the metrics and model scoring results. Defaults to True.

  • csv_path (Optional[Union[str, Path]]) – The path to the CSV file to save results to. Defaults to None.

Returns:

The metrics for the objective scorer.

Return type:

ObjectiveScorerMetrics

Raises:

ValueError – If the HumanLabeledDataset is not of type OBJECTIVE or contains invalid entries.
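
Example (a hedged sketch of running the evaluation; constructing the HumanLabeledDataset is outside this class, so build_labeled_dataset is a hypothetical placeholder):

    import asyncio

    async def evaluate() -> None:
        # 'evaluator' is the ObjectiveScorerEvaluator constructed earlier;
        # build_labeled_dataset() is a hypothetical helper returning a
        # HumanLabeledDataset of type OBJECTIVE.
        labeled_dataset = build_labeled_dataset()
        metrics = await evaluator.run_evaluation_async(
            labeled_dataset=labeled_dataset,
            num_scorer_trials=3,  # score every response three times
            save_results=True,    # persist metrics and per-trial scores
        )
        print(metrics)

    asyncio.run(evaluate())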

async run_evaluation_from_csv_async(csv_path: str | Path, human_label_col_names: List[str], objective_or_harm_col_name: str, assistant_response_col_name: str = 'assistant_response', assistant_response_data_type_col_name: str | None = None, num_scorer_trials: int = 1, save_results: bool = True, dataset_name: str | None = None) → ObjectiveScorerMetrics

Run evaluation from a CSV file containing objective-labeled data.

Parameters:
  • csv_path (Union[str, Path]) – The path to the CSV file containing the labeled data.

  • human_label_col_names (List[str]) – The names of the columns containing human labels.

  • objective_or_harm_col_name (str) – The name of the column containing the objective description.

  • assistant_response_col_name (str) – The name of the column containing assistant responses. Defaults to “assistant_response”.

  • assistant_response_data_type_col_name (Optional[str]) – The name of the column containing the assistant response data type. Defaults to None.

  • num_scorer_trials (int) – The number of trials to run for scoring. Defaults to 1.

  • save_results (bool) – Whether to save the evaluation results. Defaults to True.

  • dataset_name (Optional[str]) – The name of the dataset. Defaults to None.

Returns:

The metrics from the evaluation.

Return type:

ObjectiveScorerMetrics
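
Example (an illustrative call only; the CSV path and column names are placeholders for a file whose columns match the parameters documented above):

    import asyncio

    async def evaluate_from_csv() -> None:
        # 'evaluator' is the ObjectiveScorerEvaluator constructed earlier.
        metrics = await evaluator.run_evaluation_from_csv_async(
            csv_path="evals/objective_labels.csv",         # placeholder path
            human_label_col_names=["rater_1", "rater_2"],  # placeholder human-label columns
            objective_or_harm_col_name="objective",
            assistant_response_col_name="assistant_response",
            num_scorer_trials=1,
            save_results=True,
            dataset_name="objective_labels",
        )
        print(metrics)

    asyncio.run(evaluate_from_csv())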