pyrit.score.ObjectiveScorerEvaluator#
- class ObjectiveScorerEvaluator(scorer: Scorer)[source]#
Bases: ScorerEvaluator
A class that evaluates an objective scorer against HumanLabeledDatasets of type OBJECTIVE.
- __init__(scorer: Scorer)#
Initialize the ScorerEvaluator with a scorer.
- Parameters:
scorer (Scorer) – The scorer to evaluate.
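A minimal construction sketch follows. The concrete scorer used here (a SelfAskRefusalScorer backed by an OpenAIChatTarget) is only an illustrative choice and assumes the corresponding target credentials are already configured; any objective-style Scorer can be passed instead.

from pyrit.prompt_target import OpenAIChatTarget
from pyrit.score import ObjectiveScorerEvaluator, SelfAskRefusalScorer

# Any objective-style Scorer can be wrapped; SelfAskRefusalScorer is used
# here purely as an example and assumes OpenAI credentials are configured.
scorer = SelfAskRefusalScorer(chat_target=OpenAIChatTarget())
evaluator = ObjectiveScorerEvaluator(scorer=scorer)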
Methods
- __init__(scorer) – Initialize the ScorerEvaluator with a scorer.
- from_scorer(scorer[, metrics_type]) – Create a ScorerEvaluator based on the type of scoring.
- get_scorer_metrics(dataset_name) – Retrieve the scorer metrics for a given dataset name.
- run_evaluation_async(labeled_dataset[, ...]) – Evaluate the scorer against a HumanLabeledDataset of type OBJECTIVE.
- run_evaluation_from_csv_async(csv_path, ...) – Run evaluation from a CSV file containing objective-labeled data.
- get_scorer_metrics(dataset_name: str) → ObjectiveScorerMetrics[source]#
Retrieve the scorer metrics for a given dataset name.
- Parameters:
dataset_name (str) – The name of the dataset to retrieve metrics for.
- Returns:
The metrics for the specified dataset.
- Return type:
ObjectiveScorerMetrics
- Raises:
FileNotFoundError – If metrics for the dataset are not found in any expected location.
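As a usage sketch, continuing with the evaluator constructed above; the dataset name is hypothetical, and FileNotFoundError is raised when no saved metrics exist yet.

# "sample_objective_dataset" is a hypothetical dataset name; metrics are
# looked up from the expected scorer_evals locations on disk.
try:
    metrics = evaluator.get_scorer_metrics(dataset_name="sample_objective_dataset")
    print(metrics)
except FileNotFoundError:
    print("No saved metrics for this dataset; run an evaluation first.")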
- async run_evaluation_async(labeled_dataset: HumanLabeledDataset, num_scorer_trials: int = 1, save_results: bool = True, csv_path: str | Path | None = None) → ObjectiveScorerMetrics[source]#
Evaluate the scorer against a HumanLabeledDataset of type OBJECTIVE. If save_results is True, the evaluation metrics and a CSV file containing the LLM-produced scores across all trials are saved in the ‘dataset/score/scorer_evals/objective’ directory, named after the HumanLabeledDataset.
- Parameters:
labeled_dataset (HumanLabeledDataset) – The HumanLabeledDataset to evaluate against.
num_scorer_trials (int) – The number of trials to run the scorer on all responses. Defaults to 1.
save_results (bool) – Whether to save the metrics and model scoring results. Defaults to True.
csv_path (Optional[Union[str, Path]]) – The path to the CSV file to save results to. Defaults to None.
- Returns:
The metrics for the objective scorer.
- Return type:
ObjectiveScorerMetrics
- Raises:
ValueError – If the HumanLabeledDataset is not of type OBJECTIVE or contains invalid entries.
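A sketch of calling this coroutine, assuming labeled_dataset is a HumanLabeledDataset of type OBJECTIVE prepared elsewhere and evaluator is constructed as above:

import asyncio

async def evaluate():
    # `labeled_dataset` is assumed to be a HumanLabeledDataset of type
    # OBJECTIVE built elsewhere; constructing it is out of scope here.
    metrics = await evaluator.run_evaluation_async(
        labeled_dataset=labeled_dataset,
        num_scorer_trials=3,   # score every response three times
        save_results=True,     # persist metrics and per-trial scores
    )
    print(metrics)

asyncio.run(evaluate())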
- async run_evaluation_from_csv_async(csv_path: str | Path, human_label_col_names: List[str], objective_or_harm_col_name: str, assistant_response_col_name: str = 'assistant_response', assistant_response_data_type_col_name: str | None = None, num_scorer_trials: int = 1, save_results: bool = True, dataset_name: str | None = None) → ObjectiveScorerMetrics[source]#
Run evaluation from a CSV file containing objective-labeled data.
- Parameters:
csv_path (Union[str, Path]) – The path to the CSV file containing the labeled data.
human_label_col_names (List[str]) – The names of the columns containing human labels.
objective_or_harm_col_name (str) – The name of the column containing the objective description.
assistant_response_col_name (str) – The name of the column containing assistant responses. Defaults to “assistant_response”.
assistant_response_data_type_col_name (Optional[str]) – The name of the column containing the assistant response data type. Defaults to None.
num_scorer_trials (int) – The number of trials to run for scoring. Defaults to 1.
save_results (bool) – Whether to save the evaluation results. Defaults to True.
dataset_name (Optional[str]) – The name of the dataset. Defaults to None.
- Returns:
The metrics from the evaluation.
- Return type:
ObjectiveScorerMetrics
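A usage sketch follows; the CSV path and column names are hypothetical placeholders to adapt to the layout of your labeled data, and evaluator is assumed to be constructed as above.

import asyncio

async def evaluate_from_csv():
    # The path and column names below are illustrative placeholders.
    metrics = await evaluator.run_evaluation_from_csv_async(
        csv_path="objective_labels.csv",
        human_label_col_names=["human_label_1", "human_label_2"],
        objective_or_harm_col_name="objective",
        assistant_response_col_name="assistant_response",
        num_scorer_trials=2,
        save_results=True,
        dataset_name="objective_labels_example",
    )
    print(metrics)

asyncio.run(evaluate_from_csv())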