pyrit.score.ObjectiveScorerEvaluator
- class ObjectiveScorerEvaluator(scorer: Scorer)
Bases: ScorerEvaluator
A class that evaluates an objective scorer against HumanLabeledDatasets of type OBJECTIVE.
- __init__(scorer: Scorer)
Initialize the ScorerEvaluator with a scorer.
- Parameters:
scorer (Scorer) – The scorer to evaluate.
Methods
- __init__(scorer): Initialize the ScorerEvaluator with a scorer.
- from_scorer(scorer[, metrics_type]): Create a ScorerEvaluator based on the type of scoring.
- get_scorer_metrics(dataset_name): Retrieve the scorer metrics for a given dataset name.
- run_evaluation_async(labeled_dataset[, ...]): Evaluate the scorer against a HumanLabeledDataset of type OBJECTIVE.
- run_evaluation_from_csv_async(csv_path, ...): Evaluate an objective scorer against a CSV dataset containing binary labels (0 or 1).
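A minimal construction sketch, assuming a true/false scorer. SelfAskTrueFalseScorer, OpenAIChatTarget, and initialize_pyrit exist elsewhere in PyRIT, but the exact constructor parameters and the YAML path shown here are assumptions that may differ across versions; any objective Scorer can be passed.

```python
from pathlib import Path

from pyrit.common import IN_MEMORY, initialize_pyrit
from pyrit.prompt_target import OpenAIChatTarget
from pyrit.score import ObjectiveScorerEvaluator, SelfAskTrueFalseScorer

# Recent PyRIT versions require memory to be initialized first.
initialize_pyrit(memory_db_type=IN_MEMORY)

# Hypothetical true/false question definition; substitute your own YAML.
scorer = SelfAskTrueFalseScorer(
    chat_target=OpenAIChatTarget(),  # endpoint/key read from environment
    true_false_question_path=Path("my_true_false_question.yaml"),
)

evaluator = ObjectiveScorerEvaluator(scorer=scorer)
```

Alternatively, the from_scorer factory listed above can select the evaluator subclass matching the scorer's metrics type.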
- get_scorer_metrics(dataset_name: str) → ObjectiveScorerMetrics
Retrieve the scorer metrics for a given dataset name.
- Parameters:
dataset_name (str) – The name of the dataset to retrieve metrics for.
- Returns:
The metrics for the specified dataset.
- Return type:
ObjectiveScorerMetrics
- Raises:
FileNotFoundError – If metrics for the dataset are not found in any expected location.
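A short usage sketch, assuming an evaluator built as in the example above and a hypothetical dataset name for which metrics were previously saved:

```python
# "my_eval_dataset" is a hypothetical name; metrics must have been saved
# earlier by an evaluation run with save_results=True.
try:
    metrics = evaluator.get_scorer_metrics(dataset_name="my_eval_dataset")
    print(metrics)
except FileNotFoundError:
    print("No saved metrics found; run an evaluation for this dataset first.")
```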
- async run_evaluation_async(labeled_dataset: HumanLabeledDataset, num_scorer_trials: int = 1, save_results: bool = True, csv_path: str | Path | None = None) → ObjectiveScorerMetrics
Evaluate the scorer against a HumanLabeledDataset of type OBJECTIVE. If save_results is True, the evaluation metrics and a CSV file containing the LLM-produced scores across all trials are saved in the ‘dataset/score/scorer_evals/objective’ directory, named after the HumanLabeledDataset.
- Parameters:
labeled_dataset (HumanLabeledDataset) – The HumanLabeledDataset to evaluate against.
num_scorer_trials (int) – The number of trials to run the scorer on all responses. Defaults to 1.
save_results (bool) – Whether to save the metrics and model scoring results. Defaults to True.
csv_path (Optional[Union[str, Path]]) – The path to the CSV file to save results to. Defaults to None.
- Returns:
The metrics for the objective scorer.
- Return type:
ObjectiveScorerMetrics
- Raises:
ValueError – If the HumanLabeledDataset is not of type OBJECTIVE or contains invalid entries.
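A hedged sketch of an evaluation run. It assumes labeled_dataset is a HumanLabeledDataset of type OBJECTIVE constructed elsewhere from human-annotated responses:

```python
import asyncio


async def main() -> None:
    # `labeled_dataset` is assumed to be a HumanLabeledDataset of type
    # OBJECTIVE, built elsewhere from human-annotated responses.
    metrics = await evaluator.run_evaluation_async(
        labeled_dataset=labeled_dataset,
        num_scorer_trials=3,  # score each response three times to gauge variance
        save_results=True,    # persist metrics and per-trial scores to disk
    )
    print(metrics)


asyncio.run(main())
```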
- async run_evaluation_from_csv_async(csv_path: str | Path, human_label_col_names: List[str], objective_or_harm_col_name: str, assistant_response_col_name: str = 'assistant_response', assistant_response_data_type_col_name: str | None = None, num_scorer_trials: int = 1, save_results: bool = True, dataset_name: str | None = None, version: str | None = None) → ObjectiveScorerMetrics
Evaluate an objective scorer against a CSV dataset containing binary labels (0 or 1).
See base class for full parameter documentation.
- Returns:
Metrics including accuracy, F1 score, precision, and recall.
- Return type:
ObjectiveScorerMetrics
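A sketch of a CSV-based run using the signature above. The file path, column names, and dataset name are hypothetical and must match your CSV's header:

```python
import asyncio


async def main() -> None:
    # File path and column names are hypothetical; adjust to your CSV.
    metrics = await evaluator.run_evaluation_from_csv_async(
        csv_path="human_labeled_responses.csv",
        human_label_col_names=["rater_1", "rater_2"],  # binary 0/1 label columns
        objective_or_harm_col_name="objective",
        assistant_response_col_name="assistant_response",
        num_scorer_trials=1,
        save_results=True,
        dataset_name="my_eval_dataset",
    )
    print(metrics)  # accuracy, F1 score, precision, recall


asyncio.run(main())
```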