pyrit.score.ObjectiveScorerMetrics#
- class ObjectiveScorerMetrics(num_responses: int, num_human_raters: int, accuracy: float, accuracy_standard_error: float, f1_score: float, precision: float, recall: float, *, num_scorer_trials: int = 1, dataset_name: str | None = None, dataset_version: str | None = None, trial_scores: ndarray | None = None, average_score_time_seconds: float = 0.0)[source]#
Bases: ScorerMetrics
Metrics for evaluating an objective scorer against a HumanLabeledDataset.
- Parameters:
accuracy (float) – The accuracy of the model scores when using the majority vote of human scores as the gold label.
f1_score (float) – The F1 score of the model scores, an indicator of the LLM scorer's alignment with human scores.
precision (float) – The precision of the model scores, an indicator of the model’s accuracy in its positive predictions.
recall (float) – The recall of the model scores, an indicator of the model’s ability to correctly identify positive labels.
trial_scores (Optional[np.ndarray]) – The raw scores from each trial. Shape is (num_trials, num_responses). Useful for debugging and analyzing scorer variance.
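The parameters above describe metrics computed against the majority vote of human labels. The following self-contained sketch (not PyRIT's actual implementation; the function and variable names are illustrative) shows how accuracy, precision, recall, and F1 can be derived from binary model scores and a matrix of human ratings:

```python
def majority_vote(human_scores):
    """Gold label per response: majority vote across human raters.

    human_scores: list of rater rows, each a list of 0/1 labels per response.
    Ties count as positive here; the actual tie-breaking rule is an assumption.
    """
    num_raters = len(human_scores)
    num_responses = len(human_scores[0])
    return [
        1 if sum(row[i] for row in human_scores) * 2 >= num_raters else 0
        for i in range(num_responses)
    ]


def compute_objective_metrics(model_scores, human_scores):
    """Accuracy, precision, recall, and F1 of model_scores vs. the majority vote."""
    gold = majority_vote(human_scores)
    tp = sum(1 for p, g in zip(model_scores, gold) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(model_scores, gold) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(model_scores, gold) if p == 0 and g == 1)
    accuracy = sum(1 for p, g in zip(model_scores, gold) if p == g) / len(gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1_score": f1}
```

For example, with three raters and four responses, a model that flags only the first response as positive is scored against the per-response majority of the three human rows.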
- __init__(num_responses: int, num_human_raters: int, accuracy: float, accuracy_standard_error: float, f1_score: float, precision: float, recall: float, *, num_scorer_trials: int = 1, dataset_name: str | None = None, dataset_version: str | None = None, trial_scores: ndarray | None = None, average_score_time_seconds: float = 0.0) None#
Methods
- __init__(num_responses, num_human_raters, ...)
- from_json(file_path) – Load the metrics from a JSON file.
- to_json() – Convert the metrics to a JSON string.
to_json()Convert the metrics to a JSON string.
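A minimal sketch of a JSON round-trip for a metrics dataclass like this one. This is not PyRIT's implementation: the class name and `from_json_str` helper are illustrative, and the real `from_json(file_path)` reads from a file rather than a string.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class MetricsSketch:
    # Simplified stand-in for ObjectiveScorerMetrics (illustrative fields only).
    num_responses: int
    num_human_raters: int
    accuracy: float
    f1_score: float

    def to_json(self) -> str:
        # Serialize all dataclass fields to a JSON string.
        return json.dumps(asdict(self))

    @classmethod
    def from_json_str(cls, data: str) -> "MetricsSketch":
        # Rebuild the dataclass from the serialized fields.
        return cls(**json.loads(data))
```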
Attributes
- average_score_time_seconds
- dataset_name
- dataset_version
- num_scorer_trials
- trial_scores
- num_responses
- num_human_raters