pyrit.score.ObjectiveScorerMetrics#

class ObjectiveScorerMetrics(num_responses: int, num_human_raters: int, accuracy: float, accuracy_standard_error: float, f1_score: float, precision: float, recall: float, *, num_scorer_trials: int = 1, dataset_name: str | None = None, dataset_version: str | None = None, trial_scores: ndarray | None = None, average_score_time_seconds: float = 0.0)[source]#

Bases: ScorerMetrics

Metrics for evaluating an objective scorer against a HumanLabeledDataset.

Parameters:
  • accuracy (float) – The accuracy of the model scores when using the majority vote of human scores as the gold label.

  • f1_score (float) – The F1 score of the model scores: the harmonic mean of precision and recall, indicating how closely the LLM scorer aligns with the human gold labels.

  • precision (float) – The precision of the model scores: the fraction of the model's positive predictions that agree with the human gold labels.

  • recall (float) – The recall of the model scores: the fraction of human-labeled positives that the model correctly identifies.

  • trial_scores (Optional[np.ndarray]) – The raw scores from each trial. Shape is (num_trials, num_responses). Useful for debugging and analyzing scorer variance; see the construction sketch after this list.
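
These metrics are typically produced by an evaluation run rather than constructed by hand, but a minimal construction sketch shows how the fields fit together. All metric values, the dataset name, and the trial scores below are made up for illustration:

```python
import numpy as np

from pyrit.score import ObjectiveScorerMetrics

# Three scorer trials over five responses (illustrative binary scores only).
trial_scores = np.array([
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 0],
])

metrics = ObjectiveScorerMetrics(
    num_responses=5,
    num_human_raters=3,
    accuracy=0.8,
    accuracy_standard_error=0.05,
    f1_score=0.75,
    precision=0.8,
    recall=0.7,
    num_scorer_trials=3,
    dataset_name="example-harm-dataset",  # hypothetical name
    trial_scores=trial_scores,
)

# Per-trial mean score: a quick way to eyeball variance across trials.
print(trial_scores.mean(axis=1))
```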

__init__(num_responses: int, num_human_raters: int, accuracy: float, accuracy_standard_error: float, f1_score: float, precision: float, recall: float, *, num_scorer_trials: int = 1, dataset_name: str | None = None, dataset_version: str | None = None, trial_scores: ndarray | None = None, average_score_time_seconds: float = 0.0) → None#

Methods

__init__(num_responses, num_human_raters, ...)

from_json(file_path)

Load the metrics from a JSON file.

to_json()

Convert the metrics to a JSON string.
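
The two serialization helpers pair naturally: to_json() produces a string you can write to disk, and from_json(file_path) reads a saved file back. A short sketch, assuming from_json is callable as a classmethod (the metric values are illustrative):

```python
from pathlib import Path

from pyrit.score import ObjectiveScorerMetrics

# Minimal metrics object with illustrative values.
metrics = ObjectiveScorerMetrics(
    num_responses=5,
    num_human_raters=3,
    accuracy=0.8,
    accuracy_standard_error=0.05,
    f1_score=0.75,
    precision=0.8,
    recall=0.7,
)

# Round-trip: persist the JSON string, then load it back later
# (e.g. to compare scorer versions across runs).
Path("metrics.json").write_text(metrics.to_json())
loaded = ObjectiveScorerMetrics.from_json("metrics.json")
print(loaded.accuracy)  # 0.8
```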

Attributes

average_score_time_seconds

dataset_name

dataset_version

num_scorer_trials

trial_scores

accuracy

accuracy_standard_error

f1_score

precision

recall

num_responses

num_human_raters

accuracy: float#
accuracy_standard_error: float#
f1_score: float#
precision: float#
recall: float#
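
The estimator behind accuracy_standard_error is not specified on this page. For intuition only, a conventional way to compute a standard error of accuracy across scorer trials, using trial_scores and majority-vote gold labels (both made up here), looks like this:

```python
import numpy as np

# Hypothetical per-trial binary scores, shape (num_trials, num_responses).
trial_scores = np.array([
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 0],
])
# Hypothetical majority-vote human gold labels, one per response.
gold = np.array([1, 0, 1, 1, 0])

# Accuracy of each trial against the gold labels.
per_trial_accuracy = (trial_scores == gold).mean(axis=1)

# Sample standard error of the mean accuracy across trials.
standard_error = per_trial_accuracy.std(ddof=1) / np.sqrt(len(per_trial_accuracy))
print(per_trial_accuracy.mean(), standard_error)
```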