pyrit.score.HarmScorerMetrics#
- class HarmScorerMetrics(num_responses: int, num_human_raters: int, mean_absolute_error: float, mae_standard_error: float, t_statistic: float, p_value: float, krippendorff_alpha_combined: float, krippendorff_alpha_humans: float | None = None, krippendorff_alpha_model: float | None = None, *, num_scorer_trials: int = 1, dataset_name: str | None = None, dataset_version: str | None = None, trial_scores: ndarray | None = None, average_score_time_seconds: float = 0.0, harm_category: str | None = None, harm_definition: str | None = None, harm_definition_version: str | None = None)[source]#
Bases: ScorerMetrics
Metrics for evaluating a harm scorer against a HumanLabeledDataset.
- Parameters:
mean_absolute_error (float) – The mean absolute error between the model scores and the gold scores.
mae_standard_error (float) – The standard error of the mean absolute error. This can be used to calculate a confidence interval for the mean absolute error.
t_statistic (float) – The t-statistic for the one-sample t-test comparing model scores to human scores with a null hypothesis that the mean difference is 0. A high positive t-statistic (along with a low p-value) indicates that the model scores are typically higher than the human scores.
p_value (float) – The p-value for the one-sample t-test above. It represents the probability of obtaining a difference in means as extreme as the observed difference, assuming the null hypothesis is true.
krippendorff_alpha_combined (float) – Krippendorff’s alpha for the reliability data, which includes both human and model scores. This measures the agreement between all the human raters and model scoring trials and ranges from -1.0 to 1.0, where 1.0 indicates perfect agreement, 0.0 indicates no agreement, and negative values indicate systematic disagreement.
harm_category (str, optional) – The harm category being evaluated (e.g., “hate_speech”, “violence”).
harm_definition (str, optional) – Path to the YAML file containing the harm definition (scale descriptions). Use get_harm_definition() to load the full HarmDefinition object.
harm_definition_version (str, optional) – Version of the harm definition YAML file that the human labels were created against. Used for reproducibility and to ensure scoring criteria consistency.
krippendorff_alpha_humans (float, optional) – Krippendorff’s alpha for human scores, if there are multiple human raters. This measures the agreement between human raters.
krippendorff_alpha_model (float, optional) – Krippendorff’s alpha for model scores, if there are multiple model scoring trials. This measures the agreement between model scoring trials.
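The error statistics above can be reproduced from paired model/gold score lists. The sketch below (the helper name and sample scores are illustrative, not part of PyRIT) computes the mean absolute error, its standard error, and the paired t-statistic against a null mean difference of 0; the p-value would then come from the t-distribution with n - 1 degrees of freedom.

```python
import math

def harm_metric_sketch(model_scores, gold_scores):
    """Illustrative computation of MAE, its standard error, and the
    one-sample t-statistic described above (hypothetical helper)."""
    n = len(model_scores)
    diffs = [m - g for m, g in zip(model_scores, gold_scores)]
    abs_errors = [abs(d) for d in diffs]
    mae = sum(abs_errors) / n
    # standard error of the MAE: sample std of absolute errors / sqrt(n)
    mae_se = math.sqrt(sum((e - mae) ** 2 for e in abs_errors) / (n - 1)) / math.sqrt(n)
    # one-sample t-test on the signed differences against a mean of 0;
    # a large positive t means the model tends to score higher than humans
    mean_diff = sum(diffs) / n
    sd = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
    t_stat = mean_diff / (sd / math.sqrt(n))
    return mae, mae_se, t_stat

print(harm_metric_sketch([3, 4, 2, 5], [2, 3, 2, 4]))  # → (0.75, 0.25, 3.0)
```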
- __init__(num_responses: int, num_human_raters: int, mean_absolute_error: float, mae_standard_error: float, t_statistic: float, p_value: float, krippendorff_alpha_combined: float, krippendorff_alpha_humans: float | None = None, krippendorff_alpha_model: float | None = None, *, num_scorer_trials: int = 1, dataset_name: str | None = None, dataset_version: str | None = None, trial_scores: ndarray | None = None, average_score_time_seconds: float = 0.0, harm_category: str | None = None, harm_definition: str | None = None, harm_definition_version: str | None = None) → None#
Methods
__init__(num_responses, num_human_raters, ...)

from_json(file_path)
Load the metrics from a JSON file.

get_harm_definition()
Load and return the HarmDefinition object for this metrics instance.

to_json()
Convert the metrics to a JSON string.
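The to_json() / from_json() pair follows a conventional serialize-then-reload pattern. The sketch below mimics it with a hypothetical stand-in dataclass (MetricsStub and its two fields are illustrative, not PyRIT's API); the real methods operate on the full HarmScorerMetrics field set.

```python
import json
import os
import tempfile
from dataclasses import dataclass, asdict

@dataclass
class MetricsStub:
    """Hypothetical stand-in for HarmScorerMetrics (illustration only)."""
    num_responses: int
    mean_absolute_error: float

    def to_json(self) -> str:
        # serialize all fields to a JSON string
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, file_path: str) -> "MetricsStub":
        # load previously saved metrics from a JSON file
        with open(file_path) as f:
            return cls(**json.load(f))

# round-trip through a temporary file
m = MetricsStub(num_responses=100, mean_absolute_error=0.42)
path = os.path.join(tempfile.mkdtemp(), "metrics.json")
with open(path, "w") as f:
    f.write(m.to_json())
assert MetricsStub.from_json(path) == m
```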
Attributes
average_score_time_seconds
dataset_name
dataset_version
num_scorer_trials
trial_scores
num_responses
num_human_raters
- get_harm_definition() → 'HarmDefinition' | None[source]#
Load and return the HarmDefinition object for this metrics instance.
Loads the harm definition YAML file specified in harm_definition and returns it as a HarmDefinition object. The result is cached after the first load.
- Returns:
- The loaded harm definition object, or None if harm_definition is not set.
- Return type:
- HarmDefinition | None
- Raises:
FileNotFoundError – If the harm definition file does not exist.
ValueError – If the harm definition file is invalid.
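Since get_harm_definition() caches its result after the first load, repeated calls return the same object without re-reading the YAML file. A minimal sketch of that behaviour, using a hypothetical stand-in (the class, its load counter, and the fake loader below are illustrative, not PyRIT's implementation):

```python
class MetricsCacheSketch:
    """Hypothetical stand-in showing the caching behaviour of
    get_harm_definition() (illustration only, not PyRIT code)."""

    def __init__(self, harm_definition=None):
        self.harm_definition = harm_definition  # path to the YAML file
        self._cached = None
        self.load_count = 0  # tracks how many times we "read" the file

    def get_harm_definition(self):
        # None when no harm definition path is set
        if self.harm_definition is None:
            return None
        # load once, then serve the cached object on later calls;
        # a real loader would parse the YAML here and may raise
        # FileNotFoundError or ValueError for missing/invalid files
        if self._cached is None:
            self.load_count += 1
            self._cached = {"path": self.harm_definition}
        return self._cached
```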