pyrit.score.HarmScorerMetrics

class HarmScorerMetrics(num_responses: int, num_human_raters: int, mean_absolute_error: float, mae_standard_error: float, t_statistic: float, p_value: float, krippendorff_alpha_combined: float, krippendorff_alpha_humans: float | None = None, krippendorff_alpha_model: float | None = None, *, num_scorer_trials: int = 1, dataset_name: str | None = None, dataset_version: str | None = None, trial_scores: ndarray | None = None, average_score_time_seconds: float = 0.0, harm_category: str | None = None, harm_definition: str | None = None, harm_definition_version: str | None = None)

Bases: ScorerMetrics

Metrics for evaluating a harm scorer against a HumanLabeledDataset.

Parameters:
  • mean_absolute_error (float) – The mean absolute error between the model scores and the gold scores.

  • mae_standard_error (float) – The standard error of the mean absolute error. This can be used to calculate a confidence interval for the mean absolute error.

  • t_statistic (float) – The t-statistic for the one-sample t-test comparing model scores to human scores with a null hypothesis that the mean difference is 0. A high positive t-statistic (along with a low p-value) indicates that the model scores are typically higher than the human scores.

  • p_value (float) – The p-value for the one-sample t-test above. It is the probability of observing a mean difference at least as extreme as the one measured, assuming the null hypothesis is true.

  • krippendorff_alpha_combined (float) – Krippendorff’s alpha for the reliability data, which includes both human and model scores. This measures the agreement among all human raters and model scoring trials. It ranges from -1.0 to 1.0, where 1.0 indicates perfect agreement, 0.0 indicates agreement no better than chance, and negative values indicate systematic disagreement.

  • harm_category (str, optional) – The harm category being evaluated (e.g., “hate_speech”, “violence”).

  • harm_definition (str, optional) – Path to the YAML file containing the harm definition (scale descriptions). Use get_harm_definition() to load the full HarmDefinition object.

  • harm_definition_version (str, optional) – Version of the harm definition YAML file that the human labels were created against. Used for reproducibility and to ensure scoring criteria consistency.

  • krippendorff_alpha_humans (float, optional) – Krippendorff’s alpha for human scores, if there are multiple human raters. This measures the agreement between human raters.

  • krippendorff_alpha_model (float, optional) – Krippendorff’s alpha for model scores, if there are multiple model scoring trials. This measures the agreement between model scoring trials.
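
As a rough illustration of what the statistical parameters above describe, the following sketch computes comparable quantities from hypothetical scores using only the Python standard library. It is not PyRIT's implementation. The sample data, the use of the sample standard deviation, the normal-approximation confidence interval, and the interval-level distance in the hypothetical helper interval_alpha are all assumptions made for this example. The p-value is omitted because it needs the t-distribution, which a statistics package (for example scipy.stats.ttest_1samp) would provide.

```python
import math
import statistics
from itertools import combinations

# Hypothetical scores on a 0.0-1.0 harm scale for five responses.
# Each row is one human rater (or one scorer trial); each column is one response.
human_ratings = [
    [0.0, 0.25, 0.5, 0.75, 1.0],
    [0.0, 0.25, 0.75, 0.75, 1.0],
]
model_trials = [
    [0.25, 0.25, 0.75, 0.5, 1.0],
]


def column_means(rows):
    """Average the ratings for each response across raters or trials."""
    return [statistics.fmean(column) for column in zip(*rows)]


gold_scores = column_means(human_ratings)
model_scores = column_means(model_trials)
diffs = [m - g for m, g in zip(model_scores, gold_scores)]
abs_errors = [abs(d) for d in diffs]
n = len(diffs)

# Mean absolute error and its standard error (sample standard deviation assumed).
mean_absolute_error = statistics.fmean(abs_errors)
mae_standard_error = statistics.stdev(abs_errors) / math.sqrt(n)

# Approximate 95% confidence interval for the MAE (normal approximation,
# which is crude for a sample this small).
z = statistics.NormalDist().inv_cdf(0.975)
mae_ci = (
    mean_absolute_error - z * mae_standard_error,
    mean_absolute_error + z * mae_standard_error,
)

# One-sample t-statistic for the null hypothesis that the mean difference is 0.
t_statistic = statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


def interval_alpha(rows):
    """Krippendorff's alpha for complete data, using squared differences.

    Needs at least two rows and at least two distinct values overall.
    """
    units = list(zip(*rows))  # one tuple of ratings per response
    values = [v for unit in units for v in unit]
    # Observed disagreement: ratings of the same response compared pairwise.
    within = [(a - b) ** 2 for unit in units for a, b in combinations(unit, 2)]
    # Expected disagreement: any two ratings compared pairwise.
    overall = [(a - b) ** 2 for a, b in combinations(values, 2)]
    return 1.0 - statistics.fmean(within) / statistics.fmean(overall)


alpha_humans = interval_alpha(human_ratings)
alpha_combined = interval_alpha(human_ratings + model_trials)
```

A positive t_statistic here reflects model scores that sit above the averaged human scores on balance, which matches the interpretation given for t_statistic above.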

__init__(num_responses: int, num_human_raters: int, mean_absolute_error: float, mae_standard_error: float, t_statistic: float, p_value: float, krippendorff_alpha_combined: float, krippendorff_alpha_humans: float | None = None, krippendorff_alpha_model: float | None = None, *, num_scorer_trials: int = 1, dataset_name: str | None = None, dataset_version: str | None = None, trial_scores: ndarray | None = None, average_score_time_seconds: float = 0.0, harm_category: str | None = None, harm_definition: str | None = None, harm_definition_version: str | None = None) → None

Methods

__init__(num_responses, num_human_raters, ...)

from_json(file_path)

Load the metrics from a JSON file.

get_harm_definition()

Load and return the HarmDefinition object for this metrics instance.

to_json()

Convert the metrics to a JSON string.
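
The to_json() and from_json(file_path) pair suggests a write-then-reload workflow. The sketch below shows that pattern with a hypothetical stand-in dataclass, DemoMetrics, which carries only a few of the fields listed on this page. It is not the PyRIT class, and the real methods may serialize differently. Fields such as trial_scores (an ndarray) are left out because json.dumps cannot serialize them directly.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional


@dataclass
class DemoMetrics:
    """Hypothetical stand-in holding a small subset of the documented fields."""

    num_responses: int
    num_human_raters: int
    mean_absolute_error: float
    harm_category: Optional[str] = None

    def to_json(self) -> str:
        # Serialize every dataclass field to a JSON string.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, file_path):
        # Rebuild an instance from a JSON file written with to_json().
        with open(file_path, encoding="utf-8") as f:
            return cls(**json.load(f))


metrics = DemoMetrics(
    num_responses=5,
    num_human_raters=2,
    mean_absolute_error=0.125,
    harm_category="violence",
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "metrics.json"
    path.write_text(metrics.to_json(), encoding="utf-8")
    restored = DemoMetrics.from_json(path)
```

Because to_json() returns a string rather than writing a file, the caller chooses where the metrics are stored.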

Attributes

average_score_time_seconds

dataset_name

dataset_version

harm_category

harm_definition

harm_definition_version

krippendorff_alpha_humans

krippendorff_alpha_model

num_scorer_trials

trial_scores

mean_absolute_error

mae_standard_error

t_statistic

p_value

krippendorff_alpha_combined

num_responses

num_human_raters

get_harm_definition() → HarmDefinition | None

Load and return the HarmDefinition object for this metrics instance.

Loads the harm definition YAML file specified in harm_definition and returns it as a HarmDefinition object. The result is cached after the first load.
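
The load-once behavior described above can be pictured with the minimal sketch below. DemoMetricsWithDefinition is a hypothetical stand-in, not PyRIT code. It returns the raw file text instead of a HarmDefinition object, and the _cached_definition attribute name is an assumption made for illustration.

```python
import tempfile
from pathlib import Path
from typing import Optional


class DemoMetricsWithDefinition:
    """Hypothetical stand-in that lazily loads and caches a definition file."""

    def __init__(self, harm_definition: Optional[str] = None):
        self.harm_definition = harm_definition
        self._cached_definition: Optional[str] = None  # assumed cache slot

    def get_harm_definition(self) -> Optional[str]:
        # No path configured: nothing to load.
        if not self.harm_definition:
            return None
        # Read the file on the first call only; later calls reuse the result.
        if self._cached_definition is None:
            self._cached_definition = Path(self.harm_definition).read_text(
                encoding="utf-8"
            )
        return self._cached_definition


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "violence.yaml"
    path.write_text("version: '1.0'\n", encoding="utf-8")
    holder = DemoMetricsWithDefinition(str(path))
    first = holder.get_harm_definition()
    # Changing the file afterwards does not affect the cached result.
    path.write_text("version: '2.0'\n", encoding="utf-8")
    second = holder.get_harm_definition()

unset = DemoMetricsWithDefinition().get_harm_definition()
```

One consequence of caching in this style is that edits made to the file after the first call are not picked up by the same instance.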

Returns:

The loaded harm definition object, or None if harm_definition is not set.

Return type:

HarmDefinition | None

Raises:

harm_category: str | None = None
harm_definition: str | None = None
harm_definition_version: str | None = None
krippendorff_alpha_combined: float
krippendorff_alpha_humans: float | None = None
krippendorff_alpha_model: float | None = None
mae_standard_error: float
mean_absolute_error: float
p_value: float
t_statistic: float