pyrit.score.HumanInTheLoopScorer

class HumanInTheLoopScorer(*, scorer: Scorer | None = None, re_scorers: list[Scorer] | None = None)

Bases: Scorer

Creates scores from manual human input and adds them to the database.

Parameters:
  • scorer (Scorer, optional) – The scorer to use for the initial scoring.

  • re_scorers (list[Scorer], optional) – The scorers to use for re-scoring.

__init__(*, scorer: Scorer | None = None, re_scorers: list[Scorer] | None = None) → None
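
A minimal usage sketch (hedged: it assumes a recent PyRIT build where initialize_pyrit and IN_MEMORY are importable from pyrit.common; the text and task strings are placeholders):

    import asyncio

    from pyrit.common import IN_MEMORY, initialize_pyrit
    from pyrit.score import HumanInTheLoopScorer

    # PyRIT scorers write to central memory, so memory must be initialized first.
    initialize_pyrit(memory_db_type=IN_MEMORY)

    # With no scorer supplied, every score field is entered manually at the console.
    scorer = HumanInTheLoopScorer()

    async def main():
        scores = await scorer.score_text_async(
            text="Example model output to review.",  # placeholder
            task="The attacker model's original objective.",  # placeholder
        )
        for score in scores:
            print(score.score_value, score.score_rationale)

    asyncio.run(main())

When a scorer is supplied, it produces the initial score for the human to review; re_scorers can then be used to re-score a result the human rejects.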

Methods

__init__(*[, scorer, re_scorers])

edit_score(existing_score, original_prompt, ...)

Edit an existing score.

get_identifier()

Returns an identifier dictionary for the scorer.

get_modified_value(original_prompt, ...[, ...])

Get the modified value for the score.

get_scorer_metrics(dataset_name[, metrics_type])

Returns evaluation statistics for the scorer, using the dataset_name of the human-labeled dataset that this scorer was run against.

import_scores_from_csv(csv_file_path)

rescore(request_response, *[, task])

scale_value_float(value, min_value, max_value)

Scales a value from 0 to 1 based on the given min and max values.

score_async(request_response, *[, task])

Score the request_response, add the results to the database, and return a list of Score objects.

score_image_async(image_path, *[, task])

Scores the given image using the chat target.

score_prompt_manually(request_response, *[, ...])

Manually score the prompt.

score_prompts_with_tasks_batch_async(*, ...)

score_response_async(*, response, scorers[, ...])

Score a response using multiple scorers in parallel.

score_response_select_first_success_async(*, ...)

Score response pieces sequentially until finding a successful score.

score_response_with_objective_async(*, response)

Score a response using both auxiliary and objective scorers.

score_responses_inferring_tasks_batch_async(*, ...)

Scores a batch of responses (ignores non-assistant messages).

score_text_async(text, *[, task])

Scores the given text based on the task using the chat target.

score_text_batch_async(*, texts[, tasks, ...])

validate(request_response, *[, task])

Validates the request_response piece to score.

Attributes

edit_score(existing_score: Score, original_prompt: str, request_response: PromptRequestPiece, task: str | None = None) → Score

Edit an existing score.

Parameters:
  • existing_score (Score) – The existing score to edit.

  • original_prompt (str) – The original prompt.

  • request_response (PromptRequestPiece) – The request response to score.

  • task (str, optional) – The task against which the text should be scored (the original attacker model’s objective).

Returns:

The new score after all changes.
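
An illustrative sketch of the call (hedged: existing_score is assumed to come from an earlier scoring run, and all field values are placeholders):

    from pyrit.models import PromptRequestPiece

    piece = PromptRequestPiece(role="assistant", original_value="Model response text.")

    # existing_score is assumed to be a Score returned by an earlier
    # score_*_async call or loaded from memory.
    new_score = scorer.edit_score(
        existing_score=existing_score,
        original_prompt="Model response text.",
        request_response=piece,
        task="The attacker model's original objective.",
    )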

get_modified_value(original_prompt: str, score_value: str, field_name: str, extra_value_description: str = '') → str

Get the modified value for the score.

Parameters:
  • original_prompt (str) – The original prompt.

  • score_value (str) – The existing value in the Score object.

  • field_name (str) – The name of the field to change.

  • extra_value_description (str, optional) – Extra information shown to the user describing the score value.

Returns:

The value after modification or the original value if the user does not want to change it.
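
An illustrative call (hedged: the method prompts interactively at the console, and the argument values are placeholders):

    # Interactively asks whether to keep or replace the current field value.
    new_value = scorer.get_modified_value(
        original_prompt="Model response text.",
        score_value="0.25",
        field_name="score_value",
        extra_value_description="A float scale value between 0 and 1.",
    )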

import_scores_from_csv(csv_file_path: Path | str) → list[Score]
async rescore(request_response: PromptRequestPiece, *, task: str | None = None) → list[Score]
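
A hedged sketch covering the two methods above (my_re_scorer is a placeholder for any configured Scorer instance; the CSV path and its expected column layout are assumptions):

    from pyrit.score import HumanInTheLoopScorer

    # my_re_scorer is a placeholder for any Scorer instance (assumption).
    scorer = HumanInTheLoopScorer(re_scorers=[my_re_scorer])

    # Import previously collected human scores from disk; the CSV columns
    # are assumed to match the fields of PyRIT's Score model.
    imported = scorer.import_scores_from_csv("human_scores.csv")

    async def rescore_piece(piece):
        # Runs the configured re_scorers against the piece (assumption:
        # re_scorers must be set for rescore to produce results).
        return await scorer.rescore(piece, task="The original objective.")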
score_prompt_manually(request_response: PromptRequestPiece, *, task: str | None = None) → list[Score]

Manually score the prompt.

Parameters:
  • request_response (PromptRequestPiece) – The prompt request piece to score.

  • task (str, optional) – The task against which the text should be scored (the original attacker model’s objective).

Returns:

The list of scores.
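
An illustrative sketch (hedged: the piece contents and task are placeholders; the method collects the score fields interactively):

    from pyrit.models import PromptRequestPiece

    piece = PromptRequestPiece(role="assistant", original_value="Model response text.")

    # Prompts the human operator for the score fields and returns the result.
    scores = scorer.score_prompt_manually(piece, task="The original objective.")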

scorer_type: ScoreType
validate(request_response: PromptRequestPiece, *, task: str | None = None)

Validates the request_response piece to score, since some scorers may require specific PromptRequestPiece types or values.

Parameters:
  • request_response (PromptRequestPiece) – The request response to be validated.

  • task (str, optional) – The task against which the text should be scored (the original attacker model’s objective).
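
A minimal sketch of guarding a scoring call (hedged: the exact exception raised on an unsupported piece is an assumption):

    try:
        scorer.validate(piece, task="The original objective.")
    except ValueError as err:  # assumption: invalid pieces raise ValueError
        print(f"Piece cannot be scored: {err}")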