pyrit.score.HumanInTheLoopScorer

class HumanInTheLoopScorer(*, scorer: Scorer | None = None, re_scorers: list[Scorer] | None = None)

Bases: Scorer

Creates scores from manual human input and adds them to the database.

Parameters:
  • scorer (Scorer, optional) – The scorer to use for the initial scoring.

  • re_scorers (list[Scorer], optional) – The scorers to use for re-scoring.

__init__(*, scorer: Scorer | None = None, re_scorers: list[Scorer] | None = None) → None
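
A minimal usage sketch (hedged: it assumes a recent PyRIT build where initialize_pyrit and IN_MEMORY are importable from pyrit.common; the text and task strings are placeholders):

    import asyncio

    from pyrit.common import IN_MEMORY, initialize_pyrit
    from pyrit.score import HumanInTheLoopScorer

    # PyRIT scorers write to central memory, so memory must be initialized first.
    initialize_pyrit(memory_db_type=IN_MEMORY)

    # With no scorer supplied, every score field is entered manually at the console.
    scorer = HumanInTheLoopScorer()

    async def main():
        scores = await scorer.score_text_async(
            text="Example model output to review.",  # placeholder
            task="The attacker model's original objective.",  # placeholder
        )
        for score in scores:
            print(score.score_value, score.score_rationale)

    asyncio.run(main())

When a scorer is supplied, it produces the initial score for the human to review; re_scorers can then be used to re-score a result the human rejects.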

Methods

__init__(*[, scorer, re_scorers])

edit_score(existing_score, original_prompt, ...)

Edit an existing score.

get_identifier()

Returns an identifier dictionary for the scorer.

get_modified_value(original_prompt, ...[, ...])

Get the modified value for the score.

get_scorer_metrics(dataset_name[, metrics_type])

Returns evaluation statistics for the scorer, using the dataset_name of the human-labeled dataset that this scorer was run against.

import_scores_from_csv(csv_file_path)

rescore(request_response, *[, task])

scale_value_float(value, min_value, max_value)

Scales a value from 0 to 1 based on the given min and max values.

score_async(request_response, *[, task])

Score the request_response, add the results to the database, and return a list of Score objects.

score_image_async(image_path, *[, task])

Scores the given image using the chat target.

score_prompt_manually(request_response, *[, ...])

Manually score the prompt.

score_prompts_with_tasks_batch_async(*, ...)

score_response_async(*, response, scorers[, ...])

Score a response using multiple scorers in parallel.

score_response_select_first_success_async(*, ...)

Score response pieces sequentially until finding a successful score.

score_response_with_objective_async(*, response)

Score a response using both auxiliary and objective scorers.

score_responses_inferring_tasks_batch_async(*, ...)

Scores a batch of responses (ignores non-assistant messages).

score_text_async(text, *[, task])

Scores the given text based on the task using the chat target.

score_text_batch_async(*, texts[, tasks, ...])

validate(request_response, *[, task])

Validates the request_response piece to score.

Attributes

edit_score(existing_score: Score, original_prompt: str, request_response: PromptRequestPiece, task: str | None = None) → Score

Edit an existing score.

Parameters:
  • existing_score (Score) – The existing score to edit.

  • original_prompt (str) – The original prompt.

  • request_response (PromptRequestPiece) – The request response to score.

  • task (str, optional) – The task against which the text should be scored (the original attacker model’s objective).

Returns:

The new score after all changes.
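
An illustrative sketch of the call (hedged: existing_score is assumed to come from an earlier scoring run, and all field values are placeholders):

    from pyrit.models import PromptRequestPiece

    piece = PromptRequestPiece(role="assistant", original_value="Model response text.")

    # existing_score is assumed to be a Score returned by an earlier
    # score_*_async call or loaded from memory.
    new_score = scorer.edit_score(
        existing_score=existing_score,
        original_prompt="Model response text.",
        request_response=piece,
        task="The attacker model's original objective.",
    )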

get_modified_value(original_prompt: str, score_value: str, field_name: str, extra_value_description: str = '') → str

Get the modified value for the score.

Parameters:
  • original_prompt (str) – The original prompt.

  • score_value (str) – The existing value in the Score object.

  • field_name (str) – The name of the field to change.

  • extra_value_description (str, optional) – Extra information shown to the user describing the score value.

Returns:

The value after modification or the original value if the user does not want to change it.
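
An illustrative call (hedged: the method prompts interactively at the console, and the argument values are placeholders):

    # Interactively asks whether to keep or replace the current field value.
    new_value = scorer.get_modified_value(
        original_prompt="Model response text.",
        score_value="0.25",
        field_name="score_value",
        extra_value_description="A float scale value between 0 and 1.",
    )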

import_scores_from_csv(csv_file_path: Path | str) → list[Score]
async rescore(request_response: PromptRequestPiece, *, task: str | None = None) → list[Score]
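
A hedged sketch covering the two methods above (my_re_scorer is a placeholder for any configured Scorer instance; the CSV path and its expected column layout are assumptions):

    from pyrit.score import HumanInTheLoopScorer

    # my_re_scorer is a placeholder for any Scorer instance (assumption).
    scorer = HumanInTheLoopScorer(re_scorers=[my_re_scorer])

    # Import previously collected human scores from disk; the CSV columns
    # are assumed to match the fields of PyRIT's Score model.
    imported = scorer.import_scores_from_csv("human_scores.csv")

    async def rescore_piece(piece):
        # Runs the configured re_scorers against the piece (assumption:
        # re_scorers must be set for rescore to produce results).
        return await scorer.rescore(piece, task="The original objective.")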
score_prompt_manually(request_response: PromptRequestPiece, *, task: str | None = None) → list[Score]

Manually score the prompt.

Parameters:
  • request_response (PromptRequestPiece) – The prompt request piece to score.

  • task (str, optional) – The task against which the text should be scored (the original attacker model’s objective).

Returns:

The list of scores.
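
An illustrative sketch (hedged: the piece contents and task are placeholders; the method collects the score fields interactively):

    from pyrit.models import PromptRequestPiece

    piece = PromptRequestPiece(role="assistant", original_value="Model response text.")

    # Prompts the human operator for the score fields and returns the result.
    scores = scorer.score_prompt_manually(piece, task="The original objective.")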

scorer_type: ScoreType
validate(request_response: PromptRequestPiece, *, task: str | None = None)

Validates the request_response piece to score, since some scorers may require specific PromptRequestPiece types or values.

Parameters:
  • request_response (PromptRequestPiece) – The request response to be validated.

  • task (str, optional) – The task against which the text should be scored (the original attacker model’s objective).
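
A minimal sketch of guarding a scoring call (hedged: the exact exception raised on an unsupported piece is an assumption):

    try:
        scorer.validate(piece, task="The original objective.")
    except ValueError as err:  # assumption: invalid pieces raise ValueError
        print(f"Piece cannot be scored: {err}")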