pyrit.score.HumanInTheLoopScorer#
- class HumanInTheLoopScorer(*, scorer: Scorer = None, re_scorers: list[Scorer] = None)[source]#
Bases: Scorer
Creates scores from manual human input and adds them to the database.
- Parameters:
scorer (Scorer, optional) – An automated scorer whose results the user reviews; if no scorer is provided, the user is prompted to score each prompt manually.
re_scorers (list[Scorer], optional) – Scorers the user can choose from when electing to re-score a prompt.
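A minimal construction sketch (the `auto_scorer` name below stands in for a hypothetical, separately configured pyrit.score scorer and is not part of this API):

```python
from pyrit.score import HumanInTheLoopScorer

# Purely manual: the user is prompted to enter every score by hand.
manual_scorer = HumanInTheLoopScorer()

# Reviewing an automated scorer: the user can keep, edit, or re-score each
# result, choosing a re-scorer from re_scorers when re-scoring.
# `auto_scorer` is a hypothetical, separately configured Scorer instance.
# reviewed_scorer = HumanInTheLoopScorer(scorer=auto_scorer, re_scorers=[auto_scorer])
```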
Methods
- __init__(*[, scorer, re_scorers])
- edit_score(existing_score, original_prompt, ...) – Edit an existing score.
- get_identifier() – Returns an identifier dictionary for the scorer.
- get_modified_value(original_prompt, ...[, ...]) – Get the modified value for the score.
- import_scores_from_csv(csv_file_path)
- rescore(request_response, *[, task])
- scale_value_float(value, min_value, max_value) – Scales a value from 0 to 1 based on the given min and max values.
- score_async(request_response, *[, task]) – Score the prompt with a human in the loop.
- score_image_async(image_path, *[, task]) – Scores the given image using the chat target.
- score_prompt_manually(request_response, *[, ...]) – Manually score the prompt.
- score_prompts_batch_async(*, request_responses)
- score_text_async(text, *[, task]) – Scores the given text based on the task using the chat target.
- validate(request_response, *[, task]) – Validates the request_response piece to score.
Attributes
scorer_type
- edit_score(existing_score: Score, original_prompt: str, request_response: PromptRequestPiece, task: str | None) Score [source]#
Edit an existing score.
- Parameters:
existing_score (Score) – The existing score to edit.
original_prompt (str) – The original prompt.
request_response (PromptRequestPiece) – The request response to score.
task (str) – The task based on which the text should be scored (the original attacker model’s objective).
- Returns:
The new score after all changes.
- get_modified_value(original_prompt: str, score_value: str, field_name: str, extra_value_description: str = '') str [source]#
Get the modified value for the score.
- Parameters:
original_prompt (str) – The original prompt.
score_value (str) – The current value of the score field.
field_name (str) – The name of the score field being modified.
extra_value_description (str) – Optional additional guidance shown to the user for this field.
- Returns:
The value after modification or the original value if the user does not want to change it.
- async rescore(request_response: PromptRequestPiece, *, task: str | None = None) list[Score] [source]#
Re-score the prompt using one of the re-scorers provided at initialization.
- async score_async(request_response: PromptRequestPiece, *, task: str | None = None) list[Score] [source]#
Score the prompt with a human in the loop.
When the HumanInTheLoopScorer is used, the user is given three options for each score:
(1) Proceed with scoring the prompt as is.
(2) Manually modify the score and associated metadata. The user is prompted to enter the new score value, score category, score value description, score rationale, and score metadata.
(3) Re-score the prompt. The user is prompted to select a re-scorer from the list of re-scorers provided.
If the user initializes this scorer without a scorer, they will be prompted to manually score the prompt.
- Parameters:
request_response (PromptRequestPiece) – The prompt request piece to score.
task (str) – The task based on which the text should be scored (the original attacker model’s objective).
- Returns:
The scores for the request_response.
- Return type:
list[Score]
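A minimal sketch of calling score_async, assuming PromptRequestPiece is importable from pyrit.models and accepts role and original_value (constructor arguments and Score field names may differ across PyRIT versions, and central memory may need to be initialized before scores can be stored):

```python
import asyncio

from pyrit.models import PromptRequestPiece
from pyrit.score import HumanInTheLoopScorer

async def main():
    # No scorer is supplied, so the user is asked to score the prompt manually.
    scorer = HumanInTheLoopScorer()
    piece = PromptRequestPiece(
        role="assistant",
        original_value="Sample model response to review.",
    )
    scores = await scorer.score_async(piece, task="the original attacker objective")
    for score in scores:
        print(score.score_value, score.score_rationale)

asyncio.run(main())
```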
- score_prompt_manually(request_response: PromptRequestPiece, *, task: str | None = None) list[Score] [source]#
Manually score the prompt.
- Parameters:
request_response (PromptRequestPiece) – The prompt request piece to score.
task (str) – The task based on which the text should be scored (the original attacker model’s objective).
- Returns:
A list of scores.
- validate(request_response: PromptRequestPiece, *, task: str | None = None)[source]#
Validates the request_response piece to score, since some scorers may require specific PromptRequestPiece types or values.
- Parameters:
request_response (PromptRequestPiece) – The request response to be validated.
task (str) – The task based on which the text should be scored (the original attacker model’s objective).