pyrit.score.SelfAskCategoryScorer

class SelfAskCategoryScorer(chat_target: PromptChatTarget, content_classifier: Path)

Bases: Scorer

A self-ask scorer for text classification. Given a classifier file, it scores a PromptRequestPiece against the categories defined in that file and returns the category that the PromptRequestPiece fits best.

A false category is used when the PromptRequestPiece does not fit any of the categories.

__init__(chat_target: PromptChatTarget, content_classifier: Path) -> None

Initializes a new instance of the SelfAskCategoryScorer class.

Parameters:
  • chat_target (PromptChatTarget) – The chat target to interact with.

  • content_classifier (Path) – The path to the classifier file.

Methods

__init__(chat_target, content_classifier)

Initializes a new instance of the SelfAskCategoryScorer class.

get_identifier()

Returns an identifier dictionary for the scorer.

scale_value_float(value, min_value, max_value)

Scales a value from 0 to 1 based on the given min and max values.

score_async(request_response, *[, task])

Scores the given request_response using the chat target and adds score to memory.

score_image_async(image_path, *[, task])

Scores the given image using the chat target.

score_prompts_batch_async(*, request_responses)

score_text_async(text, *[, task])

Scores the given text based on the task using the chat target.

validate(request_response, *[, task])

Validates the request_response piece to score.
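As a rough illustration of the scaling that scale_value_float describes, min-max scaling maps a value from the range [min_value, max_value] onto [0, 1]. The sketch below is a standalone function, not PyRIT's implementation; the name scale_to_unit_interval and the handling of a zero-width range are assumptions made for this example.

```python
def scale_to_unit_interval(value: float, min_value: float, max_value: float) -> float:
    """Min-max scale `value` from [min_value, max_value] onto [0, 1]."""
    if max_value == min_value:
        # Assumption: a zero-width range maps to 0.0; the library may behave differently.
        return 0.0
    return (value - min_value) / (max_value - min_value)


# A value halfway between the bounds lands in the middle of the unit interval.
print(scale_to_unit_interval(3, 1, 5))
```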

Attributes

scorer_type

async score_async(request_response: PromptRequestPiece, *, task: str | None = None) -> list[Score]

Scores the given request_response using the chat target and adds score to memory.

Parameters:
  • request_response (PromptRequestPiece) – The prompt request piece to score.

  • task (str) – The task based on which the text should be scored (the original attacker model’s objective). Currently not supported for this scorer.

Returns:

The request_response scored.

The category that best fits the response is used for score_category. The score_value is True unless no category fits, in which case the score_value is False and the _false_category is used.

Return type:

list[Score]
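The return contract described above can be sketched without a live chat target. In the example below, StandInScore and StandInCategoryScorer are hypothetical stand-ins written for this illustration, not PyRIT classes: a simple keyword match takes the place of the chat target's classification, the request is passed as a plain string rather than a PromptRequestPiece, and only the two documented fields (score_category and score_value) are modeled. The real Score object may represent these values differently.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class StandInScore:
    """Hypothetical stand-in for a Score; only the two documented fields are modeled."""

    score_category: str
    score_value: bool


class StandInCategoryScorer:
    """Hypothetical stub that mimics the documented score_async contract."""

    def __init__(self, keywords_by_category: Dict[str, str], false_category: str) -> None:
        self._keywords_by_category = keywords_by_category
        self._false_category = false_category

    async def score_async(
        self, request_response: str, *, task: Optional[str] = None
    ) -> List[StandInScore]:
        # A keyword match stands in for the chat target's classification.
        for category, keyword in self._keywords_by_category.items():
            if keyword in request_response.lower():
                return [StandInScore(score_category=category, score_value=True)]
        # No category fits: fall back to the false category with a False value.
        return [StandInScore(score_category=self._false_category, score_value=False)]


async def main() -> List[Tuple[str, bool]]:
    scorer = StandInCategoryScorer({"harassment": "insult"}, false_category="no_harm")
    results = []
    for text in ("That is an insult.", "The weather is nice."):
        scores = await scorer.score_async(text)
        results.append((scores[0].score_category, scores[0].score_value))
    return results


results = asyncio.run(main())
print(results)
```

The caller only needs to check score_value to learn whether any category matched, and can then read score_category for the matched category or the false category.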

validate(request_response: PromptRequestPiece, *, task: str | None = None)

Validates the request_response piece to score, because some scorers may require specific PromptRequestPiece types or values.

Parameters:
  • request_response (PromptRequestPiece) – The request response to be validated.

  • task (str) – The task based on which the text should be scored (the original attacker model’s objective).
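To make the idea of scorer-specific validation concrete, here is a minimal sketch of the kind of check a text-only scorer could perform. It does not reflect what SelfAskCategoryScorer.validate actually checks: StandInPiece, its data_type and value fields, the function name validate_text_piece, and the choice of ValueError are all assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass
class StandInPiece:
    """Hypothetical stand-in for a PromptRequestPiece; both field names are assumptions."""

    data_type: str
    value: str


def validate_text_piece(request_response: StandInPiece) -> None:
    """Raise ValueError if the piece is not something a text-only scorer could handle."""
    if request_response.data_type != "text":
        raise ValueError(
            f"Expected a text piece, got data type {request_response.data_type!r}"
        )
    if not request_response.value.strip():
        raise ValueError("Cannot score an empty piece")


# A non-empty text piece passes silently; anything else raises before scoring starts.
validate_text_piece(StandInPiece(data_type="text", value="hello"))
```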