pyrit.score.PromptShieldScorer

pyrit.score.PromptShieldScorer#

class PromptShieldScorer(*, prompt_shield_target: ~pyrit.prompt_target.prompt_shield_target.PromptShieldTarget, validator: ~pyrit.score.scorer_prompt_validator.ScorerPromptValidator | None = None, score_aggregator: ~collections.abc.Callable[[~collections.abc.Iterable[~pyrit.models.score.Score]], ~pyrit.score.score_aggregator_result.ScoreAggregatorResult] = <function _create_aggregator.<locals>.aggregator>)[source]#

Bases: TrueFalseScorer

Returns true if an attack or jailbreak has been detected by Prompt Shield.

__init__(*, prompt_shield_target: ~pyrit.prompt_target.prompt_shield_target.PromptShieldTarget, validator: ~pyrit.score.scorer_prompt_validator.ScorerPromptValidator | None = None, score_aggregator: ~collections.abc.Callable[[~collections.abc.Iterable[~pyrit.models.score.Score]], ~pyrit.score.score_aggregator_result.ScoreAggregatorResult] = <function _create_aggregator.<locals>.aggregator>) → None[source]#

Initialize the PromptShieldScorer.

Parameters:

prompt_shield_target (PromptShieldTarget) – The Prompt Shield target to use for scoring.
validator (Optional[ScorerPromptValidator]) – Custom validator. Defaults to None.
score_aggregator (TrueFalseAggregatorFunc) – The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR.

Methods

`__init__`(*, prompt_shield_target[, ...])	Initialize the PromptShieldScorer.
`evaluate_async`([file_mapping, ...])	Evaluate this scorer against human-labeled datasets.
`get_identifier`()	Get the component's identifier, building it lazily on first access.
`get_scorer_metrics`()	Get evaluation metrics for this scorer from the configured evaluation result file.
`scale_value_float`(value, min_value, max_value)	Scales a value from 0 to 1 based on the given min and max values.
`score_async`(message, *[, objective, ...])	Score the message, add the results to the database, and return a list of Score objects.
`score_image_async`(image_path, *[, objective])	Score the given image using the chat target.
`score_image_batch_async`(*, image_paths[, ...])	Score a batch of images asynchronously.
`score_prompts_batch_async`(*, messages[, ...])	Score multiple prompts in batches using the provided objectives.
`score_response_async`(*, response[, ...])	Score a response using an objective scorer and optional auxiliary scorers.
`score_response_multiple_scorers_async`(*, ...)	Score a response using multiple scorers in parallel.
`score_text_async`(text, *[, objective])	Scores the given text based on the task using the chat target.
`validate_return_scores`(scores)	Validate the scores returned by the scorer.

Attributes

`evaluation_file_mapping`
`scorer_type`	Get the scorer type based on class hierarchy.

pyrit.score.PromptShieldScorer

Contents

pyrit.score.PromptShieldScorer#