pyrit.scenario.airt.ContentHarmsStrategy#
- class ContentHarmsStrategy(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]#
Bases: ScenarioStrategy

ContentHarmsStrategy defines a set of strategies for testing model behavior across several harm categories. The scenario is designed to provide quick feedback on model performance for common harm types, with the idea that users will dive deeper into specific harm categories based on the initial results.
Each tag represents a different harm category that the model can be tested for. Specifying the all tag will include a comprehensive test suite covering all harm categories. Users can define objectives for each harm category via seed datasets or use the default datasets provided with PyRIT. For each harm category, the scenario will run a RolePlayAttack, ManyShotJailbreakAttack, PromptSendingAttack, and RedTeamingAttack against each objective in the dataset to evaluate model behavior.
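A minimal usage sketch, assuming only the import path shown in the page title and the enum members listed under Attributes; how the selected strategies are wired into the AIRT scenario itself is not covered on this page and is omitted here.

    from pyrit.scenario.airt import ContentHarmsStrategy

    # Broad sweep: the aggregate "all" member covers every harm category.
    broad_run = {ContentHarmsStrategy.ALL}

    # Focused pass: pick individual harm categories for quick iteration.
    focused_run = {ContentHarmsStrategy.Hate, ContentHarmsStrategy.Violence}

    # Each member is a regular Enum value carrying its harm-category string,
    # plus the tags exposed by the tags property documented below.
    for strategy in focused_run:
        print(strategy.name, strategy.value, strategy.tags)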
- __init__(*args, **kwds)#
Methods
get_aggregate_tags()
    Get the set of tags that represent aggregate categories.
get_strategies_by_tag(tag)
    Get all attack strategies that have a specific tag.
get_all_strategies()
    Get all non-aggregate strategies for this strategy enum.
get_aggregate_strategies()
    Get all aggregate strategies for this strategy enum.
normalize_strategies(strategies)
    Normalize a set of attack strategies by expanding aggregate tags.
prepare_scenario_strategies([strategies, ...])
    Prepare and normalize scenario strategies for use in a scenario.
supports_composition()
    Indicate whether this strategy type supports composition.
validate_composition(strategies)
    Validate whether the given strategies can be composed together.
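A short sketch of the aggregate-expansion helpers listed above. It assumes normalize_strategies and prepare_scenario_strategies can be called on the class with an iterable of members and return the expanded, concrete strategy set; the exact signatures, defaults, and return types are not spelled out on this page.

    from pyrit.scenario.airt import ContentHarmsStrategy

    # Expanding the aggregate "all" member should yield every concrete harm category.
    expanded = ContentHarmsStrategy.normalize_strategies({ContentHarmsStrategy.ALL})

    # prepare_scenario_strategies is assumed to apply the same normalization
    # (plus any scenario-level defaults) to a user-supplied selection.
    selection = ContentHarmsStrategy.prepare_scenario_strategies(
        [ContentHarmsStrategy.Misinformation, ContentHarmsStrategy.Leakage]
    )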
Attributes
tags
    Get the tags for this attack strategy.
- ALL = 'all'#
- Fairness = 'fairness'#
- Harassment = 'harassment'#
- Hate = 'hate'#
- Leakage = 'leakage'#
- Misinformation = 'misinformation'#
- Sexual = 'sexual'#
- Violence = 'violence'#