pyrit.datasets.fetch_sorry_bench_dataset

fetch_sorry_bench_dataset(*, cache_dir: str | None = None, categories: List[str] | None = None, prompt_style: str | None = None, token: str | None = None) → SeedDataset

Fetch Sorry-Bench dataset from Hugging Face (updated 2025/03 version).

The Sorry-Bench dataset contains adversarial prompts designed to test LLM safety across 44 categories with 21 different prompt styles (base + 20 linguistic mutations).

Reference: https://arxiv.org/abs/2406.14598

Parameters:

cache_dir (str | None) – Optional directory in which to cache the downloaded dataset files.

categories (List[str] | None) – If provided, only prompts belonging to these harm categories are returned.

prompt_style (str | None) – If provided, only prompts rendered in this style are returned (the base style or one of the 20 linguistic mutations).

token (str | None) – Optional Hugging Face access token for authenticated downloads.
Returns:

SeedDataset containing Sorry-Bench prompts with harm categories.

Return type:

SeedDataset

Raises:

ValueError – If invalid categories or prompt_style are provided.
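A minimal usage sketch. The category and style values below are illustrative examples of the dataset's taxonomy, and the snippet assumes the returned SeedDataset exposes its prompts through a `prompts` attribute whose items carry the prompt text in `value`; adapt the attribute access to the SeedDataset API in your PyRIT version.

```python
from pyrit.datasets import fetch_sorry_bench_dataset

# Fetch only base-style prompts for one harm category.
# Omit both filters to fetch the full dataset (44 categories x 21 styles).
dataset = fetch_sorry_bench_dataset(
    categories=["Personal Insulting Words"],  # illustrative category name
    prompt_style="base",
)

# Inspect the first few prompts (assumes a `prompts` list of seed prompts).
for prompt in dataset.prompts[:3]:
    print(prompt.value)
```

Passing an unrecognized category or prompt style raises ValueError, so filters can be validated early rather than yielding an empty dataset silently.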