pyrit.datasets.fetch_sorry_bench_dataset
- fetch_sorry_bench_dataset(*, cache_dir: str | None = None, categories: List[str] | None = None, prompt_style: str | None = None, token: str | None = None) → SeedDataset
Fetch the Sorry-Bench dataset from Hugging Face (updated 2025/03 version).
The Sorry-Bench dataset contains adversarial prompts designed to test LLM safety across 44 categories with 21 different prompt styles (base + 20 linguistic mutations).
Reference: https://arxiv.org/abs/2406.14598
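A minimal usage sketch follows. It assumes the HUGGINGFACE_TOKEN environment variable is set (the dataset is gated on Hugging Face) and that the returned SeedDataset exposes its entries via a prompts attribute whose items carry the prompt text in value, as other PyRIT seed datasets do.

```python
# Minimal sketch: fetch the base-style prompts. Assumes HUGGINGFACE_TOKEN
# is set in the environment, since the dataset is gated on Hugging Face.
from pyrit.datasets import fetch_sorry_bench_dataset

dataset = fetch_sorry_bench_dataset()  # prompt_style defaults to "base"

# Assumption: SeedDataset exposes entries via `prompts`, each with a
# `value` holding the prompt text, as other PyRIT seed datasets do.
print(f"Fetched {len(dataset.prompts)} prompts")
print(dataset.prompts[0].value)
```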
- Parameters:
cache_dir (Optional[str]) – Optional cache directory for Hugging Face datasets.
categories (Optional[List[str]]) – Optional list of categories to filter by. The full list is in: https://huggingface.co/datasets/sorry-bench/sorry-bench-202503/blob/main/meta_info.py
prompt_style (Optional[str]) – Optional prompt style to filter by. Available styles include “base”, “ascii”, “caesar”, “slang”, and “authority_endorsement”. Defaults to “base” (base prompts only, no mutations). Full list: https://huggingface.co/datasets/sorry-bench/sorry-bench-202503
token (Optional[str]) – Hugging Face authentication token. If not provided, the function attempts to read the HUGGINGFACE_TOKEN environment variable. A token is required to access gated datasets on Hugging Face.
- Returns:
A SeedDataset containing Sorry-Bench prompts annotated with their harm categories.
- Return type:
SeedDataset
- Raises:
ValueError – If invalid categories or prompt_style are provided.
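For illustration, here is a hedged sketch of filtered fetching and the ValueError path. The category label below is hypothetical; consult meta_info.py (linked above) for the exact names of the 44 categories.

```python
# Hedged sketch: filter by category and prompt style, passing a token
# explicitly instead of relying on the HUGGINGFACE_TOKEN variable.
from pyrit.datasets import fetch_sorry_bench_dataset

try:
    dataset = fetch_sorry_bench_dataset(
        categories=["fraud"],   # hypothetical label; see meta_info.py
        prompt_style="caesar",  # one of the 20 linguistic mutations
        token="hf_...",         # placeholder; substitute your own HF token
    )
except ValueError as err:
    # Raised when a category or prompt_style is not recognized
    print(f"Invalid filter: {err}")
```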