pyrit.datasets.fetch_decoding_trust_stereotypes_dataset

fetch_decoding_trust_stereotypes_dataset(source: str = 'https://raw.githubusercontent.com/AI-secure/DecodingTrust/main/data/stereotype/dataset/user_prompts.csv', source_type: Literal['public_url'] = 'public_url', cache: bool = True, data_home: Path | None = None, stereotype_topics: List[str] | None = None, target_groups: List[str] | None = None, system_prompt_type: Literal['benign', 'untargeted', 'targeted'] = 'targeted') → SeedPromptDataset

Fetch DecodingTrust Stereotypes examples and create a SeedPromptDataset.

Parameters:

source (str) – The source from which to fetch examples. Defaults to the DecodingTrust repository.
source_type (Literal["public_url"]) – The type of source. The only accepted value is ‘public_url’.
cache (bool) – Whether to cache the fetched examples. Defaults to True.
data_home (Optional[Path]) – Directory to store cached data. Defaults to None.
stereotype_topics (Optional[List[str]]) – List of stereotype topics used to filter the examples. Defaults to None, which means all topics are included. The list of all 16 stereotype_topics can be found in the AI-secure/DecodingTrust repository.
target_groups (Optional[List[str]]) – List of target groups used to filter the examples. Defaults to None, which means all target groups are included. The list of all 24 target_groups can be found in the AI-secure/DecodingTrust repository.
system_prompt_type (Literal["benign", "untargeted", "targeted"]) – The type of system prompt to use. Defaults to “targeted”.

Returns:

A SeedPromptDataset containing the examples.

Return type:

SeedPromptDataset
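
Example

The parameters above can be combined as in the following minimal sketch. The helper names build_fetch_kwargs and load_stereotypes are hypothetical and not part of PyRIT; only fetch_decoding_trust_stereotypes_dataset and its keyword arguments come from the signature documented here. The sketch assumes PyRIT is installed and that the default source URL is reachable when the fetch is actually performed.

```python
from typing import Any, Dict, List, Optional

# Values accepted by system_prompt_type, per the documented signature.
SYSTEM_PROMPT_TYPES = ("benign", "untargeted", "targeted")


def build_fetch_kwargs(
    stereotype_topics: Optional[List[str]] = None,
    target_groups: Optional[List[str]] = None,
    system_prompt_type: str = "targeted",
) -> Dict[str, Any]:
    """Hypothetical helper: validate and collect the keyword arguments."""
    if system_prompt_type not in SYSTEM_PROMPT_TYPES:
        raise ValueError(
            f"system_prompt_type must be one of {SYSTEM_PROMPT_TYPES}, "
            f"got {system_prompt_type!r}"
        )
    return {
        "stereotype_topics": stereotype_topics,  # None keeps all topics
        "target_groups": target_groups,  # None keeps all target groups
        "system_prompt_type": system_prompt_type,
    }


def load_stereotypes(**kwargs: Any):
    """Hypothetical wrapper: import PyRIT lazily and fetch the dataset."""
    # Imported inside the function so this module still loads where
    # PyRIT is not installed; the call itself needs PyRIT and network
    # access (or a previously cached copy of the CSV).
    from pyrit.datasets import fetch_decoding_trust_stereotypes_dataset

    return fetch_decoding_trust_stereotypes_dataset(**kwargs)
```

Calling load_stereotypes(**build_fetch_kwargs(system_prompt_type="benign")) would request every topic and target group with the benign system prompt. To narrow the result, pass topic or group names taken from the AI-secure/DecodingTrust repository; no specific names are shown here because the valid values are defined by that dataset.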

Note

For more information and access to the original dataset and related materials, visit AI-secure/DecodingTrust. The corresponding paper, “DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models” by Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer, Sang T. Truong, Simran Arora, Mantas Mazeika, Dan Hendrycks, Zinan Lin, Yu Cheng, Sanmi Koyejo, Dawn Song, and Bo Li, is available at https://arxiv.org/abs/2306.11698
