pyrit.datasets.fetch_pku_safe_rlhf_dataset

fetch_pku_safe_rlhf_dataset(include_safe_prompts: bool = True) → SeedPromptDataset

Fetch PKU-SafeRLHF examples and create a SeedPromptDataset.

Parameters:
  • include_safe_prompts (bool) – If True, all prompts in the dataset are returned; the dataset has RLHF markers for unsafe responses, so if False only the unsafe subset is returned.

Returns:

A SeedPromptDataset containing the examples.

Return type:

SeedPromptDataset

Note

For more information and access to the original dataset and related materials, visit: https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF. Based on research in the paper https://arxiv.org/pdf/2406.15513 by Jiaming Ji, Donghai Hong, Borong Zhang, Boyuan Chen, Josef Dai, Boren Zheng, Tianyi Qiu, Boxun Li, and Yaodong Yang.