pyrit.scenario.DatasetConfiguration
- class DatasetConfiguration(*, seed_groups: List[SeedGroup] | None = None, dataset_names: List[str] | None = None, max_dataset_size: int | None = None, scenario_composites: Sequence[ScenarioCompositeStrategy] | None = None)
Bases: object
Configuration for scenario datasets.
This class provides a unified way to specify the dataset source for scenarios. Only ONE of seed_groups or dataset_names can be set.
- Parameters:
seed_groups (Optional[List[SeedGroup]]) – Explicit list of SeedGroup to use.
dataset_names (Optional[List[str]]) – Names of datasets to load from memory.
max_dataset_size (Optional[int]) – If set, randomly samples up to this many SeedGroups from the configured dataset source (without replacement, so no duplicates).
scenario_composites (Optional[Sequence[ScenarioCompositeStrategy]]) – The scenario strategies being executed. Subclasses can use this to filter or customize which seed groups are loaded based on the selected strategies.
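The scenario_composites hook can be pictured with a plain-Python stand-in. This is a minimal sketch, not PyRIT's implementation: `SketchDatasetConfig`, `StrategyFilteredConfig`, the strategy names, and the strategy-to-dataset mapping are all hypothetical, and both seed groups and strategies are modeled as plain strings rather than SeedGroup and ScenarioCompositeStrategy objects.

```python
from typing import Dict, List, Optional, Sequence


class SketchDatasetConfig:
    """Hypothetical stand-in for DatasetConfiguration; not PyRIT's class."""

    def __init__(
        self,
        *,
        seeds_by_dataset: Dict[str, List[str]],
        scenario_composites: Optional[Sequence[str]] = None,
    ) -> None:
        # Seed groups are modeled as strings, strategies as plain names.
        self._seeds_by_dataset = seeds_by_dataset
        self._scenario_composites = list(scenario_composites or [])

    def get_seed_groups(self) -> Dict[str, List[str]]:
        # Base behavior: return every configured dataset unchanged.
        return dict(self._seeds_by_dataset)


class StrategyFilteredConfig(SketchDatasetConfig):
    """Hypothetical subclass that keeps only datasets the selected strategies need."""

    # Assumed strategy-to-dataset mapping, invented for illustration.
    STRATEGY_TO_DATASET: Dict[str, str] = {
        "jailbreak": "jailbreak_prompts",
        "harm": "harm_prompts",
    }

    def get_seed_groups(self) -> Dict[str, List[str]]:
        wanted = {
            self.STRATEGY_TO_DATASET[name]
            for name in self._scenario_composites
            if name in self.STRATEGY_TO_DATASET
        }
        # Start from the base result, then keep only the wanted datasets.
        all_groups = super().get_seed_groups()
        return {name: groups for name, groups in all_groups.items() if name in wanted}


config = StrategyFilteredConfig(
    seeds_by_dataset={"jailbreak_prompts": ["j1", "j2"], "harm_prompts": ["h1"]},
    scenario_composites=["jailbreak"],
)
print(config.get_seed_groups())  # {'jailbreak_prompts': ['j1', 'j2']}
```

Overriding the loader and filtering the base result keeps the resolution logic in one place, so a subclass only has to express which datasets each strategy needs.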
- __init__(*, seed_groups: List[SeedGroup] | None = None, dataset_names: List[str] | None = None, max_dataset_size: int | None = None, scenario_composites: Sequence[ScenarioCompositeStrategy] | None = None) → None
Initialize a DatasetConfiguration.
- Parameters:
seed_groups (Optional[List[SeedGroup]]) – Explicit list of SeedGroup to use.
dataset_names (Optional[List[str]]) – Names of datasets to load from memory.
max_dataset_size (Optional[int]) – If set, randomly samples up to this many SeedGroups (without replacement).
scenario_composites (Optional[Sequence[ScenarioCompositeStrategy]]) – The scenario strategies being executed. Subclasses can use this to filter or customize which seed groups are loaded.
- Raises:
ValueError – If both seed_groups and dataset_names are set.
ValueError – If max_dataset_size is less than 1.
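The two documented Raises conditions amount to a small validation step. The sketch below restates them as a standalone function; `validate_dataset_config` is a hypothetical helper, and treating "set" as "not None" is an assumption, since the real constructor may test the arguments differently.

```python
from typing import List, Optional


def validate_dataset_config(
    *,
    seed_groups: Optional[List[object]] = None,
    dataset_names: Optional[List[str]] = None,
    max_dataset_size: Optional[int] = None,
) -> None:
    """Hypothetical helper mirroring the documented Raises rules."""
    # Assumption: "set" means "not None"; the real class may test differently.
    if seed_groups is not None and dataset_names is not None:
        raise ValueError("Only one of seed_groups or dataset_names can be set.")
    # A sampling cap must allow at least one seed group per dataset.
    if max_dataset_size is not None and max_dataset_size < 1:
        raise ValueError("max_dataset_size must be at least 1.")


# Valid: one data source and a positive cap ("my_dataset" is a made-up name).
validate_dataset_config(dataset_names=["my_dataset"], max_dataset_size=10)

# Invalid: both data sources at once.
try:
    validate_dataset_config(seed_groups=["g1"], dataset_names=["my_dataset"])
except ValueError as exc:
    print(exc)  # Only one of seed_groups or dataset_names can be set.
```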
Methods
__init__(*[, seed_groups, dataset_names, ...]): Initialize a DatasetConfiguration.
get_all_seed_attack_groups(): Resolve and return all seed groups as SeedAttackGroups in a flat list.
get_all_seed_groups(): Resolve and return all seed groups as a flat list.
get_all_seeds(): Load all seed prompts from memory for all configured datasets.
get_default_dataset_names(): Get the list of default dataset names for this configuration.
get_seed_attack_groups(): Resolve and return seed groups as SeedAttackGroups, grouped by dataset.
get_seed_groups(): Resolve and return seed groups based on the configuration.
Check if this configuration has a data source configured.
- get_all_seed_attack_groups() → List[SeedAttackGroup]
Resolve and return all seed groups as SeedAttackGroups in a flat list.
This is a convenience method that calls get_seed_attack_groups() and flattens the results into a single list. Use this for attack scenarios that need SeedAttackGroup functionality.
- Returns:
All resolved seed attack groups from all datasets.
- Return type:
List[SeedAttackGroup]
- Raises:
ValueError – If no seed groups could be resolved from the configuration.
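The flattening step that get_all_seed_attack_groups() and get_all_seed_groups() describe can be sketched generically. `flatten_grouped` is a hypothetical helper, and putting the empty-result check inside it is a simplification: in the documented API the ValueError may instead originate in the underlying get_seed_groups() call.

```python
from itertools import chain
from typing import Dict, List, TypeVar

T = TypeVar("T")


def flatten_grouped(grouped: Dict[str, List[T]]) -> List[T]:
    """Hypothetical helper: flatten a dataset-name -> groups mapping."""
    # Dicts keep insertion order, so groups stay in per-dataset order.
    flat: List[T] = list(chain.from_iterable(grouped.values()))
    if not flat:
        # Mirrors the documented ValueError when nothing was resolved.
        raise ValueError("No seed groups could be resolved from the configuration.")
    return flat


by_dataset = {"dataset_a": ["group_1", "group_2"], "dataset_b": ["group_3"]}
print(flatten_grouped(by_dataset))  # ['group_1', 'group_2', 'group_3']
```

Flattening discards the dataset keys, which is why the flat variants are suited to callers that do not need to know where each group came from.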
- get_all_seed_groups() → List[SeedGroup]
Resolve and return all seed groups as a flat list.
This is a convenience method that calls get_seed_groups() and flattens the results into a single list. Use this when you don’t need to track which dataset each seed group came from.
- Returns:
All resolved seed groups from all datasets, with max_dataset_size applied per dataset.
- Return type:
List[SeedGroup]
- Raises:
ValueError – If no seed groups could be resolved from the configuration.
- get_all_seeds() → List[Seed]
Load all seed prompts from memory for all configured datasets.
This is a convenience method that retrieves SeedPrompt objects directly from memory for all configured datasets. If max_dataset_size is set, randomly samples up to that many prompts per dataset (without replacement).
- Returns:
List of SeedPrompt objects from all configured datasets. Returns an empty list if no prompts are found.
- Return type:
List[SeedPrompt]
- Raises:
ValueError – If no dataset names are configured.
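The per-dataset sampling in get_all_seeds() can be illustrated over a plain dictionary standing in for PyRIT memory. This is a sketch under stated assumptions: `sample_seeds_per_dataset` and the dictionary-backed `memory` are hypothetical, prompts are modeled as strings, and the choice to let an unknown dataset contribute nothing (rather than raise) is an assumption consistent with the documented "empty list if no prompts are found".

```python
import random
from typing import Dict, List, Optional


def sample_seeds_per_dataset(
    memory: Dict[str, List[str]],
    dataset_names: Optional[List[str]],
    max_dataset_size: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Hypothetical stand-in for get_all_seeds() over a dict instead of PyRIT memory."""
    if not dataset_names:
        raise ValueError("No dataset names are configured.")
    rng = rng or random.Random()
    seeds: List[str] = []
    for name in dataset_names:
        prompts = memory.get(name, [])
        if max_dataset_size is not None and len(prompts) > max_dataset_size:
            # random.Random.sample draws without replacement: no duplicates.
            prompts = rng.sample(prompts, max_dataset_size)
        seeds.extend(prompts)
    # Assumption: unknown or empty datasets contribute nothing; no error is raised.
    return seeds


memory = {"dataset_a": ["a1", "a2", "a3", "a4"], "dataset_b": ["b1"]}
seeds = sample_seeds_per_dataset(memory, ["dataset_a", "dataset_b"], max_dataset_size=2)
print(len(seeds))  # 3: two sampled from dataset_a, plus the single prompt in dataset_b
```

Because the cap is applied per dataset, the total can reach max_dataset_size multiplied by the number of configured datasets.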
- get_default_dataset_names() → List[str]
Get the list of default dataset names for this configuration.
This is used by the CLI to display what datasets the scenario uses by default.
- Returns:
List of dataset names, or an empty list if explicit seed_groups are used.
- Return type:
List[str]
- get_seed_attack_groups() → Dict[str, List[SeedAttackGroup]]
Resolve and return seed groups as SeedAttackGroups, grouped by dataset.
This wraps get_seed_groups() and converts each SeedGroup to a SeedAttackGroup. Use this when you need attack-specific functionality like objectives, prepended conversations, or simulated conversation configuration.
- Returns:
Dictionary mapping dataset names to their seed attack groups.
- Return type:
Dict[str, List[SeedAttackGroup]]
- Raises:
ValueError – If no seed groups could be resolved from the configuration.
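Converting grouped results while keeping the dataset keys is a one-to-one mapping over each list. In this sketch, `SeedGroupSketch`, `SeedAttackGroupSketch`, and `to_attack_groups` are hypothetical stand-ins. They are not PyRIT's SeedGroup and SeedAttackGroup, and the `objective` field only marks where attack-specific data would live.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SeedGroupSketch:
    """Hypothetical stand-in for SeedGroup: a list of prompt strings."""
    prompts: List[str]


@dataclass
class SeedAttackGroupSketch:
    """Hypothetical stand-in for SeedAttackGroup with one attack-specific field."""
    prompts: List[str]
    objective: Optional[str] = None  # attack-specific data, left unset here


def to_attack_groups(
    grouped: Dict[str, List[SeedGroupSketch]],
) -> Dict[str, List[SeedAttackGroupSketch]]:
    # Keep every dataset key and convert each group one-to-one, in order.
    return {
        name: [SeedAttackGroupSketch(prompts=list(g.prompts)) for g in groups]
        for name, groups in grouped.items()
    }


grouped = {"dataset_a": [SeedGroupSketch(["p1"]), SeedGroupSketch(["p2", "p3"])]}
attack_groups = to_attack_groups(grouped)
print(attack_groups["dataset_a"][1].prompts)  # ['p2', 'p3']
```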
- get_seed_groups() → Dict[str, List[SeedGroup]]
Resolve and return seed groups based on the configuration.
This method handles all resolution logic:
1. If seed_groups is set, use those directly (under the key ‘_explicit_seed_groups’).
2. If dataset_names is set, load from memory using those names.
In all cases, max_dataset_size is applied per dataset if set.
Subclasses can override this to filter or customize which seed groups are loaded based on the stored scenario_composites.
- Returns:
Dictionary mapping dataset names to their seed groups. When explicit seed_groups are provided, the key is ‘_explicit_seed_groups’. If max_dataset_size is set, each dataset’s seed groups are sampled down to at most that many.
- Return type:
Dict[str, List[SeedGroup]]
- Raises:
ValueError – If no seed groups could be resolved from the configuration.
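The resolution order described for get_seed_groups() can be traced end to end in a self-contained sketch. Everything here is hypothetical: `resolve_seed_groups` is not PyRIT code, a dictionary stands in for memory, seed groups are strings, and omitting datasets that yield no groups is an assumption. The `"_explicit_seed_groups"` key and the per-dataset application of max_dataset_size follow the documentation.

```python
import random
from typing import Dict, List, Optional

EXPLICIT_KEY = "_explicit_seed_groups"


def resolve_seed_groups(
    *,
    seed_groups: Optional[List[str]] = None,
    dataset_names: Optional[List[str]] = None,
    max_dataset_size: Optional[int] = None,
    memory: Optional[Dict[str, List[str]]] = None,
    rng: Optional[random.Random] = None,
) -> Dict[str, List[str]]:
    """Hypothetical sketch of the documented get_seed_groups() resolution order."""
    rng = rng or random.Random()
    memory = memory or {}
    resolved: Dict[str, List[str]] = {}
    if seed_groups is not None:
        # 1. Explicit seed groups are used directly under a fixed key.
        resolved[EXPLICIT_KEY] = list(seed_groups)
    elif dataset_names is not None:
        # 2. Otherwise each named dataset is loaded from the simulated memory.
        #    Assumption: datasets with no groups are left out of the result.
        for name in dataset_names:
            groups = memory.get(name, [])
            if groups:
                resolved[name] = list(groups)
    if max_dataset_size is not None:
        # The cap applies per dataset, not to the combined total.
        resolved = {
            name: rng.sample(groups, min(max_dataset_size, len(groups)))
            for name, groups in resolved.items()
        }
    if not any(resolved.values()):
        raise ValueError("No seed groups could be resolved from the configuration.")
    return resolved


memory = {"dataset_a": ["a1", "a2", "a3"], "dataset_b": ["b1", "b2", "b3"]}
result = resolve_seed_groups(
    dataset_names=["dataset_a", "dataset_b"], max_dataset_size=2, memory=memory
)
print(sum(len(groups) for groups in result.values()))  # 4: up to 2 per dataset
```

Returning a dictionary keyed by dataset name keeps provenance available to callers, while the flat convenience methods can discard it when it is not needed.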