pyrit.scenario.DatasetConfiguration
- class DatasetConfiguration(*, seed_groups: List[SeedGroup] | None = None, dataset_names: List[str] | None = None, max_dataset_size: int | None = None, scenario_composites: Sequence[ScenarioCompositeStrategy] | None = None)
Bases: object
Configuration for scenario datasets.
This class provides a unified way to specify the dataset source for scenarios. Only ONE of seed_groups or dataset_names can be set.
- Parameters:
seed_groups (Optional[List[SeedGroup]]) – Explicit list of SeedGroups to use.
dataset_names (Optional[List[str]]) – Names of datasets to load from memory.
max_dataset_size (Optional[int]) – If set, randomly samples up to this many SeedGroups from the configured dataset source (without replacement, so no duplicates).
scenario_composites (Optional[Sequence[ScenarioCompositeStrategy]]) – The scenario strategies being executed. Subclasses can use this to filter or customize which seed groups are loaded based on the selected strategies.
- __init__(*, seed_groups: List[SeedGroup] | None = None, dataset_names: List[str] | None = None, max_dataset_size: int | None = None, scenario_composites: Sequence[ScenarioCompositeStrategy] | None = None) -> None
Initialize a DatasetConfiguration.
- Parameters:
seed_groups (Optional[List[SeedGroup]]) – Explicit list of SeedGroups to use.
dataset_names (Optional[List[str]]) – Names of datasets to load from memory.
max_dataset_size (Optional[int]) – If set, randomly samples up to this many SeedGroups (without replacement).
scenario_composites (Optional[Sequence[ScenarioCompositeStrategy]]) – The scenario strategies being executed. Subclasses can use this to filter or customize which seed groups are loaded.
- Raises:
ValueError – If both seed_groups and dataset_names are set.
ValueError – If max_dataset_size is less than 1.
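The two argument checks above can be illustrated with a small stand-in class. This is a sketch under stated assumptions, not PyRIT's implementation: `SketchDatasetConfiguration` and `raises_value_error` are hypothetical names, seed groups are represented as plain strings so the example runs without PyRIT installed, and treating "set" as "not None" is an assumption.

```python
from typing import List, Optional


class SketchDatasetConfiguration:
    """Hypothetical stand-in that mirrors only the documented argument checks."""

    def __init__(
        self,
        *,
        seed_groups: Optional[List[str]] = None,
        dataset_names: Optional[List[str]] = None,
        max_dataset_size: Optional[int] = None,
    ) -> None:
        # Documented rule: only one of seed_groups or dataset_names may be set.
        # Assumption: "set" is interpreted here as "not None".
        if seed_groups is not None and dataset_names is not None:
            raise ValueError("Only one of seed_groups or dataset_names can be set.")
        # Documented rule: max_dataset_size must not be less than 1.
        if max_dataset_size is not None and max_dataset_size < 1:
            raise ValueError("max_dataset_size must be at least 1.")
        self.seed_groups = seed_groups
        self.dataset_names = dataset_names
        self.max_dataset_size = max_dataset_size


def raises_value_error(**kwargs) -> bool:
    """Return True if building the stand-in with these arguments raises ValueError."""
    try:
        SketchDatasetConfiguration(**kwargs)
    except ValueError:
        return True
    return False
```

With this stand-in, passing both `seed_groups` and `dataset_names`, or passing `max_dataset_size=0`, is rejected, while a single data source with a positive size limit is accepted.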
Methods
__init__(*[, seed_groups, dataset_names, ...]): Initialize a DatasetConfiguration.
get_all_seed_groups(): Resolve and return all seed groups as a flat list.
get_default_dataset_names(): Get the list of default dataset names for this configuration.
get_seed_groups(): Resolve and return seed groups based on the configuration.
Check if this configuration has a data source configured.
- get_all_seed_groups() -> List[SeedGroup]
Resolve and return all seed groups as a flat list.
This is a convenience method that calls get_seed_groups() and flattens the results into a single list. Use this when you don’t need to track which dataset each seed group came from.
- Returns:
All resolved seed groups from all datasets, with max_dataset_size applied per dataset.
- Return type:
List[SeedGroup]
- Raises:
ValueError – If no seed groups could be resolved from the configuration.
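The flattening step described above can be sketched as follows. This is an illustrative approximation, not the library's code: `flatten_seed_groups` is a hypothetical helper, and plain strings stand in for `SeedGroup` objects.

```python
from typing import Dict, List


def flatten_seed_groups(by_dataset: Dict[str, List[str]]) -> List[str]:
    """Merge a per-dataset mapping into one list, discarding the dataset keys."""
    # Walk the datasets in insertion order and concatenate their seed groups.
    flat = [group for groups in by_dataset.values() for group in groups]
    # Mirror the documented behavior of raising when nothing was resolved.
    if not flat:
        raise ValueError("No seed groups could be resolved from the configuration.")
    return flat
```

The result no longer records which dataset each seed group came from, which is why the per-dataset mapping is the better choice when that information matters.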
- get_default_dataset_names() -> List[str]
Get the list of default dataset names for this configuration.
This is used by the CLI to display what datasets the scenario uses by default.
- Returns:
List of dataset names, or empty list if using explicit seed_groups.
- Return type:
List[str]
- get_seed_groups() -> Dict[str, List[SeedGroup]]
Resolve and return seed groups based on the configuration.
This method handles all resolution logic:
1. If seed_groups is set, use those directly (under key ‘_explicit_seed_groups’).
2. If dataset_names is set, load from memory using those names.
In all cases, max_dataset_size is applied per dataset if set.
Subclasses can override this to filter or customize which seed groups are loaded based on the stored scenario_composites.
- Returns:
Dictionary mapping dataset names to their seed groups. When explicit seed_groups are provided, the key is ‘_explicit_seed_groups’. If max_dataset_size is set, each dataset’s seed groups are sampled down to at most that many.
- Return type:
Dict[str, List[SeedGroup]]
- Raises:
ValueError – If no seed groups could be resolved from the configuration.
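The resolution order and per-dataset sampling can be sketched in a self-contained form. Everything here is an assumption made for illustration: `resolve_seed_groups` and `sample_down` are hypothetical functions, `memory` is a plain dictionary standing in for PyRIT's memory, and seed groups are plain strings. Only the `_explicit_seed_groups` key and the sampling without replacement come from the documentation above.

```python
import random
from typing import Dict, List, Optional

EXPLICIT_KEY = "_explicit_seed_groups"  # key named in the documentation above


def sample_down(groups: List[str], max_size: Optional[int], rng: random.Random) -> List[str]:
    """Return at most max_size groups, sampled without replacement."""
    if max_size is None or len(groups) <= max_size:
        return list(groups)
    return rng.sample(groups, max_size)


def resolve_seed_groups(
    *,
    seed_groups: Optional[List[str]] = None,
    dataset_names: Optional[List[str]] = None,
    max_dataset_size: Optional[int] = None,
    memory: Optional[Dict[str, List[str]]] = None,  # hypothetical stand-in for memory
    rng: Optional[random.Random] = None,
) -> Dict[str, List[str]]:
    rng = rng or random.Random()
    memory = memory or {}
    if seed_groups is not None:
        # Step 1: explicit seed groups are used directly under the special key.
        resolved = {EXPLICIT_KEY: seed_groups}
    elif dataset_names is not None:
        # Step 2: look up each named dataset in the stand-in memory.
        resolved = {name: memory.get(name, []) for name in dataset_names}
    else:
        resolved = {}
    # The size limit is applied to each dataset separately.
    result = {name: sample_down(groups, max_dataset_size, rng) for name, groups in resolved.items()}
    if not any(result.values()):
        raise ValueError("No seed groups could be resolved from the configuration.")
    return result
```

Because the limit is applied per dataset, a configuration with two datasets and `max_dataset_size=2` can yield up to four seed groups in total.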