pyrit.scenario.DatasetConfiguration

class DatasetConfiguration(*, seed_groups: List[SeedGroup] | None = None, dataset_names: List[str] | None = None, max_dataset_size: int | None = None, scenario_composites: Sequence[ScenarioCompositeStrategy] | None = None)

Bases: object

Configuration for scenario datasets.

This class provides a single way to specify the dataset source for a scenario. seed_groups and dataset_names are mutually exclusive: setting both raises a ValueError.

Parameters:
  • seed_groups (Optional[List[SeedGroup]]) – Explicit list of SeedGroups to use.

  • dataset_names (Optional[List[str]]) – Names of datasets to load from memory.

  • max_dataset_size (Optional[int]) – If set, randomly samples up to this many SeedGroups from the configured dataset source (without replacement, so no duplicates).

  • scenario_composites (Optional[Sequence[ScenarioCompositeStrategy]]) – The scenario strategies being executed. Subclasses can use this to filter or customize which seed groups are loaded based on the selected strategies.
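The max_dataset_size behaviour described above (random sampling of up to N items, without replacement) can be illustrated with a small standalone sketch. This is not the library's implementation: sample_up_to is a hypothetical helper, plain strings stand in for SeedGroup objects, and returning the full list in its original order when no trimming is needed is an assumption.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_up_to(items: Sequence[T], max_size: Optional[int]) -> List[T]:
    """Return at most max_size items, drawn without replacement.

    Hypothetical helper for illustration; not part of PyRIT.
    """
    if max_size is None or max_size >= len(items):
        # Nothing to trim: keep every item (assumed: original order is kept).
        return list(items)
    # random.sample picks distinct positions, so no item is repeated.
    return random.sample(list(items), max_size)


# Plain strings stand in for SeedGroup objects.
groups = ["group_a", "group_b", "group_c", "group_d"]
subset = sample_up_to(groups, 2)       # two distinct groups, chosen at random
everything = sample_up_to(groups, 10)  # cap is larger than the list, so all four
```

The "up to" wording matters: a cap larger than the available data does not raise an error in this sketch, it simply returns everything.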

__init__(*, seed_groups: List[SeedGroup] | None = None, dataset_names: List[str] | None = None, max_dataset_size: int | None = None, scenario_composites: Sequence[ScenarioCompositeStrategy] | None = None) → None

Initialize a DatasetConfiguration.

Parameters:
  • seed_groups (Optional[List[SeedGroup]]) – Explicit list of SeedGroups to use.

  • dataset_names (Optional[List[str]]) – Names of datasets to load from memory.

  • max_dataset_size (Optional[int]) – If set, randomly samples up to this many SeedGroups (without replacement).

  • scenario_composites (Optional[Sequence[ScenarioCompositeStrategy]]) – The scenario strategies being executed. Subclasses can use this to filter or customize which seed groups are loaded.

Raises:
  • ValueError – If both seed_groups and dataset_names are set.

  • ValueError – If max_dataset_size is less than 1.
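The two error conditions above amount to a pair of argument checks. The following is a minimal sketch of such checks, not the library's code: check_dataset_arguments is a hypothetical function, the error messages are invented, and treating "set" as "not None" is an assumption.

```python
from typing import List, Optional


def check_dataset_arguments(
    seed_groups: Optional[List[object]] = None,
    dataset_names: Optional[List[str]] = None,
    max_dataset_size: Optional[int] = None,
) -> None:
    """Hypothetical stand-in for the documented constructor checks."""
    # Assumption: an argument counts as "set" when it is not None.
    if seed_groups is not None and dataset_names is not None:
        raise ValueError("Only one of seed_groups or dataset_names can be set.")
    if max_dataset_size is not None and max_dataset_size < 1:
        raise ValueError("max_dataset_size must be at least 1.")


# Collect the messages produced by the two invalid combinations.
errors = []
invalid_calls = (
    {"seed_groups": ["g1"], "dataset_names": ["some_dataset"]},  # both sources set
    {"dataset_names": ["some_dataset"], "max_dataset_size": 0},  # size below 1
)
for kwargs in invalid_calls:
    try:
        check_dataset_arguments(**kwargs)
    except ValueError as exc:
        errors.append(str(exc))
```

Leaving max_dataset_size as None skips the size check entirely, which matches its documented role as an optional cap.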

Methods

  • __init__(*[, seed_groups, dataset_names, ...]) – Initialize a DatasetConfiguration.

  • get_all_seed_groups() – Resolve and return all seed groups as a flat list.

  • get_default_dataset_names() – Get the list of default dataset names for this configuration.

  • get_seed_groups() – Resolve and return seed groups based on the configuration.

  • has_data_source() – Check if this configuration has a data source configured.

get_all_seed_groups() → List[SeedGroup]

Resolve and return all seed groups as a flat list.

This is a convenience method that calls get_seed_groups() and flattens the results into a single list. Use this when you don’t need to track which dataset each seed group came from.

Returns:

All resolved seed groups from all datasets, with max_dataset_size applied per dataset.

Return type:

List[SeedGroup]

Raises:

ValueError – If no seed groups could be resolved from the configuration.
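Flattening a per-dataset mapping into a single list, as described for this method, can be sketched as follows. The helper flatten_seed_groups is hypothetical, plain strings stand in for SeedGroup objects, and the dataset names are made up for the example.

```python
from itertools import chain
from typing import Dict, List


def flatten_seed_groups(groups_by_dataset: Dict[str, List[str]]) -> List[str]:
    """Concatenate every dataset's groups into one list (hypothetical helper)."""
    # Dict values are visited in insertion order, so datasets stay contiguous.
    return list(chain.from_iterable(groups_by_dataset.values()))


# Shape assumed to match the mapping returned by get_seed_groups().
groups_by_dataset = {
    "dataset_a": ["a1", "a2"],
    "dataset_b": ["b1"],
}
flat = flatten_seed_groups(groups_by_dataset)
```

The flat list drops the dataset names, which is why the mapping form is the better choice when the origin of each seed group matters.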

get_default_dataset_names() → List[str]

Get the list of default dataset names for this configuration.

The CLI uses this to display which datasets the scenario uses by default.

Returns:

List of dataset names, or empty list if using explicit seed_groups.

Return type:

List[str]

get_seed_groups() → Dict[str, List[SeedGroup]]

Resolve and return seed groups based on the configuration.

This method handles all resolution logic:

  1. If seed_groups is set, use those directly (under the key ‘_explicit_seed_groups’).

  2. If dataset_names is set, load from memory using those names.

In all cases, max_dataset_size is applied per dataset if set.

Subclasses can override this to filter or customize which seed groups are loaded based on the stored scenario_composites.

Returns:

Dictionary mapping dataset names to their seed groups. When explicit seed_groups are provided, the key is ‘_explicit_seed_groups’. Each dataset’s seed groups are potentially sampled down to max_dataset_size.

Return type:

Dict[str, List[SeedGroup]]

Raises:

ValueError – If no seed groups could be resolved from the configuration.
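The resolution order described for this method can be approximated with a standalone sketch. It is an illustration under stated assumptions, not the library's implementation: resolve_seed_groups and its load_dataset parameter are hypothetical, a plain dict stands in for memory, strings stand in for SeedGroup objects, and the error message is invented.

```python
import random
from typing import Callable, Dict, List, Optional

EXPLICIT_KEY = "_explicit_seed_groups"  # key named in the description above


def resolve_seed_groups(
    *,
    seed_groups: Optional[List[str]] = None,
    dataset_names: Optional[List[str]] = None,
    max_dataset_size: Optional[int] = None,
    load_dataset: Callable[[str], List[str]] = lambda name: [],
) -> Dict[str, List[str]]:
    """Hypothetical sketch of the documented resolution steps."""
    if seed_groups is not None:
        # Step 1: explicit groups are used directly under the reserved key.
        resolved = {EXPLICIT_KEY: list(seed_groups)}
    elif dataset_names is not None:
        # Step 2: each name is looked up through the (assumed) loader.
        resolved = {name: list(load_dataset(name)) for name in dataset_names}
    else:
        resolved = {}

    if max_dataset_size is not None:
        # The cap is applied to each dataset separately, without replacement.
        resolved = {
            name: random.sample(groups, max_dataset_size)
            if len(groups) > max_dataset_size
            else groups
            for name, groups in resolved.items()
        }

    if not any(resolved.values()):
        raise ValueError("No seed groups could be resolved from the configuration.")
    return resolved


fake_memory = {"dataset_a": ["a1", "a2", "a3"], "dataset_b": ["b1"]}
explicit = resolve_seed_groups(seed_groups=["g1", "g2"])
by_name = resolve_seed_groups(
    dataset_names=["dataset_a", "dataset_b"],
    max_dataset_size=2,
    load_dataset=lambda name: fake_memory.get(name, []),
)
```

Because the cap applies per dataset, the total number of seed groups returned can exceed max_dataset_size when several datasets are configured.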

has_data_source() → bool

Check if this configuration has a data source configured.

Returns:

True if seed_groups or dataset_names is configured.

Return type:

bool