pyrit.models.SeedDataset

class SeedDataset(*, seeds: Sequence[dict[str, Any]] | Sequence[Seed] | None = None, data_type: Literal['text', 'image_path', 'audio_path', 'video_path', 'binary_path', 'url', 'reasoning', 'error', 'function_call', 'tool_call', 'function_call_output'] | None = 'text', name: str | None = None, dataset_name: str | None = None, harm_categories: Sequence[str] | None = None, description: str | None = None, authors: Sequence[str] | None = None, groups: Sequence[str] | None = None, source: str | None = None, date_added: datetime | None = None, added_by: str | None = None, seed_type: Literal['prompt', 'objective', 'simulated_conversation'] | None = None, is_objective: bool = False)

Bases: YamlLoadable

SeedDataset holds a collection of seeds together with optional top-level defaults that apply to every seed. Seeds are stored as a Sequence[Seed], so individual seed properties can be read directly (e.g. ds.seeds[0].value).

__init__(*, seeds: Sequence[dict[str, Any]] | Sequence[Seed] | None = None, data_type: Literal['text', 'image_path', 'audio_path', 'video_path', 'binary_path', 'url', 'reasoning', 'error', 'function_call', 'tool_call', 'function_call_output'] | None = 'text', name: str | None = None, dataset_name: str | None = None, harm_categories: Sequence[str] | None = None, description: str | None = None, authors: Sequence[str] | None = None, groups: Sequence[str] | None = None, source: str | None = None, date_added: datetime | None = None, added_by: str | None = None, seed_type: Literal['prompt', 'objective', 'simulated_conversation'] | None = None, is_objective: bool = False)

Initialize the dataset. Typically, you’ll call from_dict or from_yaml_file so that top-level defaults are merged into each seed. If you’re passing seeds directly, they can be either a list of Seed objects or seed dictionaries (which then get converted to Seed objects).

Parameters:
  • seeds – List of seed dictionaries or Seed objects.

  • data_type – Default data type for seeds.

  • name – Name of the dataset.

  • dataset_name – Dataset name for categorization.

  • harm_categories – List of harm categories.

  • description – Description of the dataset.

  • authors – List of authors.

  • groups – List of groups.

  • source – Source of the dataset.

  • date_added – Date when the dataset was added.

  • added_by – User who added the dataset.

  • seed_type – The type of seeds in this dataset ("prompt", "objective", or "simulated_conversation").

  • is_objective – Deprecated in 0.13.0. Use seed_type="objective" instead.

Raises:

ValueError – If seeds are missing or contain invalid/contradictory seed definitions.

Methods

  • __init__(*[, seeds, data_type, name, ...]) – Initialize the dataset.

  • from_dict(data) – Build a SeedDataset by merging top-level defaults into each item in seeds.

  • from_yaml_file(file) – Create a new object from a YAML file.

  • get_random_values(*, number[, harm_categories]) – Extract and return random prompt values from the dataset.

  • get_values(*[, first, last, harm_categories]) – Extract and return prompt values from the dataset.

  • group_seed_prompts_by_prompt_group_id(seeds) – Group the given list of seeds by prompt_group_id and create SeedGroup or SeedAttackGroup instances.

  • render_template_value(**kwargs) – Render seed values as templates using provided parameters.

Attributes

  • objectives – Return all objective-type seeds.

  • prompts – Return all prompt-type seeds.

  • seed_groups – Returns the seeds grouped by their prompt_group_id.

  • data_type

  • name

  • dataset_name

  • harm_categories

  • description

  • authors

  • groups

  • source

  • date_added

  • added_by

  • seeds

added_by: str | None
authors: Sequence[str] | None
data_type: str | None
dataset_name: str | None
date_added: datetime | None
description: str | None
classmethod from_dict(data: dict[str, Any]) → SeedDataset

Build a SeedDataset by merging top-level defaults into each item in seeds.

Parameters:

data (Dict[str, Any]) – Dataset payload with top-level defaults and seed entries.

Returns:

Constructed dataset with merged defaults.

Return type:

SeedDataset

Raises:

ValueError – If any seed entry includes a pre-set prompt_group_id.
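
The merge described above can be illustrated with plain dictionaries. This is a minimal sketch, not the library's implementation: `merge_defaults` is a hypothetical helper, and it assumes that a seed-level key simply overrides the top-level default (the library may instead combine list-valued fields such as harm_categories).

```python
from typing import Any


def merge_defaults(data: dict[str, Any]) -> list[dict[str, Any]]:
    """Copy top-level keys into each seed dict (hypothetical helper)."""
    # Everything except the "seeds" list is treated as a dataset-level default.
    defaults = {key: value for key, value in data.items() if key != "seeds"}
    merged = []
    for seed in data.get("seeds", []):
        if "prompt_group_id" in seed:
            # Mirrors the documented ValueError for pre-set group IDs.
            raise ValueError("seed entries must not set prompt_group_id")
        # Assumption: seed-level keys win over top-level defaults.
        merged.append({**defaults, **seed})
    return merged


payload = {
    "dataset_name": "demo",
    "source": "top-level source",
    "seeds": [
        {"value": "first prompt"},
        {"value": "second prompt", "source": "seed-level source"},
    ],
}
merged_seeds = merge_defaults(payload)
```

Here the first seed inherits both top-level fields, while the second keeps its own source.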

get_random_values(*, number: Annotated[int, Gt(gt=0)], harm_categories: Sequence[str] | None = None) → Sequence[str]

Extract and return random prompt values from the dataset.

Parameters:
  • number (int) – The number of random prompt values to return.

  • harm_categories (Optional[Sequence[str]]) – If provided, only prompts containing at least one of these harm categories are included.

Returns:

A list of prompt values.

Return type:

Sequence[str]
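
The filter-then-sample behaviour can be sketched without the library. `pick_random_values` and the dictionary-based seeds are illustrative stand-ins, and capping the sample at the number of matching prompts is an assumption rather than documented behaviour.

```python
import random
from typing import Optional, Sequence


def pick_random_values(
    seeds: Sequence[dict],
    *,
    number: int,
    harm_categories: Optional[Sequence[str]] = None,
) -> list[str]:
    """Return up to `number` random values, optionally filtered by category."""
    if number <= 0:
        raise ValueError("number must be greater than 0")
    # Keep a seed if no filter is given or it shares at least one category.
    candidates = [
        seed["value"]
        for seed in seeds
        if not harm_categories
        or set(harm_categories) & set(seed.get("harm_categories", []))
    ]
    # Assumption: cap the sample size so an oversized request does not raise.
    return random.sample(candidates, min(number, len(candidates)))


demo_seeds = [
    {"value": "p1", "harm_categories": ["a"]},
    {"value": "p2", "harm_categories": ["b"]},
    {"value": "p3", "harm_categories": ["a", "b"]},
]
sampled = pick_random_values(demo_seeds, number=2, harm_categories=["a"])
```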

get_values(*, first: Annotated[int, Gt(gt=0)] | None = None, last: Annotated[int, Gt(gt=0)] | None = None, harm_categories: Sequence[str] | None = None) → Sequence[str]

Extract and return prompt values from the dataset.

Parameters:
  • first (Optional[int]) – If provided, values from the first N prompts are included.

  • last (Optional[int]) – If provided, values from the last N prompts are included.

  • harm_categories (Optional[Sequence[str]]) – If provided, only prompts containing at least one of these harm categories are included.

Returns:

A list of prompt values.

Return type:

Sequence[str]
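
How first and last might combine is sketched below on a plain list of strings. `select_values` is a hypothetical helper. Returning every value once when the two ranges overlap is an assumption, since the reference text does not say how overlaps are handled.

```python
from typing import Optional, Sequence


def select_values(
    values: Sequence[str],
    *,
    first: Optional[int] = None,
    last: Optional[int] = None,
) -> list[str]:
    """Return the first N and/or last N values of a sequence."""
    if first is None and last is None:
        return list(values)
    # Assumption: overlapping ranges return each value once, in order.
    if first is not None and last is not None and first + last >= len(values):
        return list(values)
    head = list(values[:first]) if first is not None else []
    tail = list(values[-last:]) if last is not None else []
    return head + tail


letters = ["a", "b", "c", "d", "e"]
both_ends = select_values(letters, first=2, last=1)
overlapping = select_values(letters, first=3, last=3)
```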

static group_seed_prompts_by_prompt_group_id(seeds: Sequence[Seed]) → Sequence[SeedGroup]

Group the given list of seeds by prompt_group_id and create SeedGroup or SeedAttackGroup instances.

For each group, this method first attempts to create a SeedAttackGroup (which has attack-specific properties like objective). If validation fails, it falls back to a basic SeedGroup.

Parameters:

seeds – A list of Seed objects.

Returns:

A list of SeedGroup or SeedAttackGroup objects, with seeds grouped by prompt_group_id. Within each group, seeds are ordered by their sequence number, if available.
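
The group-then-fall-back pattern can be sketched with stand-in classes. `DemoSeed`, `PlainGroup` and `AttackGroup` are hypothetical and are not the library's Seed, SeedGroup or SeedAttackGroup. The validation rule (an attack group needs an objective) and the treatment of a missing sequence number as 0 are both assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class DemoSeed:
    """Stand-in for a seed; not the library's Seed class."""
    value: str
    prompt_group_id: str
    sequence: Optional[int] = None
    is_objective: bool = False


class PlainGroup:
    """Stand-in for a basic group with no extra validation."""
    def __init__(self, seeds):
        self.seeds = list(seeds)


class AttackGroup(PlainGroup):
    """Stand-in for a stricter group (assumed rule: needs an objective)."""
    def __init__(self, seeds):
        super().__init__(seeds)
        if not any(seed.is_objective for seed in self.seeds):
            raise ValueError("an attack group needs an objective")


def group_by_prompt_group_id(seeds):
    buckets = defaultdict(list)
    for seed in seeds:
        buckets[seed.prompt_group_id].append(seed)
    groups = []
    for members in buckets.values():
        # Assumption: seeds without a sequence number are treated as 0.
        members.sort(key=lambda s: s.sequence if s.sequence is not None else 0)
        try:
            groups.append(AttackGroup(members))   # try the stricter type first
        except ValueError:
            groups.append(PlainGroup(members))    # fall back to the basic type
    return groups


demo_groups = group_by_prompt_group_id([
    DemoSeed("follow-up", "g1", sequence=1),
    DemoSeed("goal", "g1", sequence=0, is_objective=True),
    DemoSeed("standalone", "g2"),
])
```

The first group validates as the stricter type and the second falls back to the basic one.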

groups: Sequence[str] | None
harm_categories: Sequence[str] | None
name: str | None
property objectives: Sequence[SeedObjective]

Return all objective-type seeds.

Returns:

Objective seeds in this dataset.

Return type:

Sequence[SeedObjective]

property prompts: Sequence[SeedPrompt]

Return all prompt-type seeds.

Returns:

Prompt seeds in this dataset.

Return type:

Sequence[SeedPrompt]

render_template_value(**kwargs: object) → None

Render seed values as templates using provided parameters.

Parameters:

kwargs – Key-value pairs substituted for the template parameters in each seed's value.

Raises:

ValueError – If parameters are missing or invalid in the template.
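
Rendering every value with one set of parameters can be sketched with the standard library's string.Template. This is only a stand-in: the template engine and placeholder syntax the library actually uses are not stated here. `render_values` is a hypothetical helper. Unlike the documented method, which returns None, it returns a new list to keep the example easy to check.

```python
from string import Template


def render_values(values: list[str], **kwargs: object) -> list[str]:
    """Substitute $name placeholders in every value (stand-in engine)."""
    rendered = []
    for value in values:
        try:
            rendered.append(Template(value).substitute(**kwargs))
        except (KeyError, ValueError) as exc:
            # Surface missing or malformed placeholders as ValueError,
            # matching the exception type documented above.
            raise ValueError(f"could not render template: {exc}") from exc
    return rendered


rendered_values = render_values(
    ["Tell me about $topic", "No placeholders here"],
    topic="tides",
)
```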

property seed_groups: Sequence[SeedGroup]

Returns the seeds grouped by their prompt_group_id.

Returns:

A list of SeedGroup objects, with seeds grouped by prompt_group_id.

Return type:

Sequence[SeedGroup]

seeds: Sequence[Seed]
source: str | None