API Reference#
pyrit.analytics#
Analytics module for PyRIT conversation and result analysis.
Analyze a list of AttackResult objects and return overall and grouped statistics. |
|
Statistics for attack analysis results. |
|
Handles analytics operations on conversation data, such as finding similar chat messages based on conversation history or embedding similarity. |
pyrit.auth#
Authentication functionality for a variety of services.
Abstract base class for authenticators. |
|
Azure CLI Authentication. |
|
A utility class for Azure Storage authentication, providing methods to generate SAS tokens using user delegation keys. |
pyrit.auxiliary_attacks#
pyrit.chat_message_normalizer#
Functionality to normalize chat messages into compatible formats for targets.
Abstract base class for normalizing chat messages for model or target compatibility. |
|
A no-op chat message normalizer that does not modify the input messages. |
|
Normalizer that combines the first system message with the first user message using generic instruction tags. |
|
A chat message normalizer that converts a list of chat messages to a ChatML string. |
|
Enable application of the chat template stored in a Hugging Face tokenizer to a list of chat messages. |
pyrit.cli#
Module to provide command-line interface functionalities for PyRIT. The CLI module is currently experimental.
pyrit.common#
Common utilities and helpers for PyRIT.
Apply default values to a class constructor. |
|
Apply default values to a method's parameters. |
|
Combine two dictionaries containing string keys and values into one. |
|
Combine two lists or strings into a single list with unique values. |
|
Convert a local image file to a data URL encoded in base64. |
|
Represents a scope for default values with class type, parameter name, and inheritance rules. |
|
Generate a deprecation message string. |
|
Display response images if running in notebook environment. |
|
Download a chunk of the file with a specified byte range. |
|
Download a file in multiple segments (splits) using byte-range requests. |
|
Download multiple files with parallel downloads and segmented downloading. |
|
Download specific files from a Hugging Face model repository. |
|
Fetch available files for a model from the Hugging Face repository. |
|
Get the global default values registry. |
|
Get the httpx client for making requests. |
|
Validate and extract a parameter from kwargs. |
|
Get a non-required value from an environment variable or a passed value, preferring the passed value. |
|
Generate a list of random indices based on the specified proportion of a given size. |
|
Get a required value from an environment variable or a passed value, preferring the passed value. |
|
Determine if the code is running in an IPython session. |
|
Make a request and raise an exception if it fails. |
|
Print chat messages with color to console. |
|
Reset all default values in the global registry. |
|
Set a default value for a specific class and parameter. |
|
A metaclass for creating singleton classes. |
|
Warn about unused parameters in configurations. |
|
Abstract base class for objects that can be loaded from YAML files. |
pyrit.datasets#
Dataset fetching and loading utilities for various red teaming and safety evaluation datasets.
Abstract base class for providing seed datasets with automatic registration. |
|
A class that manages jailbreak datasets (like DAN, etc.). |
pyrit.embedding#
Embedding module for PyRIT to provide OpenAI text embedding class.
Text embedding class that works with both Azure OpenAI and platform OpenAI endpoints. |
pyrit.exceptions#
Exception class for bad client requests. |
|
Exception class for empty response errors. |
|
Exception class for blocked content errors. |
|
Exception class for missing prompt placeholder errors. |
|
A decorator to apply retry logic with exponential backoff to a function. |
|
A decorator to apply retry logic to a function. |
|
A decorator to apply retry logic with exponential backoff to a function. |
|
A decorator to apply retry logic. |
|
Exception class for authentication errors. |
|
Checks if the response message is in JSON format and removes Markdown formatting if present. |
pyrit.executor.attack#
Attack executor module.
Adversarial configuration for attacks that involve adversarial chat targets. |
|
Base class for all attack contexts. |
|
Configuration for prompt converters used in attacks. |
|
Manages the execution of attack strategies with support for parallel execution. |
|
Result container for attack execution, supporting both full and partial completion. |
|
Immutable parameters for attack execution. |
|
Abstract base class for printing attack results. |
|
Scoring configuration for evaluating attack effectiveness. |
|
Abstract base class for attack strategies. |
|
Console printer for attack results with enhanced formatting. |
|
Implementation of the context compliance attack strategy. |
|
Manages conversations for attacks, handling message history, system prompts, and conversation state. |
|
Session for conversations. |
|
Container for conversation state data shared between attack components. |
|
Implementation of the Crescendo attack strategy. |
|
Context for the Crescendo attack strategy. |
|
Result of the Crescendo attack strategy execution. |
|
Implement the FlipAttack method found here: https://arxiv.org/html/2410.02832v1. |
|
Implement the Many Shot Jailbreak method as discussed in research found here: https://www.anthropic.com/research/many-shot-jailbreaking. |
|
Markdown printer for attack results optimized for Jupyter notebooks. |
|
Implementation of multi-prompt sending attack strategy. |
|
Parameters for MultiPromptSendingAttack. |
|
Context for multi-turn attacks. |
|
Strategy for executing multi-turn attacks. |
|
Evaluates scores from a Scorer to determine objective achievement and provide feedback. |
|
Implementation of single-turn prompt sending attack strategy. |
|
Enum for predefined red teaming attack system prompt paths. |
|
Implementation of multi-turn red teaming attack strategy. |
|
Implementation of single-turn role-play attack strategy. |
|
Enum for predefined role-play scenario paths. |
|
Context for single-turn attacks. |
|
Strategy for executing single-turn attacks. |
|
Implementation of the skeleton key jailbreak attack strategy. |
|
alias of |
|
Context for the Tree of Attacks with Pruning (TAP) attack strategy. |
|
Result of the Tree of Attacks with Pruning (TAP) attack strategy execution. |
|
Implement the Tree of Attacks with Pruning (TAP) attack strategy. |
pyrit.executor.promptgen#
Prompt generator strategy imports.
Context specific to Anecdoctor prompt generation. |
|
Implementation of the Anecdoctor prompt generation strategy. |
|
Result of Anecdoctor prompt generation. |
|
Base class for all prompt generator strategies. |
|
Base class for all prompt generator strategy contexts. |
|
Base class for all prompt generator strategy results. |
pyrit.executor.promptgen.fuzzer#
Fuzzer module for generating adversarial prompts through mutation and crossover operations.
Base class for GPTFUZZER converters. |
|
Context for the Fuzzer prompt generation strategy. |
|
Uses multiple prompt templates to generate new prompts. |
|
Generates versions of a prompt with new, prepended sentences. |
|
Implementation of the Fuzzer prompt generation strategy using Monte Carlo Tree Search (MCTS). |
|
Generates versions of a prompt with rephrased sentences. |
|
Result of the Fuzzer prompt generation strategy execution. |
|
Printer for Fuzzer generation strategy results with enhanced console formatting. |
|
Generates versions of a prompt with shortened sentences. |
|
Generates versions of a prompt with similar sentences. |
pyrit.executor.workflow#
Workflow components and strategies used by the PyRIT executor.
Context for Cross-Domain Prompt Injection Attack (XPIA) workflow. |
|
Result of XPIA workflow execution. |
|
Implementation of Cross-Domain Prompt Injection Attack (XPIA) workflow. |
|
XPIA workflow with automated test processing. |
|
XPIA workflow with manual processing intervention. |
|
Protocol for processing callback functions used in XPIA workflows. |
|
Enumeration of possible XPIA attack result statuses. |
pyrit.memory#
Provide functionality for storing and retrieving conversation history and embeddings.
This package defines the core MemoryInterface and concrete implementations for different storage backends.
Represents the attack result data in the database. |
|
A class to manage conversation memory using Azure SQL Server as the backend database. |
|
Provide a centralized memory instance across the framework. |
|
Represents the embedding data associated with conversation entries in the database. |
|
Abstract interface for conversation memory storage systems. |
|
The MemoryEmbedding class is responsible for encoding the memory embeddings. |
|
Handles the export of data to various formats, currently supporting only JSON format. |
|
Represents the prompt data. |
|
Represents the raw prompt or prompt template data as found in open datasets. |
|
A memory interface that uses SQLite as the backend database. |
pyrit.models#
Built-in mutable sequence. |
|
Implementation of StorageIO for Azure Blob Storage. |
|
Represents a dataset of chat messages. |
|
alias of |
|
Constructs a response entry from a request. |
|
Immutable reference to a conversation that played a role in the attack. |
|
Types of conversations that can be associated with an attack. |
|
Abstract base class for data type normalizers. |
|
Factory method to create a DataTypeSerializer instance. |
|
Implementation of StorageIO for local disk storage. |
|
Groups message pieces from the same conversation into Messages. |
|
Groups message pieces from multiple conversations into separate conversation groups. |
|
alias of |
|
Enum representing the possible outcomes of an attack. |
|
Base class for all attack results. |
|
Represents a message in a conversation, for example a prompt or a response to a prompt. |
|
Represents a piece of a message to a target. |
|
alias of |
|
alias of |
|
Represents a dataset for question answering. |
|
Represents a question model. |
|
Represents a choice for a question. |
|
Scenario result class for aggregating results from multiple AtomicAttacks. |
|
Scenario result class for aggregating scenario results. |
|
alias of |
|
Represents seed data with various attributes and metadata. |
|
SeedDataset manages seed prompts plus optional top-level defaults. |
|
A group of prompts that need to be sent together, along with an objective. |
|
Represents a seed objective with various attributes and metadata. |
|
Represents a seed prompt with various attributes and metadata. |
|
Group by conversation_id. |
|
Abstract interface for storage systems (local disk, Azure Storage Account, etc.). |
|
Base class for all strategy results. |
|
Score is an object that validates all the fields. |
pyrit.prompt_converter#
Adds a string to an image and wraps the text into multiple lines if necessary. |
|
Adds an image to a video at a specified position. |
|
Adds a string to an image and wraps the text into multiple lines if necessary. |
|
Generates prompts with ANSI codes to evaluate LLM behavior and system risks. |
|
Uses the art package to convert text into ASCII art. |
|
Implements encoding and decoding using Unicode Tags. |
|
Wraps encoded text with prompts that ask a target to decode it. |
|
Encodes text using the Atbash cipher. |
|
Shifts the frequency of an audio file by a specified value. |
|
Transcribes a .wav audio file into text using Azure AI Speech service. |
|
Generates a wave file from a text prompt using Azure AI Speech service. |
|
Converter that encodes text to base2048 format. |
|
Converter that encodes text to base64 format. |
|
Converts text to various binary-to-ASCII encodings. |
|
Transforms input text into its binary representation with configurable bits per character (8, 16, or 32). |
|
Converts text into Braille Unicode representation. |
|
Encodes text using the Caesar cipher with a specified offset. |
|
Spaces out the input prompt and removes specified punctuations. |
|
Applies character swapping to words in the prompt to test adversarial textual robustness. |
|
Encrypts user prompt, adds stringified decrypt function in markdown and instructions. |
|
Converts text into colloquial Singaporean context. |
|
The result of a prompt conversion, containing the converted output and its type. |
|
Replaces forbidden words or phrases in a prompt with synonyms using an LLM. |
|
Applies diacritics to specified characters in a string. |
|
Converter that encodes text using Ecoji encoding. |
|
Converts English text to randomly chosen circle or square character emojis. |
|
Replaces each word of the prompt with its first letter (or digit). |
|
Flips the input text prompt. |
|
Allows review of each prompt sent to a target before sending it. |
|
Compresses images to reduce file size while preserving visual quality. |
|
Selects text based on absolute character indices. |
|
Inserts punctuation into a prompt to test robustness. |
|
Selects text around a keyword with optional context. |
|
Converts a string to a leetspeak version. |
|
Represents a generic LLM converter that expects text to be transformed (e.g. no JSON parsing or format). |
|
Generates malicious questions using an LLM. |
|
Convert text into character-level algebraic identities. |
|
Converts natural language instructions into symbolic mathematics problems using an LLM. |
|
Encodes prompts using morse code. |
|
Converts text into NATO phonetic alphabet representation. |
|
Injects noise errors into a conversation using an LLM. |
|
Converts a text prompt into a PDF file. |
|
Rephrases prompts using a variety of persuasion techniques. |
|
Selects text based on proportional start and end positions. |
|
Base class for converters that transform prompts into a different representation or format. |
|
Selects a proportion of text anchored to a specific position (start, end, middle, or random). |
|
Converts a text string to a QR code image. |
|
Takes a prompt and randomly capitalizes it by a percentage of the total characters. |
|
Translates each individual word in a prompt to a random language using an LLM. |
|
Selects text based on proportional start and end positions. |
|
Selects text based on the first regex match. |
|
Repeats a specified token a specified number of times in addition to a given prompt. |
|
Encodes prompts using the ROT13 cipher. |
|
Converts a string by replacing chosen phrase with a new phrase of choice. |
|
A wrapper converter that applies another converter to selected portions of text. |
|
Encodes and decodes text using a bit-level approach. |
|
Converts text by joining its characters with the specified join value. |
|
Appends a specified suffix to the prompt. |
|
Converts text to superscript. |
|
Uses a template to randomly split a prompt into segments defined by the template. |
|
Converts a conversation to a different tense using an LLM. |
|
Uses a jailbreak template to create a prompt. |
|
Base class for text selection strategies used by SelectiveTextConverter and WordLevelConverter. |
|
A special selection strategy that signals SelectiveTextConverter to auto-detect and convert text between start/end tokens (e.g., ⟪ and ⟫). |
|
Converts a conversation to a different tone using an LLM. |
|
Generates toxic sentence starters using an LLM. |
|
Translates prompts into different languages using an LLM. |
|
Creates a transparency attack by optimizing an alpha channel to blend attack and benign images. |
|
Applies substitutions to words in the prompt to test adversarial textual robustness by replacing characters with visually similar ones. |
|
Converts a prompt to its unicode representation. |
|
Encodes the prompt using any unicode starting point. |
|
Converts a prompt to a URL-encoded string. |
|
Generates variations of the input prompts using the converter target. |
|
Encodes and decodes text using Unicode Variation Selectors. |
|
Selects words based on their indices in the word list. |
|
Selects words that match specific keywords. |
|
Selects words based on proportional start and end positions. |
|
Selects a random proportion of words. |
|
Selects words that match a regex pattern. |
|
Base class for word-level selection strategies. |
|
Converts text into cursed Zalgo text using combining Unicode marks. |
|
Injects zero-width spaces between characters in the provided text to bypass content safety mechanisms. |
pyrit.prompt_normalizer#
Prompt normalization components for standardizing and converting prompts.
This module provides tools for normalizing prompts before sending them to targets, including converter configurations and request handling.
Handles normalization and processing of prompts before they are sent to targets. |
|
Represents the configuration for a prompt response converter. |
|
Represents a single request sent to normalizer. |
pyrit.prompt_target#
Prompt targets for PyRIT.
Target implementations for interacting with different services and APIs, for example sending prompts or transferring content (uploads).
The AzureBlobStorageTarget takes prompts, saves the prompts to a file, and stores them as a blob in a provided storage account container. |
|
A prompt target for Azure Machine Learning chat endpoints. |
|
Enumeration of Copilot interface types. |
|
A prompt target for the Crucible service. |
|
Enumeration of Gandalf challenge levels. |
|
A prompt target for the Gandalf security challenge. |
|
Determine proper parsing response function for an HTTP Request. |
|
Get a callback function that parses HTTP responses using regex matching. |
|
HTTP_Target is for endpoints that do not have an API and instead require HTTP request(s) to send a prompt. |
|
A subclass of HTTPTarget that only does "API mode" (no raw HTTP request). |
|
The HuggingFaceChatTarget interacts with HuggingFace models, specifically for conducting red teaming activities. |
|
The HuggingFaceEndpointTarget interacts with HuggingFace models hosted on cloud endpoints. |
|
Enforce rate limit of the target through setting requests per minute. |
|
A prompt target for OpenAI completion endpoints. |
|
A target for image generation using OpenAI's image models. |
|
Facilitates multimodal (image and text) input and text output generation. |
|
Enables communication with endpoints that support the OpenAI Response API. |
|
OpenAI Video Target using the OpenAI SDK for video generation. |
|
A prompt target for OpenAI Text-to-Speech (TTS) endpoints. |
|
Abstract base class for OpenAI-based prompt targets. |
|
PlaywrightCopilotTarget uses Playwright to interact with Microsoft Copilot web UI. |
|
PlaywrightTarget uses Playwright to interact with a web UI. |
|
A prompt chat target is a target where you can explicitly set the conversation history using memory. |
|
PromptShield is an endpoint which detects the presence of a jailbreak. |
|
Abstract base class for prompt targets. |
|
A prompt target for Azure OpenAI Realtime API. |
|
The TextTarget takes prompts, adds them to memory and writes them to io which is sys.stdout by default. |
pyrit.score#
Scoring functionality for evaluating AI model responses across various dimensions including harm detection, objective completion, and content classification.
A scorer that uses Azure Content Safety API to evaluate text and images for harmful content. |
|
A utility class for scoring prompts in batches in a parallelizable and convenient way. |
|
Paths to content classifier YAML files. |
|
Scorer that evaluates entire conversation history rather than individual messages. |
|
Create a ConversationScorer that inherits from the same type as the wrapped scorer. |
|
Scorer that checks if the request values are in the output using a text matching strategy. |
|
Namespace for float scale score aggregators that return a single aggregated score. |
|
Base class for scorers that return floating-point scores in the range [0, 1]. |
|
Namespace for float scale score aggregators that combine all categories. |
|
Namespace for float scale score aggregators that group by category. |
|
A scorer that applies a threshold to a float scale score to make it a true/false score. |
|
A scorer for evaluating responses in Gandalf challenges. |
|
A class that represents a human-labeled dataset entry for a specific harm category. |
|
A class that evaluates a harm scorer against HumanLabeledDatasets of type HARM. |
|
Metrics for evaluating a harm scorer against a HumanLabeledDataset. |
|
Create scores from manual human input using Gradio and adds them to the database. |
|
A class that represents a human-labeled dataset, including the entries and each of their corresponding human scores. |
|
A class that represents an entry in a dataset of assistant responses that have been scored by humans. |
|
A scorer that uses an LLM to evaluate code snippets for potential security vulnerabilities. |
|
Enum containing paths to Likert scale YAML configuration files. |
|
A scorer that detects markdown injection attempts in text responses. |
|
Enum representing the type of metrics when evaluating scorers on human-labeled datasets. |
|
A class that represents a human-labeled dataset entry for a specific objective. |
|
A class that evaluates an objective scorer against HumanLabeledDatasets of type OBJECTIVE. |
|
Metrics for evaluating an objective scorer against a HumanLabeledDataset. |
|
Enum representing different plagiarism detection metrics. |
|
A scorer that measures plagiarism by computing word-level similarity between the AI response and a reference text. |
|
Returns true if an attack or jailbreak has been detected by Prompt Shield. |
|
A class that represents a question answering scorer. |
|
Abstract base class for scorers. |
|
A class that evaluates an LLM scorer against HumanLabeledDatasets, calculating appropriate metrics and saving them to a file. |
|
Configuration class for Scorers. |
|
Base dataclass for storing scorer evaluation metrics. |
|
Validates message pieces and scorer configurations. |
|
A class that represents a self-ask score for text classification and scoring. |
|
A general-purpose self-ask float-scale scorer that uses a chat target and a configurable system prompt and prompt format. |
|
A general-purpose self-ask True/False scorer that uses a chat target and a configurable system prompt and prompt format. |
|
A class that represents a "self-ask" score for text scoring for a likert scale. |
|
A class that represents a self-ask question answering scorer. |
|
A self-ask scorer that detects refusal in AI responses. |
|
A class that represents a "self-ask" score for text scoring for a customizable numeric scale. |
|
A class that represents a self-ask true/false for scoring. |
|
Scorer that checks if a given substring is present in the text. |
|
Composite true/false scorer that aggregates results from other true/false scorers. |
|
A scorer that inverts a true false score. |
|
A class that represents a true/false question. |
|
Paths to true/false question YAML files. |
|
Namespace for true/false score aggregators that return a single aggregated score. |
|
Base class for scorers that return true/false binary scores. |
|
A scorer that processes videos by extracting frames and scoring them using a float scale image scorer. |
|
A scorer that processes videos by extracting frames and scoring them using a true/false image scorer. |
pyrit.scenario#
High-level scenario classes for running attack configurations.
Represents a single atomic attack test combining an attack strategy and dataset. |
|
Encoding Scenario implementation for PyRIT. |
|
Strategies for attacks with tag-based categorization. |
|
FoundryScenario is a preconfigured scenario that automatically generates multiple AtomicAttack instances based on the specified attack strategies. |
|
Groups and executes multiple AtomicAttack instances sequentially. |
|
Base class for attack strategies with tag-based categorization and aggregation. |
pyrit.setup#
Module containing initialization PyRIT.
Initialize PyRIT with the provided memory instance and loads environment files. |
|
str(object='') -> str str(bytes_or_buffer[, encoding[, errors]]) -> str |
|
str(object='') -> str str(bytes_or_buffer[, encoding[, errors]]) -> str |
|
str(object='') -> str str(bytes_or_buffer[, encoding[, errors]]) -> str |
pyrit.setup.initializers#
PyRIT initializers package.
Abstract base class for PyRIT configuration initializers. |
|
AIRT (AI Red Team) configuration initializer. |
|
Complete simple configuration initializer. |
|
Load default datasets for all registered scenarios. |
|
Configure a default list of objectives for use in PyRIT scenarios. |