pyrit.prompt_converter.TransparencyAttackConverter


class TransparencyAttackConverter(*, benign_image_path: Path, size: Tuple[int, int] = (150, 150), steps: int = 1500, learning_rate: float = 0.001, convergence_threshold: float = 1e-06, convergence_patience: int = 10)

Bases: PromptConverter

Creates a transparency attack by optimizing an alpha channel to blend attack and benign images.

This converter takes two inputs:

Benign image (foreground/target): The harmless image specified during initialization.
Attack image (background/harmful): The potentially harmful image passed via the prompt parameter.

The algorithm optimizes a transparency pattern so that the output PNG exhibits dual perception:

On white/light backgrounds: appears as the benign image.
On dark backgrounds: reveals the attack image content.

AI systems may perceive either image depending on their background processing assumptions.
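The dual perception follows from standard alpha compositing: a pixel with value g and opacity a, shown over a background B, appears as a*g + (1 - a)*B. The sketch below is a closed-form, single-pixel illustration in grayscale, not the converter's implementation (which, per this page, optimizes the alpha channel iteratively). The helper names `composite` and `solve_pixel` are hypothetical, and the sketch assumes the attack value is no brighter than the benign value.

```python
def composite(gray: float, alpha: float, background: float) -> float:
    """Standard 'over' compositing of one grayscale pixel onto a solid background."""
    return alpha * gray + (1.0 - alpha) * background


def solve_pixel(benign: float, attack: float) -> tuple[float, float]:
    """Return (gray, alpha) that shows `benign` on white and `attack` on black.

    All values are in [0, 1]. Assumes attack <= benign; otherwise no valid
    alpha exists for this simple closed-form construction.
    """
    if not 0.0 <= attack <= benign <= 1.0:
        raise ValueError("expected 0 <= attack <= benign <= 1")
    # On black (B = 0): alpha * gray = attack
    # On white (B = 1): alpha * gray + (1 - alpha) = benign
    alpha = 1.0 - benign + attack
    gray = attack / alpha if alpha > 0 else 0.0
    return gray, alpha


gray, alpha = solve_pixel(benign=0.8, attack=0.2)
on_white = composite(gray, alpha, background=1.0)  # matches the benign value
on_black = composite(gray, alpha, background=0.0)  # matches the attack value
```

The same pixel therefore renders differently depending only on the background it is composited over, which is the effect the optimized alpha channel aims for across the whole image.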

Currently, only JPEG images are supported as input. Output images will always be saved as PNG with transparency.

Note

This converter implements the transparency attack as described in: “Transparency Attacks: How Imperceptible Image Layers Can Fool AI Perception” by McKee, F. and Noever, D., 2024: https://arxiv.org/abs/2401.15817

As stated in the paper: “The major limitation of the transparency attack is the low success rate when the human viewer’s background theme is not light by default or at least a close match to the transparent foreground and hidden background layers. When mismatched, the background becomes visible to the human eye and the vision algorithm.”

__init__(*, benign_image_path: Path, size: Tuple[int, int] = (150, 150), steps: int = 1500, learning_rate: float = 0.001, convergence_threshold: float = 1e-06, convergence_patience: int = 10)

Initializes the converter with the path to a benign image and parameters for blending.

Parameters:

benign_image_path (Path) – Path to the benign image file. Must be a JPEG file (.jpg or .jpeg).
size (tuple) – Size (width, height) that both images are resized to. Choose a size that matches the aspect ratio of both the attack and benign images. The default is (150, 150) because the original study resizes images to 150x150 pixels. Larger values may significantly increase computation time.
steps (int) – Number of optimization steps to perform. Recommended range: 100-2000 steps. Default is 1500. More steps generally produce a better blend, at the cost of increased computation time.
learning_rate (float) – Controls the magnitude of adjustments in each step (used by the Adam optimizer). Recommended range: 0.0001-0.01. Default is 0.001. Values close to 1 may lead to instability and lower-quality blending, while values that are too low may require more steps to achieve a good blend.
convergence_threshold (float) – Minimum change in loss that counts as an improvement. If the change in loss between steps is below this value, the step is counted as no improvement. Default is 1e-6. Recommended range: 1e-6 to 1e-4.
convergence_patience (int) – Number of consecutive steps with no improvement before stopping. Default is 10.
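As a rough illustration of how `convergence_threshold` and `convergence_patience` interact, the following sketch applies the stopping rule described in the parameter list to a precomputed list of loss values. The function name `steps_until_stop` is hypothetical, and the exact comparison used inside the converter may differ (for instance, it might compare against the best loss seen so far); treat this as a minimal sketch under those assumptions.

```python
def steps_until_stop(
    losses: list[float],
    convergence_threshold: float = 1e-6,
    convergence_patience: int = 10,
) -> int:
    """Return how many steps run before early stopping triggers.

    A step counts as "no improvement" when the loss changed by less than
    `convergence_threshold` relative to the previous step. The loop stops
    after `convergence_patience` such steps in a row.
    """
    previous = None
    stalled = 0
    for step, loss in enumerate(losses, start=1):
        if previous is not None and abs(previous - loss) < convergence_threshold:
            stalled += 1
            if stalled >= convergence_patience:
                return step
        else:
            stalled = 0  # any real change resets the counter
        previous = loss
    return len(losses)  # ran out of steps without converging


# Loss drops for three steps, then plateaus.
plateau = [1.0, 0.5, 0.25] + [0.25] * 20
stopped_at = steps_until_stop(plateau, convergence_patience=3)
```

A larger patience tolerates longer plateaus before giving up, while a larger threshold makes the run stop sooner on slowly improving losses.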

Raises:

ValueError – If the benign image is invalid or is not in JPEG format.
ValueError – If the learning rate is outside the valid range (0, 1).
ValueError – If the size is not a tuple of two positive integers (width, height).
ValueError – If steps is not a positive integer.
ValueError – If the convergence threshold is not a float between 0 and 1.
ValueError – If the convergence patience is not a positive integer.
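The constraints listed under Raises can be checked up front before constructing the converter. The helper below, `validate_params`, is hypothetical and is not PyRIT's own validation code: it only inspects the file extension rather than the image contents, and it assumes the bounds on the learning rate and convergence threshold are exclusive.

```python
from pathlib import Path


def validate_params(
    benign_image_path: Path,
    size: tuple = (150, 150),
    steps: int = 1500,
    learning_rate: float = 0.001,
    convergence_threshold: float = 1e-6,
    convergence_patience: int = 10,
) -> None:
    """Raise ValueError if any argument violates the documented constraints."""
    # Only the extension is checked here; the file itself is not opened.
    if Path(benign_image_path).suffix.lower() not in {".jpg", ".jpeg"}:
        raise ValueError("benign image must be a JPEG file (.jpg or .jpeg)")
    if not (
        isinstance(size, tuple)
        and len(size) == 2
        and all(isinstance(v, int) and v > 0 for v in size)
    ):
        raise ValueError("size must be a tuple of two positive integers")
    if not (isinstance(steps, int) and steps > 0):
        raise ValueError("steps must be a positive integer")
    if not 0 < learning_rate < 1:
        raise ValueError("learning_rate must be in the range (0, 1)")
    # Assumption: both bounds are exclusive.
    if not (isinstance(convergence_threshold, float) and 0 < convergence_threshold < 1):
        raise ValueError("convergence_threshold must be a float between 0 and 1")
    if not (isinstance(convergence_patience, int) and convergence_patience > 0):
        raise ValueError("convergence_patience must be a positive integer")


validate_params(Path("benign.jpg"), size=(150, 150), steps=500)  # passes silently
```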

Methods

`__init__`(*, benign_image_path[, size, ...])	Initializes the converter with the path to a benign image and parameters for blending.
`convert_async`(*, prompt[, input_type])	Converts the given prompt by blending an attack image (potentially harmful) with a benign image.
`convert_tokens_async`(*, prompt[, ...])	Converts substrings within a prompt that are enclosed by specified start and end tokens.
`get_identifier`()	Returns an identifier dictionary for the converter.
`input_supported`(input_type)	Checks if the input type is supported by the converter.
`output_supported`(output_type)	Checks if the output type is supported by the converter.

Attributes

`SUPPORTED_INPUT_TYPES`	Tuple of input modalities supported by this converter.
`SUPPORTED_OUTPUT_TYPES`	Tuple of output modalities supported by this converter.
`supported_input_types`	Returns a list of supported input types for the converter.
`supported_output_types`	Returns a list of supported output types for the converter.

SUPPORTED_INPUT_TYPES: tuple[Literal['text', 'image_path', 'audio_path', 'video_path', 'url', 'reasoning', 'error', 'function_call', 'tool_call', 'function_call_output'], ...] = ('image_path',)

Tuple of input modalities supported by this converter. Subclasses must override this.

SUPPORTED_OUTPUT_TYPES: tuple[Literal['text', 'image_path', 'audio_path', 'video_path', 'url', 'reasoning', 'error', 'function_call', 'tool_call', 'function_call_output'], ...] = ('image_path',)

Tuple of output modalities supported by this converter. Subclasses must override this.

async convert_async(*, prompt: str, input_type: Literal['text', 'image_path', 'audio_path', 'video_path', 'url', 'reasoning', 'error', 'function_call', 'tool_call', 'function_call_output'] = 'image_path') → ConverterResult

Converts the given prompt by blending an attack image (potentially harmful) with a benign image. Uses the Novel Image Blending Algorithm from: https://arxiv.org/abs/2401.15817.

Parameters:

prompt (str) – The image file path to the attack image.
input_type (PromptDataType) – The type of input data. Must be “image_path”.

Returns:

The result containing the path to the manipulated image with transparency.

Return type:

ConverterResult

Raises:

ValueError – If the input type is not supported or if the prompt is invalid.
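A typical call sequence might look like the sketch below. The wrapper name `blend_images` and the file names are placeholders, and reading the output path from `result.output_text` is an assumption based on how other PyRIT converters return results. Any environment setup PyRIT needs before converters can write files (such as memory initialization) is omitted and may be required depending on the version.

```python
from pathlib import Path


async def blend_images(benign_path: Path, attack_path: str) -> str:
    """Blend an attack image with a benign one and return the output PNG path."""
    # Imported lazily so this sketch can be loaded without PyRIT installed.
    from pyrit.prompt_converter import TransparencyAttackConverter

    converter = TransparencyAttackConverter(
        benign_image_path=benign_path,  # must be a .jpg or .jpeg file
        size=(150, 150),                # default size from the original study
        steps=500,                      # within the recommended 100-2000 range
        learning_rate=0.001,
    )
    result = await converter.convert_async(
        prompt=attack_path,             # path to the attack JPEG
        input_type="image_path",
    )
    # Assumption: ConverterResult exposes the output path as `output_text`.
    return result.output_text


# Example invocation (file names are placeholders):
#   import asyncio
#   print(asyncio.run(blend_images(Path("benign.jpg"), "attack.jpg")))
```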
