pyrit.orchestrator.SkeletonKeyOrchestrator

class SkeletonKeyOrchestrator(*, skeleton_key_prompt: str | None = None, prompt_target: PromptTarget, prompt_converters: list[PromptConverter] | None = None, batch_size: int = 10, verbose: bool = False)

Bases: Orchestrator

Creates an orchestrator that executes a skeleton key jailbreak.

The orchestrator sends an initial skeleton key prompt to the target, then follows up with a separate attack prompt. If the skeleton key succeeds, the target complies even with malicious follow-up prompts. In our experiments, using two separate prompts was more effective than using a single combined prompt.

Learn more about the attack at the link below: https://www.microsoft.com/en-us/security/blog/2024/06/26/mitigating-skeleton-key-a-new-type-of-generative-ai-jailbreak-technique/
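
The two-step flow described above can be sketched without PyRIT as follows. This is a minimal illustration, not the library's implementation: `FakeTarget`, `send_skeleton_key_with_prompt`, and the placeholder strings are hypothetical names introduced here. The sketch only shows the skeleton key and the attack prompt being sent as two separate turns of the same conversation.

```python
import asyncio
import uuid


class FakeTarget:
    """Hypothetical stand-in for a prompt target; it records turns per conversation."""

    def __init__(self) -> None:
        self.conversations: dict[str, list[str]] = {}

    async def send(self, conversation_id: str, text: str) -> str:
        # Append the turn to the conversation history and return a canned reply.
        history = self.conversations.setdefault(conversation_id, [])
        history.append(text)
        return f"reply {len(history)}"


async def send_skeleton_key_with_prompt(target: FakeTarget, skeleton_key: str, prompt: str) -> str:
    # Both turns share one conversation ID so the target sees a single dialogue.
    conversation_id = str(uuid.uuid4())
    await target.send(conversation_id, skeleton_key)   # turn 1: the skeleton key
    return await target.send(conversation_id, prompt)  # turn 2: the attack prompt


target = FakeTarget()
response = asyncio.run(
    send_skeleton_key_with_prompt(target, "<skeleton key text>", "<attack prompt>")
)
```

The returned value is the reply to the second turn only; the reply to the skeleton key stays in the recorded conversation.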

__init__(*, skeleton_key_prompt: str | None = None, prompt_target: PromptTarget, prompt_converters: list[PromptConverter] | None = None, batch_size: int = 10, verbose: bool = False) → None

Parameters:

skeleton_key_prompt (str, Optional) – The skeleton key prompt sent to the target. Defaults to skeleton_key.prompt.
prompt_target (PromptTarget) – The target for sending prompts.
prompt_converters (list[PromptConverter], Optional) – List of prompt converters. These are stacked in the order they are provided; for example, the output of converter1 is the input of converter2.
batch_size (int, Optional) – The maximum batch size for sending prompts. Defaults to 10. Note: If a maximum number of requests per minute is set on the prompt_target, batch_size should be set to 1 to ensure proper rate limit management.
verbose (bool, Optional) – If set to True, verbose output will be enabled. Defaults to False.
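
The stacking order of `prompt_converters` can be illustrated with plain functions. This is a sketch that assumes each converter maps text to text; `apply_converters`, `reverse_text`, and `add_suffix` are hypothetical helpers introduced for illustration, not PyRIT classes.

```python
from functools import reduce
from typing import Callable

Converter = Callable[[str], str]


def reverse_text(text: str) -> str:
    return text[::-1]


def add_suffix(text: str) -> str:
    return text + "!"


def apply_converters(text: str, converters: list[Converter]) -> str:
    # Apply converters left to right: the output of one is the input of the next.
    return reduce(lambda current, convert: convert(current), converters, text)


first_order = apply_converters("abc", [reverse_text, add_suffix])
second_order = apply_converters("abc", [add_suffix, reverse_text])
```

Swapping the two converters changes the result, which is why the order of the list matters.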

Methods

`__init__`(*[, skeleton_key_prompt, ...])
`dispose_db_engine`()	Dispose database engine to release database connections and resources.
`get_identifier`()
`get_memory`()	Retrieves the memory associated with this orchestrator.
`get_score_memory`()	Retrieves the scores of the PromptRequestPieces associated with this orchestrator.
`print_conversation`()	Prints all the conversations that have occurred with the prompt target.
`send_skeleton_key_with_prompt_async`(*, prompt)	Sends a skeleton key, followed by the attack prompt to the target.
`send_skeleton_key_with_prompts_async`(*, ...)	Sends a skeleton key and prompt to the target for each prompt in a list of prompts.

print_conversation() → None

Prints all the conversations that have occurred with the prompt target.

async send_skeleton_key_with_prompt_async(*, prompt: str) → PromptRequestResponse

Sends a skeleton key, followed by the attack prompt to the target.

Parameters:

prompt (str) – The prompt to be sent.
prompt_type (PromptDataType, Optional) – The type of the prompt (e.g., “text”). Defaults to “text”.

Returns:

The response from the prompt target.

Return type:

PromptRequestResponse

async send_skeleton_key_with_prompts_async(*, prompt_list: list[str]) → list[PromptRequestResponse]

Sends a skeleton key and prompt to the target for each prompt in a list of prompts.

Parameters:

prompt_list (list[str]) – The list of prompts to be sent.
prompt_type (PromptDataType, Optional) – The type of the prompts (e.g., “text”). Defaults to “text”.

Returns:

The responses from the prompt target.

Return type:

list[PromptRequestResponse]
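
One way to picture how a maximum batch size applies to a list of prompts is the sketch below. It shows a general batching pattern and is an assumption, not a description of PyRIT's internals: `send_in_batches` and `fake_send` are hypothetical names. Prompts are split into chunks of at most `batch_size`, each chunk is sent concurrently, and the responses keep the order of the input list.

```python
import asyncio


async def fake_send(prompt: str) -> str:
    # Hypothetical stand-in for one skeleton-key-plus-prompt exchange.
    await asyncio.sleep(0)
    return f"response to {prompt}"


async def send_in_batches(prompt_list: list[str], batch_size: int = 10) -> list[str]:
    responses: list[str] = []
    for start in range(0, len(prompt_list), batch_size):
        batch = prompt_list[start:start + batch_size]
        # asyncio.gather runs the batch concurrently and preserves input order.
        responses.extend(await asyncio.gather(*(fake_send(p) for p in batch)))
    return responses


responses = asyncio.run(send_in_batches(["a", "b", "c"], batch_size=2))
```

With `batch_size=1`, the loop sends one prompt at a time, which is the setting the `batch_size` note recommends when the target enforces a requests-per-minute limit.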
