Attacks#
The Attack is a top-level component that red team operators will interact with the most. It is responsible for telling PyRIT which endpoints to connect to and how to send prompts. It can be thought of as the component that executes an attack technique.
An Attack is made up of four components, which create a consistent flow of control:
flowchart LR A(["Attack Strategy <br> << abstract base class >>"]) A --consumes--> B(["Attack Context <br> << abstract base class >>"]) A --takes in as parameters within __init__--> D(["Attack Configurations"]) A --produces--> C(["Attack Result <br> << abstract base class >>"])
To execute an Attack, one generally follows this pattern:
Create an attack context containing state information (i.e. attack objective, memory labels, prepended conversations, seed prompts)
Initialize an attack strategy (with optional attack configurations for converters, scorers, and adversarial chat targets)
Execute the attack strategy with the created context
Recieve and process the attack result
Each attack implements a lifecycle with distinct phases (all abstract methods), and the base AttackStrategy
class provides a non-abstract execute_async()
method that enforces this lifecycle:
_validate_context
: Validates the context of the attack_setup_async
: Prepare the attack (initialize state)_perform_attack_async
: Execute the core attack logic_teardown_async
: Clean up resources and finalize the attack
This implementation enforces a consistent execution flow across all strategies:
It guarantees that setup is always performed before the attack begins
It ensures the attack logic is only executed if setup succeeds
It guarantees teardown is always executed, even if errors occur, through the use of a finally block
It provides centralized error handling and logging
The following documentation will illustrate the different kinds of attacks within PyRIT. Some simply send prompts and run them through converters. Others instantiate more complicated attack techniques, like PAIR, TAP, and Crescendo.
See the below diagrams for more details of the components:
flowchart LR subgraph AttackStrategy["AttackStrategy"] S_s["Other Single Turn Attacks (e.g. RolePlayAttack, SkeletonKeyAttack)"] --inherit from--> S_psa["PromptSendingAttack"] S_c["CrescendoAttack"] S_r["RedTeamingAttack"] s_t["TreeOfAttacksWithPruningAttack"] end
flowchart LR subgraph AttackContext["AttackContext"] C_s["SingleTurnAttackContext"] a["conversation_id"] b["seed_prompt_group"] c["..."] C_m["MultiTurnAttackContext"] A["custom_prompt"] B["..."] C_o["objective"] C_mem["memory_labels"] C_rel["related_conversations"] end C_s-->C_o C_s-->C_mem C_s-->C_rel C_m-->B C_m-->A C_s-->c C_s-->a C_s-->b C_m-->C_o C_m-->C_mem C_m-->C_rel
flowchart LR subgraph AttackConfig["Attack Configurations"] Adv["AttackAdversarialConfig"] Adv_target["target"] Adv_sys["system_prompt_path"] Adv_seed["seed_prompt"] Scoring["AttackScoringConfig"] Scoring_obj["objective_scorer"] Scoring_ref["refusal_scorer"] Scoring_aux["auxiliary_scorers"] Scoring_misc["..."] Convert["AttackConverterConfig"] Convert_req["request_converters"] Convert_resp["response_converters"] end Adv-->Adv_target Adv-->Adv_sys Adv-->Adv_seed Convert-->Convert_req Convert-->Convert_resp Scoring-->Scoring_obj Scoring-->Scoring_ref Scoring-->Scoring_aux Scoring-->Scoring_misc
flowchart LR subgraph AttackResult["AttackResult"] a["conversation_id"] b["objective"] c["attack_identifier"] d["last_response"] e["last_score"] f["executed_turns"] g["execution_time_ms"] h["outcome"] i["outcome_reason"] j["related_conversations"] k["metadata"] end