RAI Policy Management Overview¶
Responsible AI (RAI) Policy Management is a core feature of the RAI platform, enabling customers to define, customize, and enforce policies for analyzing harmful content. These policies streamline the process of identifying harmful content and managing its impact effectively, while offering flexibility to meet diverse customer requirements.
Core Concepts¶
1. Task¶
In the RAI platform, a task represents a specific type of harmful content annotation (e.g., identifying sexual content) applied to a defined content scope. Tasks can optionally include a blocking threshold to determine whether the content should be blocked based on its severity. More details
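To make the concept concrete, the snippet below sketches a single task setting as it would appear inside a policy, written as a Python dict for readability. The field names follow the policy example later in this document; the specific harm category and threshold are illustrative.

# A single task: annotate Violence in user prompts and block anything whose
# severity exceeds 2. Field names follow the policy example later in this document.
violence_task = {
    "settingId": "task_setting_violence",
    "settingEnabled": True,
    "appliedFor": [
        {"role": "User", "source": "Prompt"}   # scope: user prompts only
    ],
    "kind": "HarmCategory",
    "harmCategoryTaskSetting": {"harmCategory": "Violence"},
    "blockingCriteria": {                      # optional blocking threshold
        "enabled": True,
        "kind": "Severity",
        "allowedSeverity": 2                   # block content rated above severity 2
    },
}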
2. Policy¶
A policy is a named set of tasks that defines the actions RAI services should perform. Each task within the policy specifies:
- Harm Category: The type of harmful content (e.g., HATE, SELF_HARM).
- Scope: Where and how the task applies (e.g., to all user prompts or assistant responses).
- Thresholds and Blocking Criteria: Rules for triggering content blocking based on severity or detection criteria.
Types of Policies¶
The RAI platform supports four distinct types of policies:
1. Standard Policy¶
Customers can pre-define policies using the Policy Management API. These are reusable and suitable for long-term deployment.
2. Inline Policy¶
Policies can be defined inline within each analysis request, allowing for quick iterations during development or testing.
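As a sketch only: an inline policy is embedded directly in the analysis request body instead of being referenced by name. The raiPolicy field name below is an assumption for illustration (the standard-policy examples later in this document use raiPolicyName); check the Policy API reference for the exact request schema.

# Hypothetical analysis request carrying the policy inline instead of by name.
# "raiPolicy" is an assumed field name, used here for illustration only.
inline_request = {
    "messages": [
        {
            "source": "Prompt",
            "role": "User",
            "contents": [{"kind": "Text", "text": "Sample text for analysis."}],
        }
    ],
    "raiPolicy": {                       # assumed field; standard policies use "raiPolicyName"
        "name": "AdHocTestPolicy",
        "taskSettings": [
            {
                "settingId": "task_setting_01",
                "settingEnabled": True,
                "appliedFor": [{"role": "User", "source": "Prompt"}],
                "kind": "HarmCategory",
                "harmCategoryTaskSetting": {"harmCategory": "SelfHarm"},
                "blockingCriteria": {"enabled": True, "kind": "Severity", "allowedSeverity": 0},
            }
        ],
    },
}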
3. Legacy RAI Policy¶
For compatibility, legacy numeric policy IDs (e.g., 189) are supported, ensuring seamless integration with existing systems.
4. Custom Policy Name¶
Policies can also be defined with custom names in AOAI, offering additional flexibility for identification and organization.
Policy Structure¶
A policy contains multiple tasks and is defined by the following fields (a minimal schema sketch follows this list):
- raiPolicyName: A unique identifier for the policy. This name is used to reference the policy in API requests.
- taskSettings: An array of settings that define the tasks within the policy. Each task specifies the harm categories to monitor, the roles and sources it applies to, and the criteria for blocking content. The key elements of taskSettings include:
  - settingId: A unique identifier for the task setting. This helps distinguish between multiple tasks within the same policy.
  - settingEnabled: A boolean value indicating whether the task is active. If set to false, the task will not be applied during content analysis.
  - appliedFor: Specifies the roles (e.g., User, Assistant, System, Tool, Function) and source types (e.g., Prompt, Completion, All) to which the task applies. This ensures that the task is only enforced in relevant contexts.
  - kind: The type of task, such as HarmCategory or other supported task types. This determines the nature of the analysis performed.
  - harmCategoryTaskSetting: Defines the harm category to monitor (e.g., Hate, SelfHarm, Violence). This specifies the type of harmful content the task is designed to detect.
  - blockingCriteria: Specifies the thresholds for blocking content. This includes:
    - enabled: Whether blocking is enabled for this task.
    - kind: The type of blocking criterion, one of Severity, RiskLevel, or IsDetected. This allows flexibility in how blocking thresholds are expressed.
    - (optional) allowedSeverity: An integer giving the maximum severity level allowed before content is blocked. For example, a value of 0 indicates that no harmful content is allowed.
    - (optional) allowedRiskLevel: A RiskLevel value, one of Safe, Low, Medium, or High.
    - (optional) isDetected: A boolean indicating whether the content has been flagged as detected.
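For illustration, the fields above can be summarized as a type sketch in Python typing syntax. This is not an official SDK model; it simply mirrors the structure and enumerations described in this section.

from typing import List, Literal, TypedDict

class AppliedFor(TypedDict):
    role: Literal["User", "Assistant", "System", "Tool", "Function"]
    source: Literal["Prompt", "Completion", "All"]

class _BlockingCriteriaRequired(TypedDict):
    enabled: bool
    kind: Literal["Severity", "RiskLevel", "IsDetected"]

class BlockingCriteria(_BlockingCriteriaRequired, total=False):
    allowedSeverity: int                                         # used when kind is Severity
    allowedRiskLevel: Literal["Safe", "Low", "Medium", "High"]   # used when kind is RiskLevel
    isDetected: bool                                             # used when kind is IsDetected

class HarmCategoryTaskSetting(TypedDict):
    harmCategory: str                                            # e.g. "Hate", "SelfHarm", "Violence"

class TaskSetting(TypedDict):
    settingId: str
    settingEnabled: bool
    appliedFor: List[AppliedFor]
    kind: str                                                    # e.g. "HarmCategory"
    harmCategoryTaskSetting: HarmCategoryTaskSetting             # present for HarmCategory tasks
    blockingCriteria: BlockingCriteria

class RaiPolicy(TypedDict):
    name: str                                                    # referenced via raiPolicyName in requests
    taskSettings: List[TaskSetting]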
Examples¶
Policy Example¶
{
"name": "MyPolicy10",
"taskSettings": [
{
"settingId": "task_setting_01",
"settingEnabled": true,
"appliedFor": [
{
"role": "User",
"source": "Prompt"
},
{
"role": "System",
"source": "Completion"
}
],
"kind": "HarmCategory",
"harmCategoryTaskSetting": {
"harmCategory": "Hate"
},
"blockingCriteria": {
"enabled": true,
"kind": "Severity",
"allowedSeverity": 0
}
}
]
}
Policy in an HTTP Request¶
{
"messages": [
{
"source": "Prompt",
"role": "User",
"contents": [
{
"kind": "Text",
"text": "This is a sample text message for analysis."
},
{
"kind": "Image",
"imageBase64": "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mP8/wcAAwAB/epH1FUAAAAASUVORK5CYII="
}
]
}
],
"raiPolicyName": "myPolicy"
}
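A minimal Python sketch of sending the request above over HTTP follows. The endpoint URL and authentication header are placeholders, not documented values; substitute those of your deployment (see the Policy API reference).

import requests

ENDPOINT = "https://<your-endpoint>/analyze"                 # placeholder URL, for illustration only
HEADERS = {"Ocp-Apim-Subscription-Key": "<your-api-key>"}    # assumed auth header; check your service

request_body = {
    "messages": [
        {
            "source": "Prompt",
            "role": "User",
            "contents": [
                {"kind": "Text", "text": "This is a sample text message for analysis."}
            ],
        }
    ],
    "raiPolicyName": "myPolicy",   # reference a standard policy created beforehand
}

response = requests.post(ENDPOINT, json=request_body, headers=HEADERS)
response.raise_for_status()
print(response.json())             # analysis result, including any blocking decisions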
Policy in a gRPC Request¶
For gRPC, the policy must be sent as the first message in the sequence.
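The .proto definitions are not included in this document, so the sketch below uses hypothetical message and stub names purely to illustrate the ordering requirement: the first message of the stream carries the policy, and the content to analyze follows.

import grpc
# Hypothetical generated modules; all message and stub names below are assumptions,
# since the actual .proto definitions are not part of this document.
import rai_analysis_pb2 as pb
import rai_analysis_pb2_grpc as pb_grpc

def request_stream():
    # First message in the sequence: the policy reference (or inline policy).
    yield pb.AnalyzeRequest(rai_policy_name="myPolicy")
    # Subsequent messages: the content to analyze.
    yield pb.AnalyzeRequest(
        message=pb.Message(
            role="User",
            source="Prompt",
            contents=[pb.Content(kind="Text", text="Sample text for analysis.")],
        )
    )

channel = grpc.secure_channel("<your-endpoint>:443", grpc.ssl_channel_credentials())
stub = pb_grpc.RaiAnalysisStub(channel)
for result in stub.Analyze(request_stream()):   # streaming call, per the service contract
    print(result)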
Policy Management APIs¶
The platform provides a comprehensive CRUD API for creating, reading, updating, and deleting policies. Once a policy is created, it can be seamlessly integrated into analysis requests via HTTP or gRPC. Policy API reference
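As a rough sketch of the CRUD flow in Python: the route and authentication header below are assumptions for illustration only; the authoritative request shapes are in the Policy API reference.

import requests

BASE = "https://<your-endpoint>/raiPolicies"                 # hypothetical route, for illustration only
HEADERS = {"Ocp-Apim-Subscription-Key": "<your-api-key>"}    # assumed auth header

policy = {
    "name": "MyPolicy10",
    "taskSettings": [
        {
            "settingId": "task_setting_01",
            "settingEnabled": True,
            "appliedFor": [{"role": "User", "source": "Prompt"}],
            "kind": "HarmCategory",
            "harmCategoryTaskSetting": {"harmCategory": "Hate"},
            "blockingCriteria": {"enabled": True, "kind": "Severity", "allowedSeverity": 0},
        }
    ],
}

# Create (or replace), read back, then delete the policy.
requests.put(f"{BASE}/{policy['name']}", json=policy, headers=HEADERS).raise_for_status()
print(requests.get(f"{BASE}/{policy['name']}", headers=HEADERS).json())
requests.delete(f"{BASE}/{policy['name']}", headers=HEADERS).raise_for_status()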
Conclusion¶
RAI Policy Management offers customers the flexibility to define tailored policies, ensuring precise and effective harmful content detection. With support for standard, inline, legacy, and custom policies, the platform enables efficient, adaptable, and scalable management of Responsible AI tasks.