Harm Types Supported by RAI

The Responsible AI (RAI) platform supports various harm types, categorized by how they are detected and evaluated: severity-based scoring or boolean detection. Together, these harm types enable comprehensive analysis and protection in AI systems.

Harm Types with Severity-Based Definition

The following harm types are defined and managed using severity levels:

Violence

  • Description: Identifies content that promotes or describes physical harm or threats of violence.
  • Severity Levels: Indicates the intensity of the violent content.
  • 0,1: Low (Risk Level: Low)
  • 2,3,4: Medium (Risk Level: Medium)
  • 5,6,7: High (Risk Level: High)

Self-Harm

  • Description: Detects content related to self-injury or suicidal behavior.
  • Severity Levels: Reflects the level of risk associated with the content.
  • 0,1: Low (Risk Level: Low)
  • 2,3,4: Medium (Risk Level: Medium)
  • 5,6,7: High (Risk Level: High)

Hate

  • Description: Flags content that incites hatred or discrimination against individuals or groups.
  • Severity Levels: Represents the degree of hateful language or behavior.
  • 0,1: Low (Risk Level: Low)
  • 2,3,4: Medium (Risk Level: Medium)
  • 5,6,7: High (Risk Level: High)

Sexual

  • Description: Detects sexually explicit or inappropriate content.
  • Severity Levels: Captures the extent and explicitness of the content.
  • 0,1: Low (Risk Level: Low)
  • 2,3,4: Medium (Risk Level: Medium)
  • 5,6,7: High (Risk Level: High)
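The bucketing above is the same for all four severity-based harm types: scores 0-1 map to Low, 2-4 to Medium, and 5-7 to High. A minimal sketch of that mapping, assuming a hypothetical `risk_level` helper (the function name is illustrative, not part of the RAI API):

```python
def risk_level(severity: int) -> str:
    """Map a 0-7 severity score to its risk level bucket.

    Applies to Violence, Self-Harm, Hate, and Sexual harm types,
    which all share the same severity-to-risk mapping.
    """
    if severity not in range(8):
        raise ValueError(f"severity must be 0-7, got {severity}")
    if severity <= 1:
        return "Low"
    if severity <= 4:
        return "Medium"
    return "High"
```

For example, a Violence score of 3 would be reported as a Medium risk, while a score of 6 would be High.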

Harm Types with Boolean Detection

The following harm types are defined and managed using a true/false detection model:

Code Vulnerability

  • Description: Detects vulnerabilities or risky behaviors in code provided within the AI system.
  • Detection: isDetected: true/false

Jailbreak

  • Description: Identifies attempts to bypass AI system restrictions or ethical safeguards.
  • Detection: isDetected: true/false

ProtectedMaterialCode

  • Description: Identifies code snippets that contain or reference protected or proprietary material.
  • Detection: isDetected: true/false

ProtectedMaterialText

  • Description: Flags text content that includes protected or proprietary information.
  • Detection: isDetected: true/false

Xpia

  • Description: Flags cross-prompt injection attacks (XPIA) and related attempts to exploit or abuse AI systems.
  • Detection: isDetected: true/false
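Each boolean harm type reports a simple `isDetected: true/false` result rather than a severity score. A sketch of how a caller might collect the flagged harm types from such a result payload; the payload shape and the `flagged_harms` helper are assumptions for illustration, with only the `isDetected` field name taken from the definitions above:

```python
# Harm type names as listed in this section (illustrative identifiers).
BOOLEAN_HARM_TYPES = {
    "CodeVulnerability",
    "Jailbreak",
    "ProtectedMaterialCode",
    "ProtectedMaterialText",
    "Xpia",
}

def flagged_harms(results: dict[str, dict]) -> list[str]:
    """Return the boolean harm types whose isDetected flag is true."""
    return sorted(
        name
        for name, payload in results.items()
        if name in BOOLEAN_HARM_TYPES and payload.get("isDetected") is True
    )
```

For instance, given `{"Jailbreak": {"isDetected": True}, "Xpia": {"isDetected": False}}`, only `Jailbreak` would be reported.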

Summary

The RAI platform ensures robust detection and management of harmful content through a combination of severity-based and boolean detection mechanisms. This enables the platform to adapt to varying levels of content risk while maintaining a comprehensive approach to responsible AI practices.