GPT-RAG
The Retrieval-Augmented Generation (RAG) pattern is an industry-standard approach to building applications that use large language models to reason over specific or proprietary data that the model does not already know.
This page provides recommended alert settings for a RAG pattern deployment. It lists the relevant metrics and threshold recommendations for the key services in a RAG architecture. For a reference architecture design of RAG, see GPT-RAG.
Figure: basic architecture of a RAG implementation.
We may update these settings as we continue to work with a breadth of customers.
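To make the table's columns concrete, the following is a minimal Python sketch of how one static-threshold row might be read: the Window and Frequency values are ISO 8601 durations (for example, `PT5M` is five minutes), and the Aggregation, Operator, and Threshold columns together describe the condition. The helper names `parse_duration` and `breaches` are hypothetical, the aggregation mapping is a simplification of how Azure Monitor aggregates samples, and rows with a `dynamic` threshold are out of scope here.

```python
import re
from datetime import timedelta
from statistics import mean

# Handles only the PT<hours>H<minutes>M<seconds>S subset used in the table.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def parse_duration(text: str) -> timedelta:
    """Convert a duration such as 'PT5M' or 'PT1H' to a timedelta."""
    match = _DURATION.match(text)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# Static operators that appear in the table (dynamic thresholds excluded).
OPERATORS = {
    "GreaterThan": lambda value, limit: value > limit,
    "GreaterThanOrEqual": lambda value, limit: value >= limit,
    "LessThan": lambda value, limit: value < limit,
    "LessThanOrEqual": lambda value, limit: value <= limit,
}

# Assumed, simplified meaning of each aggregation over the samples in a window.
AGGREGATIONS = {
    "Total": sum,
    "Average": mean,
    "Maximum": max,
    "Minimum": min,
    "Count": len,
}


def breaches(samples, aggregation, operator, threshold):
    """Return True if the aggregated samples in one window violate the threshold."""
    if not samples:
        return False  # assumption: a window with no data does not fire
    value = AGGREGATIONS[aggregation](samples)
    return OPERATORS[operator](value, threshold)
```

For example, the TotalCalls row (Total, GreaterThan, 5000, PT5M) would be breached by five one-minute samples that sum to more than 5000, and with a Frequency of PT1M that check would be repeated every minute.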
Component | Metric | Aggregation | Operator | Threshold | Window | Frequency | Severity | Support for Multiple Resources | Verified | References |
---|---|---|---|---|---|---|---|---|---|---|
Microsoft.CognitiveServices/accounts | TotalCalls | Total | GreaterThan | 5000 | PT5M | PT1M | 3 | No | N | |||
Microsoft.CognitiveServices/accounts | TotalErrors | Total | GreaterThan | 5 | PT5M | PT1M | 2 | No | N | |||
Microsoft.CognitiveServices/accounts | Latency | Average | GreaterThan | 90 | PT5M | PT1M | 2 | No | N | |||
Microsoft.CognitiveServices/accounts | SuccessRate | Average | LessThan | 99.9 | PT5M | PT1M | 3 | No | N | |||
Microsoft.CognitiveServices/accounts | ServerErrors | Total | GreaterThan | 0 | PT5M | PT1M | 3 | No | N | |||
Microsoft.CognitiveServices/accounts | TokenTransaction | Total | GreaterThan | 180000 | PT5M | PT1M | 2 | No | N | |||
Microsoft.CognitiveServices/accounts | BlockedCalls | Total | GreaterThan | 0 | PT5M | PT1M | 2 | No | N | |||
Microsoft.CognitiveServices/accounts | ClientErrors | Total | GreaterThan | 0 | PT5M | PT1M | 1 | No | N | |||
Microsoft.CognitiveServices/accounts | AzureOpenAIContextTokensCacheMatchRate | Total | GreaterThan | 75 | PT5M | PT1M | 2 | No | Y | |||
Microsoft.CognitiveServices/accounts | AzureOpenAIProvisionedManagedUtilizationV2 | Total | GreaterThan | 80 | PT5M | PT1M | 2 | No | Y | |||
Microsoft.CognitiveServices/accounts | AzureOpenAITimeToResponse | Total | GreaterThan | 200 | PT5M | PT1M | 2 | No | Y | |||
Microsoft.DocumentDB/databaseAccounts | TotalRequests | Count | GreaterThan | 5 | PT5M | PT1M | 3 | No | N | Monitor Azure Cosmos DB Create alerts for Azure Cosmos DB using Azure Monitor Monitoring Azure Cosmos DB data reference Explore Azure Monitor Azure Cosmos DB insights | ||
Microsoft.DocumentDB/databaseAccounts | NormalizedRUConsumption | Average | GreaterThan | 70 | PT5M | PT1M | 3 | No | N | How to monitor normalized RU/s for an Azure Cosmos DB container or an account Monitor Azure Cosmos DB Create alerts for Azure Cosmos DB using Azure Monitor | ||
Microsoft.DocumentDB/databaseAccounts | ServiceAvailability | Average | LessThan | 99.9 | PT1H | PT5M | 1 | No | N | Monitor Azure Cosmos DB Create alerts for Azure Cosmos DB using Azure Monitor Monitoring Azure Cosmos DB data reference | ||
Microsoft.DocumentDB/databaseAccounts | TotalRequestUnits | Total | GreaterThan | 100 | PT5M | PT1M | 2 | No | N | Monitor Azure Cosmos DB Create alerts for Azure Cosmos DB using Azure Monitor Monitoring Azure Cosmos DB data reference How to monitor throughput or request unit usage of an operation in Azure Cosmos DB | ||
Microsoft.DocumentDB/databaseAccounts | ServerSideLatency | Average | GreaterThan | 100 | PT5M | PT1M | 3 | No | N | Monitor Azure Cosmos DB Create alerts for Azure Cosmos DB using Azure Monitor Monitoring Azure Cosmos DB data reference | ||
Microsoft.DocumentDB/databaseAccounts | ProvisionedThroughput | Maximum | GreaterThan | 3000 | PT1H | PT1M | 3 | No | N | Monitor Azure Cosmos DB Create alerts for Azure Cosmos DB using Azure Monitor Monitoring Azure Cosmos DB data reference | ||
Microsoft.DocumentDB/databaseAccounts | RegionFailover | Count | GreaterThan | 0 | PT5M | PT1M | 3 | No | N | Monitor Azure Cosmos DB Create alerts for Azure Cosmos DB using Azure Monitor | ||
Microsoft.DocumentDB/databaseAccounts | UpdateAccountKeys | Count | GreaterThanOrEqual | 1 | PT5M | PT5M | 2 | No | N | Monitor Azure Cosmos DB Create alerts for Azure Cosmos DB using Azure Monitor Monitor your Azure Cosmos DB account for key updates and key regeneration | ||
Microsoft.DocumentDB/databaseAccounts | DataUsage | Total | GreaterThan | 2.147483648e+09 | PT5M | PT1M | 3 | No | N | |||
Microsoft.DocumentDB/databaseAccounts | MongoRequests | Count | GreaterThan | 9 | PT5M | PT1M | 3 | No | N | Monitor Azure Cosmos DB Create alerts for Azure Cosmos DB using Azure Monitor Monitoring Azure Cosmos DB data reference | ||
Microsoft.DocumentDB/databaseAccounts | RemoveRegion | Count | GreaterThanOrEqual | 0 | PT15M | PT5M | 3 | No | N | Monitor Azure Cosmos DB Create alerts for Azure Cosmos DB using Azure Monitor | ||
Microsoft.DocumentDB/databaseAccounts | ReplicationLatency | Average | GreaterThan | 5000 | PT15M | PT5M | 3 | No | N | |||
Microsoft.DocumentDB/databaseAccounts | SqlContainerDelete | Count | GreaterThanOrEqual | 0 | PT15M | PT5M | 2 | No | N | Monitor Azure Cosmos DB Create alerts for Azure Cosmos DB using Azure Monitor | ||
Microsoft.DocumentDB/databaseAccounts | OfflineRegion | Count | GreaterThan | 0 | PT5M | PT1M | 3 | No | N | |||
Microsoft.DocumentDB/databaseAccounts | SqlDatabaseDelete | Count | GreaterThanOrEqual | 0 | PT15M | PT5M | 2 | No | N | Monitor Azure Cosmos DB Create alerts for Azure Cosmos DB using Azure Monitor | ||
Microsoft.KeyVault/vaults | Availability | Average | LessThan | 90 | PT5M | PT1M | 1 | No | Y | Monitoring KeyVault Reference Monitoring Microsoft.KeyVault/vaults KeyVault Insights Overview | ||
Microsoft.KeyVault/vaults | SaturationShoebox | Average | GreaterThan | 75 | PT5M | PT1M | 1 | No | Y | Monitoring KeyVault Reference Monitoring Microsoft.KeyVault/vaults KeyVault Insights Overview | ||
Microsoft.KeyVault/vaults | ServiceApiLatency | Average | GreaterThan | 1000 | PT5M | PT5M | 3 | No | Y | Monitoring KeyVault Reference Monitoring Microsoft.KeyVault/vaults KeyVault Insights Overview | ||
Microsoft.KeyVault/vaults | ServiceApiResult | Average | GreaterThan | dynamic | PT5M | PT5M | 2 | No | Y | Monitoring KeyVault Reference Monitoring Microsoft.KeyVault/vaults KeyVault Insights Overview | ||
Microsoft.KeyVault/vaults | ServiceApiHit | Average | GreaterThanOrEqual | 80 | PT5M | PT5M | 3 | No | N | |||
Microsoft.Search/searchServices | SearchLatency | Average | GreaterThan | 5 | PT5M | PT1M | 3 | No | N | |||
Microsoft.Search/searchServices | ThrottledSearchQueriesPercentage | Average | GreaterThan | 10 | PT5M | PT1M | 3 | No | N | |||
Microsoft.Storage/storageAccounts | Availability | Average | LessThan | 100 | PT5M | PT5M | 1 | No | Y | Monitoring Availability Supported metrics for Microsoft.Storage/storageAccounts | ||
Microsoft.Storage/storageAccounts/fileServices | Transactions | Total | GreaterThanOrEqual | 1 | PT15M | PT5M | 2 | No | N | High latency, low throughput, or low IOPS | ||
Microsoft.Storage/storageAccounts | UsedCapacity | Average | GreaterThan | 2.2518e+15 | PT1H | PT1H | 3 | No | N | Account Level Metrics Azure Storage Metric - Used Capacity | ||
Microsoft.Storage/storageAccounts | Egress | Total | GreaterThan | 6e+07 | PT5M | PT5M | 2 | No | N | Transaction Metrics Storage Account Metric Dimensions (all storage) | ||
Microsoft.Storage/storageAccounts | Ingress | Total | GreaterThan | 1.073741824e+09 | PT5M | PT5M | 3 | No | N | Transaction Metrics Storage Account Metric Dimensions (all storage) | ||
Microsoft.Storage/storageAccounts/blobServices | SuccessE2ELatency | Average | GreaterThan | 1000 | PT5M | PT1M | 3 | No | N | Verify throughput and latency metrics for a storage account Troubleshoot performance in Azure storage accounts | ||
Microsoft.Storage/storageAccounts/blobServices | SuccessServerLatency | Average | GreaterThan | 1000 | PT5M | PT1M | 2 | No | N | Troubleshoot performance in Azure storage accounts Verify throughput and latency metrics for a storage account Storage Transaction Metrics | ||
Microsoft.Storage/storageAccounts/fileServices | Transactions | Total | GreaterThan | 10 | PT5M | PT1M | 3 | No | N | Identify storage accounts with no or low use Monitor the use of a container Storage Transaction Metrics | ||
Microsoft.Web/sites | AverageResponseTime | Average | GreaterThan | 60 | PT5M | PT5M | 3 | No | N | |||
Microsoft.Web/sites | CpuTime | Total | GreaterThan | 120 | PT5M | PT1M | 3 | No | N | Understand App Service Metrics Supported Metrics Monitor your app CPU time vs CPU percentage Alerts and Autoscale in Azure App Service | ||
Microsoft.Web/sites | AppConnections | Maximum | GreaterThan | 6000 | PT15M | PT5M | 3 | No | N | Understand App Service Metrics Supported Metrics Manage Connections in Azure Functions Configure Monitoring for Azure Functions | ||
Microsoft.Web/sites | RequestsInApplicationQueue | Maximum | GreaterThan | 10 | PT15M | PT5M | 3 | No | N | Understand App Service Metrics Supported Metrics | ||
Microsoft.Web/sites | PrivateBytes | Average | GreaterThan | 1.2e+09 | PT5M | PT1M | 3 | No | N | Understand App Service Metrics Supported Metrics | ||
Microsoft.Web/sites | FileSystemUsage | Average | GreaterThan | 4e+08 | PT6H | PT1H | 1 | No | N | Understand App Service Metrics Supported Metrics Quota Enforcement | ||
Microsoft.Web/sites | MemoryWorkingSet | Average | GreaterThan | 1.5e+09 | PT5M | PT1M | 3 | No | N | Understand App Service Metrics Supported Metrics Monitor your app | ||
Microsoft.Web/sites | Threads | Average | GreaterThan | 200 | PT15M | PT5M | 4 | No | N | Understand App Service Metrics Supported Metrics | ||
Microsoft.Web/sites | Http401 | Total | GreaterThan | 20 | PT5M | PT5M | 2 | No | N | Understand App Service Metrics Supported Metrics Client-side JavaScript SDK Exception Reporting | ||
Microsoft.Web/sites | Requests | Total | GreaterThan | 1000 | PT5M | PT1M | 3 | No | N | |||
Microsoft.Web/sites | FunctionExecutionCount | Total | LessThanOrEqual | 0 | PT5M | PT5M | 1 | No | N | Function Execution Count Monitor Azure Functions Supported Metrics | ||
Microsoft.Web/sites | BytesSent | Average | GreaterOrLessThan | dynamic | PT5M | PT1M | 3 | No | N | Understand App Service Metrics Supported Metrics | ||
Microsoft.Web/sites | Http406 | Total | GreaterThan | 1 | PT15M | PT15M | 1 | No | N | Understand App Service Metrics Supported Metrics | ||
Microsoft.Web/sites | Http3xx | Total | GreaterThan | 15 | PT5M | PT5M | 3 | No | N | Understand App Service Metrics Supported Metrics Enable diagnostic logging for Apps in Azure App Service HTTP Status Classes | ||
Microsoft.Web/sites | WorkflowRunsFailureRate | Total | GreaterThan | 0 | PT5M | PT5M | 1 | No | N | |||
Microsoft.Web/sites | BytesReceived | Total | GreaterThan | 2.048e+09 | PT5M | PT1M | 3 | No | N | Understand App Service Metrics Supported Metrics | ||
Microsoft.Web/sites | Handles | Average | GreaterOrLessThan | dynamic | PT5M | PT1M | 2 | No | N | Understand App Service Metrics Supported Metrics | ||
Microsoft.Web/sites | FunctionExecutionUnits | Total | GreaterThan | 1.3e+10 | PT5M | PT1M | 3 | No | N | Function Execution Units | ||
Microsoft.Web/sites | WorkflowTriggersFailureRate | Total | GreaterThan | 50 | PT5M | PT5M | 1 | No | N | |||
Microsoft.Web/sites | Http2xx | Total | GreaterThan | 15 | PT5M | PT5M | 3 | No | N | |||
Microsoft.Web/sites | CurrentAssemblies | Average | GreaterThan | 0 | PT1M | PT1M | 0 | No | N | |||
Microsoft.Web/sites/slots | Http5xx | Total | GreaterThan | 10 | PT15M | PT5M | 1 | No | N | Understand App Service Metrics Supported Metrics Diagnose Web Apps' Performance with Application Insights Troubleshoot HTTP 502/503 Errors | ||
Microsoft.Web/sites/slots | HttpResponseTime | Average | GreaterThan | 5 | PT30M | PT15M | 1 | No | N | Understand App Service Metrics Supported Metrics Troubleshoot Slow App Performance | ||
Microsoft.Web/sites/slots | Http4xx | Average | GreaterThan | 5 | PT30M | PT15M | 1 | No | N | Understand App Service Metrics Supported Metrics | ||
Microsoft.Web/sites/slots | AverageMemoryWorkingSet | Average | GreaterThan | 8e+08 | PT5M | PT5M | 3 | No | N | |||
Microsoft.Web/sites/slots | HealthCheckStatus | Average | LessThan | 100 | PT5M | PT1M | 3 | No | N | Understand App Service Metrics Supported Metrics Monitor App Service Instances using Health check | ||
Microsoft.Web/sites/slots | Http403 | Total | GreaterThan | 5 | PT30M | PT15M | 0 | No | N | |||
Microsoft.Web/sites/slots | Http404 | Average | GreaterThan | 30 | PT15M | PT5M | 2 | No | N | Understand App Service Metrics Supported Metrics |
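The rows above can be turned into structured alert definitions for use in deployment tooling. The sketch below assumes each data row lists, in order, the resource type, metric, aggregation, operator, threshold, window, frequency, severity, multiple-resource support, and verified flag, which is how the rows appear to read. The function name `row_to_alert` is hypothetical, and the dictionary keys are chosen to resemble Azure Monitor metric alert properties; treat them as assumptions and check them against the metric alert schema before use.

```python
def row_to_alert(row: str) -> dict:
    """Parse one pipe-delimited table row into an alert definition dict."""
    cells = [cell.strip() for cell in row.split("|")]
    if len(cells) < 10:
        raise ValueError("row has fewer than 10 cells")
    (namespace, metric, aggregation, operator, threshold,
     window, frequency, severity, multi_resource, verified) = cells[:10]

    # Rows marked 'dynamic' have no fixed numeric threshold.
    is_dynamic = threshold.lower() == "dynamic"
    return {
        "metricNamespace": namespace,
        "metricName": metric,
        "timeAggregation": aggregation,
        "operator": operator,
        "criterionType": (
            "DynamicThresholdCriterion" if is_dynamic
            else "StaticThresholdCriterion"
        ),
        # float() also accepts scientific notation such as 2.147483648e+09.
        "threshold": None if is_dynamic else float(threshold),
        "windowSize": window,
        "evaluationFrequency": frequency,
        "severity": int(severity),
        "supportsMultipleResources": multi_resource == "Yes",
        "verified": verified == "Y",
    }
```

Parsing the SearchLatency row this way yields a static criterion with a threshold of 5.0 and severity 3, while the BytesSent row yields a dynamic criterion with no numeric threshold.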