Leverage Global provisioned deployment to ensure high and predictable throughput
Impact:HighCategory:High Availability
APRL GUID:042c034e-2b85-4c1d-bf9a-65c75a6b43e9
Description:
Global provisioned deployments provide reserved model processing capacity for high and predictable throughput using Azure global infrastructure. Suitable for applications requiring lower latency variance at large workload usage. Provides cost savings.
Potential Benefits:
Low latency variance, high throughputs, business continuity
Click the Azure Resource Graph tab to view the query
//cannot-be-validated-with-arg
Ensure PAYG AOAI models leverage Global Standard deployment
Impact:HighCategory:High Availability
APRL GUID:081fc8a4-b2d9-405b-b351-334e621016f5
Description:
Global Standard leverages Azure's global infrastructure to route traffic to the best available data center for customer's real-time inference requests. It provides the highest default quota and eliminates the need to load balance across multiple resources. Optimized for low to medium volume usage.
Click the Azure Resource Graph tab to view the query
//cannot-be-validated-with-arg
Deploy a PAYG instance of the model with provisioned throughput to manage overflow effectively
Impact:HighCategory:High Availability
APRL GUID:0c193899-da60-4a52-b4a0-77d75ac8c5c5
Description:
Provisioned Throughput offers pre-allocated capacity for consistent workloads, while Pay-as-You-Go charges for actual usage, ideal for variable workloads. During overflow, the Pay-as-You-Go instance manages excess load, ensuring service efficiency.
Potential Benefits:
PAYG model balances cost and performance and helps scale
Click the Azure Resource Graph tab to view the query
//cannot-be-validated-with-arg
Ensure that models are deployed using Global batch for large scale processing
Impact:HighCategory:Scalability
APRL GUID:8aa9744b-f302-4b05-9776-51d6dd3d0c3a
Description:
Global batch efficiently handles large-scale tasks within 24 hours. Submit requests in a single file, with a separate quota to protect online workloads. Key uses: data processing, content generation, document review, customer support automation, data extraction, NLP tasks, and marketing.
Potential Benefits:
Cost effective faster turnaround for large-scale processing.
Click the Azure Resource Graph tab to view the query
//cannot-be-validated-with-arg
Ensure AOAI models are deployed using Data Zone Standard for data residency requirements
Impact:HighCategory:Governance
APRL GUID:ac3add17-013e-41a5-af91-9fefce794a00
Description:
Data zone deployments route customer traffic to the highest availability data center within the defined data zone, ensuring data at rest remains within the Azure OpenAI resource geography. This approach offers increased quota limits and ensures data processing occurs within the specified data zone.
Click the Azure Resource Graph tab to view the query
//cannot-be-validated-with-arg
Use comprehensive monitoring solution for AOAI service
Impact:MediumCategory:Monitoring and Alerting
APRL GUID:72b1b4ad-a14b-4430-9799-91bda782973d
Description:
Implementing a comprehensive monitoring solution for AOAI involves using Azure Monitor to track API usage, performance metrics, and security events. This setup helps optimize performance, manage costs, and ensure compliance by providing detailed insights into model usage and potential issues.
Potential Benefits:
Optimize performance and compliance with detailed insights
Click the Azure Resource Graph tab to view the query
//cannot-be-validated-with-arg
Deploy AOAI Service in multiple regions using Standard and/or Provisioned deployments
Impact:HighCategory:High Availability
APRL GUID:61187af4-7d36-4b48-b16e-de78bef143a0
Description:
If your service needs to always be available, design AOAI Service to either failover into another region or split the workload between two or more regions. Applications requiring high degrees of resiliency should consider this to strengthen their model infrastructure.
Potential Benefits:
Ensures business continuity during regional outages.