Ensure AOAI models are deployed using Global deployment
Impact:HighCategory:High Availability
APRL GUID:081fc8a4-b2d9-405b-b351-334e621016f5
Description:
Global deployments leverage Azure's global infrastructure to route customer traffic to the best available data center for the customer’s inference requests. This ensures highest initial throughput limits and best model availability while still providing our uptime SLA and low latency.
Potential Benefits:
Low latency, best model availability, business continuity
Click the Azure Resource Graph tab to view the query
//cannot-be-validated-with-arg
Deploy a PAYG instance of the model with provisioned throughput to manage overflow effectively
Impact:HighCategory:High Availability
APRL GUID:0c193899-da60-4a52-b4a0-77d75ac8c5c5
Description:
Provisioned Throughput offers pre-allocated capacity for consistent workloads, while Pay-as-You-Go charges for actual usage, ideal for variable workloads. During overflow, the Pay-as-You-Go instance manages excess load, ensuring service efficiency
Potential Benefits:
PAYG model balances cost and performance and helps scale
Click the Azure Resource Graph tab to view the query
//cannot-be-validated-with-arg
Ensure that models are deployed using Global batch for large scale processing
Impact:HighCategory:Scalability
APRL GUID:8aa9744b-f302-4b05-9776-51d6dd3d0c3a
Description:
Global batch efficiently handles large-scale tasks within 24 hours. Submit requests in a single file, with a separate quota to protect online workloads. Key uses: data processing, content generation, document review, customer support automation, data extraction, NLP tasks, and marketing
Potential Benefits:
Cost effective faster turnaround for large-scale processing.
Click the Azure Resource Graph tab to view the query
//cannot-be-validated-with-arg
Ensure AOAI models are deployed using Data Zone Standard for data residency requirements
Impact:HighCategory:Governance
APRL GUID:ac3add17-013e-41a5-af91-9fefce794a00
Description:
Data zone deployments route customer traffic to the highest availability data center within the defined data zone, ensuring data at rest remains within the Azure OpenAI resource geography. This approach offers increased quota limits and ensures data processing occurs within the specified data zone
Click the Azure Resource Graph tab to view the query
//cannot-be-validated-with-arg
Deploy AOAI Service in multiple regions using Standard and/or Provisioned deployments
Impact:HighCategory:High Availability
APRL GUID:61187af4-7d36-4b48-b16e-de78bef143a0
Description:
If your service needs to always be available, design AOAI Service to either failover into another region or split the workload between two or more regions. Applications requiring high degrees of resiliency should consider this to strengthen their model infrastructure.
Potential Benefits:
Ensures business continuity during regional outages.