Design Checklist

ID	Design Area	Design Consideration	Design Recommendation	References
C-R1	Compute	Determine the compute options for your models, orchestrators, self-hosted agents, and application (frontend, backend, and ingestion) in an AI Landing Zone.	Standardize the compute options across all components to ensure consistency and ease of management. Leverage PaaS compute options such as Azure Container Apps, Azure App Service, and Azure Kubernetes Service to optimize resource utilization and simplify management.	Compute for models; Compute for orchestrators, self-hosted agents, and application
CO-R1	Cost	Familiarize yourself with the pricing and billing model of Azure AI Foundry and its services.	Estimate costs before using Azure AI Foundry and its services for an AI application.	Manage Cost
CO-R2	Cost	Consider having both PTU and PAYGO endpoints. If you have predictable workloads, use AI commitment tiers in Azure AI services. For Azure OpenAI models, use provisioned throughput units (PTUs), which can be less expensive than pay-as-you-go (consumption-based) pricing.	It's common to combine PTU endpoints and consumption-based endpoints for cost optimization. Use PTUs on the AI model primary endpoint and a secondary, consumption-based AI endpoint for spillover.	Introduce a gateway for multiple Azure OpenAI instances
CO-R3	Cost	Consider the various Azure OpenAI deployment types, in particular the global deployment type.	Azure OpenAI models allow you to use different deployment types. Global deployment offers lower cost-per-token pricing on certain OpenAI models.	Deployment types
CO-R4	Cost	Consider implementing an auto-shutdown policy for compute resources in nonproduction environments.	Define and enforce a policy stating that AI resources must use the automatic shutdown feature on virtual machines and compute instances in Azure AI Foundry and Azure Machine Learning. Automatic shutdown is applicable to nonproduction environments and production workloads that you can take offline for certain periods of time.
D-R1	Data	Determine how stateful entities such as threads, messages, and runs created during usage, along with files uploaded during Foundry Agent Service setup or as part of a message, are managed.	Use the standard setup of Foundry Agent Service and store data in your own Azure resources, giving you full ownership and control.	Storing customer data
D-R2	Data	Determine whether thread storage (conversation history, agent definitions), file storage (uploaded documents), and vector search (embeddings and retrieval) will be shared by all projects or separated by project.	Because a project represents a distinct application or use case, separate these storage components by project to ensure data isolation and better manageability.	Project-based resource management
D-R3	Data	If you use Microsoft Fabric, determine how to surface its data in AI Foundry.	Use the Microsoft Fabric data agent for this purpose.	Use the Microsoft Fabric data agent
G-R1	Governance	Consider using built-in AI-related policies for governance of AI resources and AI apps.	Use Azure Policy to enforce policies automatically across AI deployments, reducing human error. Apply AI policies to each management group.	Azure AI Foundry Policies, Azure Machine Learning Policies, Azure AI services Policies, Azure AI Search Policies
G-R2	Governance	Review applicable industry standards such as the NIST Artificial Intelligence Risk Management Framework (AI RMF) and NIST AI RMF Playbook and ensure alignment and compliance with them.	Apply the regulatory compliance policy initiatives	Regulatory compliance initiatives
G-R3	Governance	Consider implementing responsible AI standards	Use the Responsible AI dashboard to generate reports around model outputs.	Responsible AI dashboard
G-R4	Governance	Consider using the Azure AI Content Safety APIs to facilitate content safety testing.	Use Azure AI Content Safety to define a baseline content filter for your approved AI models. This safety system runs both the prompt and completion for your model through a group of classification models. These classification models detect and help prevent the output of harmful content across a range of categories. Content Safety provides features like prompt shields, groundedness detection, and protected material text detection. It scans images and text.	Azure AI Content Safety
G-R5	Governance	Govern model availability in your organization across applications and workloads.	Use Azure Policy to manage which specific models your teams are allowed to deploy from the Azure AI Foundry model catalog. You have the option to use a built-in policy or create a custom policy. Since this approach uses an allowlist, begin with an audit effect. The audit effect allows you to monitor the models your teams are using without restricting deployments. Only switch to the deny effect once you understand the AI development and experimentation needs of workload teams, so you don't hinder their progress unnecessarily. If you switch a policy to deny, it doesn't automatically remove noncompliant models that teams have already deployed. You must remediate those models manually.	Built-in policy
I-R1	Identity	Consider using managed identities with least privilege access.	Use managed identity on all supported Azure services. Grant least privilege access to application resources that need to access AI model endpoints. Secure Azure service-to-service interactions. Use managed identity to allow Azure services to authenticate to each other without managing credentials.	Managed identity
I-R2	Identity	Leverage MFA and PIM for sensitive accounts.	Enable multifactor authentication (MFA) and prefer secondary administrative accounts or just-in-time access with Privileged Identity Management (PIM) for sensitive accounts. Limit control plane access using services like Azure Bastion as secure entry points into private networks.	Multifactor authentication, Privileged Identity Management
I-R3	Identity	Use Microsoft Entra ID for authentication	Wherever possible, eliminate static API keys in favor of Microsoft Entra ID for authentication. This step enhances security through centralized identity management and reduces secret management overhead. Where API keys remain in use, limit their distribution and audit the list of individuals with API key access to ensure it's current.	For authentication guidance, see Azure AI Foundry, Azure OpenAI, Azure AI services, and Azure Machine Learning.
I-R4	Identity	Use Conditional Access policies	Implement Conditional Access policies that respond to unusual sign-in activity or suspicious behavior. Use signals like user location, device state, and sign-in behavior to trigger extra verification steps. Require MFA for accessing critical AI resources to enhance security. Restrict access to AI infrastructure based on geographic locations or trusted IP ranges. Ensure that only compliant devices (those meeting security requirements) can access AI resources.	Conditional Access policies
I-R5	Identity	Configure least privilege access	Use role-based access control (RBAC) to provide minimal access to data and services. Assign roles to users and groups based on their responsibilities. Use Azure RBAC to fine-tune access control for specific resources such as virtual machines and storage accounts. Ensure users have only the minimum level of access necessary to perform their tasks. Regularly review and adjust permissions to prevent privilege creep.	Role-based access control for Azure AI Foundry
I-R6	Identity	Disable key-based access and allow access to AI model endpoints only through Microsoft Entra ID.	Secure external access to AI model endpoints. Require clients to authenticate using Microsoft Entra ID when accessing AI model endpoints.
M-R1	Monitoring	Monitor the AI models, AI resources, and AI data of the workload.	Implement monitoring to ensure that they remain aligned with application and workload KPIs.
M-R2	Monitoring	Review recommended alerts for AI in Azure Monitor Baseline Alerts.	Enable recommended alert rules to receive notifications of deviations that indicate a decline in workload health.	Azure Monitor Baseline Alerts
M-R3	Monitoring	Monitor the performance of a generative AI application	For generative AI workloads, use Azure AI Foundry's built-in evaluation and manual monitoring capabilities. Also monitor latency in response times or the accuracy of vector search results to enhance user experiences. In Azure AI Foundry, enable tracing to collect trace data for each request, aggregated metrics, and user feedback. Enable diagnostic logging for each Azure AI service.
M-R4	Monitoring	Configure diagnostic settings to capture logs and metrics of all deployed resources to a Log Analytics workspace.	Use diagnostic settings to capture logs and metrics for all key services, such as Azure AI Foundry and Azure AI services. Specific services should capture audit logs and relevant service-specific logs.
M-R5	Monitoring	Monitor model & data drift.	Track accuracy and data drift continuously in generative and nongenerative AI to ensure that models remain relevant. Monitoring can alert you when model predictions or large language model responses deviate from expected behavior. This deviation indicates a need for retraining or adjustment. Set up custom alerts to detect performance thresholds. This approach enables early intervention when problems arise. Use evaluations in Azure AI Foundry.
M-R6	Monitoring	Leverage Azure Monitor Network Insights and Azure Network Watcher for troubleshooting networking issues.	Use services such as Azure Monitor Network Insights and Azure Network Watcher to gain visibility into network performance and health.
R-R1	Reliability	Multi-region disaster recovery: Consider deploying to at least two regions to provide high availability and support disaster recovery.	Establish a policy for business continuity and disaster recovery for your AI endpoints and AI data. Configure baseline disaster recovery for resources that host your AI model endpoints. These resources include Azure AI Foundry, Azure Machine Learning, Azure OpenAI, or Azure AI services. All Azure data stores, such as Azure Blob Storage, Azure Cosmos DB, and Azure SQL Database, provide reliability and disaster recovery guidance that you should follow. Implement multi-region deployments to ensure high availability and resiliency for both generative and nongenerative AI systems. For more information, see multi-region deployment in Azure AI Foundry, Azure Machine Learning, and Azure OpenAI.
R-R1	Resource Organization	Select regions based on the combination of regional availability of Azure services and their configuration.	Before deployment, ensure that there's availability in the region for the AI resources that you need. Certain regions might not provide specific AI services or might have limited features, which can affect the functionality of your solution. This limitation can also affect the scalability of your deployment. For example, Azure OpenAI service availability can vary based on your deployment model. These deployment models include global standard, global provisioned, regional standard, and regional provisioned. Check the AI service to confirm that you have access to the necessary resources. Some features are limited to specific regions. To capture and use completions data, your resource region must be one of: swedencentral, northcentralus, eastus2. Azure OpenAI Evaluation is supported only in: eastus2, northcentralus, swedencentral, switzerlandwest, uaenorth. Real-time audio is available only in: eastus2, swedencentral. Assistants are available only in: australiaeast, centraluseuap, eastus, eastus2, francecentral, japaneast, norwayeast, southindia, swedencentral, uksouth, westus, westus3.
R-R2	Resource Organization	Review quota required to deploy the resources.	Consider the quota or subscription limits in your chosen region as your AI workloads grow. Azure services have regional subscription limits. These limits can affect large-scale AI model deployments, such as large inference workloads. To prevent disruptions, contact Azure support in advance if you foresee a need for extra capacity.
R-R3	Resource Organization	Consider Azure subscription and region quota limits	Align the resource organization with Azure’s subscription quota limitations to avoid unexpected service disruptions.
R-R4	Resource Organization	Consider scaling through multi-account and multi-project deployment.	Azure offers tools like Azure AI Foundry resources and projects to enforce governance and security. Use an AI Foundry resource per billing boundary to allocate costs across different teams. For more information, see Manage AI deployments. Use distinct AI Foundry resources to organize and manage AI artifacts like datasets, models, and experiments. AI Foundry resources centralize resource management and simplify access control. For example, use projects within Azure AI Foundry to manage resources and permissions efficiently, facilitating collaboration while maintaining security boundaries.
S-R1	Security	Review the Microsoft Defender for Cloud recommendations for compliance.	Microsoft Defender for Cloud (MDC) can help discover generative AI workloads and predeployment generative AI artifacts. AI security posture management in Defender for Cloud can also be used to automate detection and remediation of generative AI risks. Defender for Cloud provides a cost-effective approach for detecting configurations in your deployed resources that aren't secure. You should also enable AI threat protection.
S-R2	Security	Comply with Microsoft Cloud Security Baseline	Leverage Azure security baselines and follow Azure Service Guides for security guidance.
S-R3	Security	Leverage Purview to secure data in an AI landing zone.	Sensitive data in AI workflows increases the risk of insider threats, data leaks, and data oversharing. Use tools like Microsoft Purview Insider Risk Management to assess enterprise-wide data risks and prioritize them based on data sensitivity.
S-R4	Security	Review Industry Security Standards for AI	Leverage MITRE ATLAS and OWASP Generative AI risk for identifying risks across the architecture.
S-R5	Security	Monitor outputs and apply prompt shielding using AI Content Safety.	Regularly inspect the data returned by AI models to detect and mitigate risks associated with malicious or unpredictable user prompts. Implement Prompt Shields to scan text for the risk of a user input attack on generative AI models.
N-R1	Networking	Leverage Azure DDoS Protection if the workload is public facing. If a platform landing zone already exists, use the central DDoS service instead.	Azure DDoS Protection should be enabled to safeguard AI services from potential disruptions and downtime caused by distributed denial of service attacks. Enable Azure DDoS Protection at the virtual network level to defend against traffic floods targeting internet-facing applications.
N-R2	Networking	Use a jumpbox that can be accessed through Azure Bastion. If a platform landing zone already exists, use the central jumpbox and Bastion service instead.	AI development access should use a jumpbox within the virtual network of the workload or through a connectivity hub virtual network. Use Azure Bastion to securely connect to virtual machines interacting with AI services. Azure Bastion provides secure RDP/SSH connectivity without exposing VMs to the public internet. Enable Azure Bastion to ensure encrypted session data and protect access through TLS-based RDP/SSH connections.
N-R3	Networking	Use private endpoints for the AI resources and, in fact, for all PaaS services.	No PaaS services or AI model endpoints should be accessible from the public internet. Use private endpoints to provide private connectivity to Azure services within a virtual network. Private endpoints provide secure, private access to PaaS portals like Azure AI Foundry. For Azure AI Foundry, configure the managed virtual network and use private endpoints.
N-R4	Networking	Use Network Security Groups in the AI Landing Zone on all virtual networks implemented as part of the architecture.	Utilize network security groups (NSGs) to define and apply access policies that govern inbound and outbound traffic to and from AI workloads. These controls can be used to implement the principle of least privilege, ensuring that only essential communication is permitted.
N-R5	Networking	Use Application Gateway or Azure Front Door with WAF in the AI Landing Zone for the application's public front end, depending on whether the deployment is regional or global.	Azure WAF helps protect your AI workloads from common web vulnerabilities, including SQL injections and cross-site scripting attacks. Configure Azure WAF on Application Gateway for workloads that require enhanced security against malicious web traffic. For Azure AI services, restrict access to select virtual networks or use private endpoints.
N-R6	Networking	Consider APIM as an AI gateway in the AI Landing Zone with AI Foundry.	The AI Landing Zone should use Azure API Management (APIM) for load balancing API requests to AI endpoints. Consider using APIM as a generative AI gateway within your virtual networks. A generative AI gateway sits between your front end and the AI endpoints. Application Gateway, WAF policies, and APIM within the virtual network form an established architecture in generative AI solutions. For more information, see AI Hub architecture and Deploy Azure API Management instance to multiple Azure regions. A generative AI gateway allows you to track token usage, throttle token usage, apply circuit breakers, and route to different AI endpoints to control costs. Consider a generative AI gateway for monitoring. A reverse proxy like Azure API Management allows you to implement logging and monitoring that aren't native to the platform. API Management allows you to collect source IPs, input text, and output text. For more information, see Implement logging and monitoring for Azure OpenAI Service language models. APIM can help ensure consistent security across AI workloads. Use its built-in policies for traffic control and security enforcement. Integrate APIM with Microsoft Entra ID to centralize authentication and authorization and ensure only authorized users or applications interact with your AI models. Ensure you configure least privilege access on the reverse proxy’s managed identity. For more information, see AI authentication with APIM.
N-R7	Networking	Use a firewall, either in the AI Landing Zone or from the platform landing zone (preferred), along with a UDR that routes traffic to the Azure or third-party firewall.	Azure Firewall enforces security policies for outgoing traffic before it reaches the internet. Use it to control and monitor outgoing traffic and enable SNAT to conceal internal IP addresses by translating private IPs to the firewall's public IP. It ensures secure and identifiable outbound traffic for better monitoring and security.
N-R8	Networking	Use private DNS zones, either in the AI Landing Zone or from the platform landing zone (preferred), and integrate private endpoints with them so that DNS resolution and private endpoint connectivity work correctly.	Private DNS zones centralize and secure DNS management for accessing PaaS services within your AI network. Set up Azure policies that enforce private DNS zones and require private endpoints to ensure secure, internal DNS resolutions. If you don't have central private DNS zones, DNS forwarding doesn't work until you add conditional forwarding manually. For example, see using custom DNS with Azure AI Foundry hubs and Azure Machine Learning workspace. Custom DNS servers manage PaaS connectivity within the network, bypassing public DNS. Configure private DNS zones in Azure to resolve PaaS service names securely and route all traffic through private networking channels.
N-R9	Networking	Restrict outbound access by default. The AI Landing Zone should provide guidance and an implementation that restricts outbound access by default.	Limiting outbound traffic from your AI model endpoints helps protect sensitive data and maintain the integrity of your AI models. To minimize data exfiltration risks, restrict outbound traffic to approved services or fully qualified domain names (FQDNs), and maintain a list of trusted sources. Allow unrestricted internet outbound traffic only if you need access to public machine learning resources, and in that case regularly monitor and update your systems. For more information, see Azure AI services, Azure AI Foundry, and Azure Machine Learning.
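
Illustrative sketch for CO-R2: the checklist recommends a PTU primary endpoint with a secondary, consumption-based endpoint for spillover. The Python below is a minimal sketch of that routing decision only, not the gateway implementation the references describe. Every name in it (`Endpoint`, `ThrottledError`, `route_with_spillover`, `make_ptu`, `make_paygo`) is hypothetical, the endpoints are in-memory stand-ins that make no network calls, and it assumes the primary signals exhausted capacity with a throttling error (for example, an HTTP 429 surfaced as an exception).

```python
from dataclasses import dataclass
from typing import Callable, Tuple


class ThrottledError(Exception):
    """Hypothetical error a client raises when an endpoint is out of capacity."""


@dataclass
class Endpoint:
    name: str                    # hypothetical label, e.g. "ptu-primary"
    send: Callable[[str], str]   # hypothetical callable: prompt in, completion out


def route_with_spillover(prompt: str, primary: Endpoint, secondary: Endpoint) -> Tuple[str, str]:
    """Try the PTU endpoint first; fall back to the consumption-based one if throttled.

    Returns (name of the endpoint that served the request, completion).
    """
    try:
        return primary.name, primary.send(prompt)
    except ThrottledError:
        return secondary.name, secondary.send(prompt)


def make_ptu(capacity: int) -> Endpoint:
    """Stand-in PTU endpoint that accepts `capacity` calls, then throttles."""
    remaining = {"calls": capacity}

    def send(prompt: str) -> str:
        if remaining["calls"] <= 0:
            raise ThrottledError("PTU capacity exhausted")
        remaining["calls"] -= 1
        return f"ptu:{prompt}"

    return Endpoint("ptu-primary", send)


def make_paygo() -> Endpoint:
    """Stand-in consumption-based endpoint that always accepts the call."""
    return Endpoint("paygo-secondary", lambda prompt: f"paygo:{prompt}")


if __name__ == "__main__":
    ptu, paygo = make_ptu(capacity=2), make_paygo()
    for i in range(3):
        # The first two requests are served by the PTU stand-in, the third spills over.
        print(route_with_spillover(f"q{i}", ptu, paygo))
```

In a landing zone this decision would more plausibly live in the gateway layer described in N-R6 than in application code; the sketch only shows the order of preference between the two endpoint types.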