Terraform Deployment Guide for AI Landing Zones
What This Deployment Does
This Terraform configuration deploys an AI Landing Zone which is a complete AI/ML platform environment ready for development and deployment of AI applications.
Prerequisites
1. Install Required Tools
Terraform (version 1.9 or higher, but less than 2.0):
- Download from terraform.io
- After installation, verify by running: terraform --version
- You should see output like: Terraform v1.9.x
Azure CLI:
- Download from docs.microsoft.com/en-us/cli/azure/install-azure-cli
- After installation, verify by running: az --version
- You should see Azure CLI version information
2. Azure Subscription Requirements
- An active Azure subscription
- Owner or Contributor role on the subscription (required to create resources)
- Sufficient quota for the resources being deployed (VMs, networking, AI services)
3. Register Required Resource Providers
Before deploying, ensure these Azure resource providers are registered. If not, terraform will register the providers during deployment, which may cause delays:
az provider register --namespace Microsoft.CognitiveServices
az provider register --namespace Microsoft.MachineLearningServices
az provider register --namespace Microsoft.Storage
az provider register --namespace Microsoft.KeyVault
az provider register --namespace Microsoft.Network
az provider register --namespace Microsoft.DocumentDB
az provider register --namespace Microsoft.Search
az provider register --namespace Microsoft.ContainerRegistry
az provider register --namespace Microsoft.App
az provider register --namespace Microsoft.Compute
az feature register --name EncryptionAtHost --namespace Microsoft.Compute
Note: Provider registration can take 5-10 minutes. Check status with:
az provider show --namespace Microsoft.CognitiveServices --query "registrationState"
Step-by-Step Deployment Instructions
Step 1: Authenticate to Azure
There are two methods to authenticate: Interactive Login (recommended for personal use) or Service Principal (recommended for automation/CI/CD).
Option A: Interactive Login (Easiest for Getting Started)
Open PowerShell and log in to Azure:
az login
What happens: - A browser window will open - Sign in with your Azure account - The CLI will display your available subscriptions
Set the correct subscription (if you have multiple):
# List all subscriptions
az account list --output table
# Set the subscription you want to use
az account set --subscription "<YOUR_SUBSCRIPTION_ID_OR_NAME>"
# Verify the correct subscription is selected
az account show --output table
Option B: Service Principal Authentication (For Automation)
When to use Service Principals: - Running Terraform in CI/CD pipelines (Azure DevOps, GitHub Actions) - Automated deployments without interactive login - Shared environments where multiple people deploy - When you need specific permission boundaries
Step 1: Get Your Subscription ID
az login # Log in first to create the service principal
az account show --query id --output tsv
Step 2: Create the Service Principal
# Replace <SUBSCRIPTION_ID> with your actual subscription ID
# Replace <SP_NAME> with a descriptive name (e.g., "terraform-ai-landing-zone-sp")
# The below command creates a Service Principal and assigns contributor role to the subscription
az ad sp create-for-rbac --name "<SP_NAME>" --role="Contributor" --scopes="/subscriptions/<SUBSCRIPTION_ID>"
Example output:
{
"appId": "12345678-1234-1234-1234-123456789abc",
"displayName": "terraform-ai-landing-zone-sp",
"password": "abcdefgh-1234-5678-90ab-cdefghijklmn",
"tenant": "87654321-4321-4321-4321-210987654321"
}
Save these values securely! You won't be able to see the password again.
Add role assignment to Service Principal
# To grant Key Vault permissions, the Service Principal must have either "Owner" or "User Access Administrator".
# Following least privilege principle, assign "User Access Administrator".
# Replace <SP_AppId> with the Application ID from the output above.
# Assign the role:
az role assignment create --assignee "<SP_AppID>" --role="User Access Administrator" --scope="/subscriptions/<SUBSCRIPTION_ID>"
In PowerShell, set these environment variables (they must be named exactly as shown):
# Set environment variables for the current session
$env:ARM_CLIENT_ID="<appId from output>"
$env:ARM_CLIENT_SECRET="<password from output>"
$env:ARM_TENANT_ID="<tenant from output>"
$env:ARM_SUBSCRIPTION_ID="<your subscription ID>"
$env:ARM_USE_AZUREAD=true
# Verify they're set
echo "Client ID: $env:ARM_CLIENT_ID"
echo "Tenant ID: $env:ARM_TENANT_ID"
echo "Subscription ID: $env:ARM_SUBSCRIPTION_ID"
echo "Client Secret is set: $($env:ARM_CLIENT_SECRET -ne $null)"
echo "Use Azure AD: $env:ARM_USE_AZUREAD"
To make these permanent (persist across PowerShell sessions):
# Set user environment variables (permanent)
[System.Environment]::SetEnvironmentVariable('ARM_CLIENT_ID', '<appId>', 'User')
[System.Environment]::SetEnvironmentVariable('ARM_CLIENT_SECRET', '<password>', 'User')
[System.Environment]::SetEnvironmentVariable('ARM_TENANT_ID', '<tenant>', 'User')
[System.Environment]::SetEnvironmentVariable('ARM_SUBSCRIPTION_ID', '<subscription_id>', 'User')
[System.Environment]::SetEnvironmentVariable('ARM_USE_AZUREAD', 'true', 'User')
# Refresh your current session to load the variables
$env:ARM_CLIENT_ID = [System.Environment]::GetEnvironmentVariable('ARM_CLIENT_ID', 'User')
$env:ARM_CLIENT_SECRET = [System.Environment]::GetEnvironmentVariable('ARM_CLIENT_SECRET', 'User')
$env:ARM_TENANT_ID = [System.Environment]::GetEnvironmentVariable('ARM_TENANT_ID', 'User')
$env:ARM_SUBSCRIPTION_ID = [System.Environment]::GetEnvironmentVariable('ARM_SUBSCRIPTION_ID', 'User')
$env:ARM_USE_AZUREAD = [System.Environment]::GetEnvironmentVariable('ARM_USE_AZUREAD', 'User')
Step 4: Test the Service Principal
# Login using the service principal
az login --service-principal --username $env:ARM_CLIENT_ID --password $env:ARM_CLIENT_SECRET --tenant $env:ARM_TENANT_ID
# Verify access
az account show
Important Notes:
- When these environment variables are set, Terraform automatically uses them
- You don't need to run az login when using service principal environment variables
- Keep the client secret secure - treat it like a password
- Service principals can have specific permissions (Contributor role in this case)
For CI/CD Pipelines: Store these as secrets in your pipeline: - Azure DevOps: Pipeline > Edit > Variables > Add (select "Keep this value secret") - GitHub Actions: Repository > Settings > Secrets and variables > Actions > New repository secret
Step 2: Navigate to the Terraform Directory
cd C:\Git\AI-Landing-Zones\terraform
Step 3: (Optional) Set Up Remote Backend
What is a backend? Terraform stores information about your infrastructure in a "state file". By default, this is stored locally. For production or team environments, you should store it in Azure Storage.
Option A: Use Local Backend (Recommended for Testing)
No action needed. Terraform will create a terraform.tfstate file locally.
Option B: Use Azure Storage Backend (Recommended for Production)
First, create the backend resources:
# Set variables (customize these)
$RESOURCE_GROUP_NAME="tfstate-rg"
$STORAGE_ACCOUNT_NAME="tfstate$(Get-Random -Minimum 10000 -Maximum 99999)" # Must be globally unique
$CONTAINER_NAME="tfstate"
$LOCATION="eastus"
# Create resource group
az group create --name $RESOURCE_GROUP_NAME --location $LOCATION
# Create storage account
az storage account create --name $STORAGE_ACCOUNT_NAME --resource-group $RESOURCE_GROUP_NAME --location $LOCATION --sku Standard_LRS
# Create blob container
az storage container create --name $CONTAINER_NAME --account-name $STORAGE_ACCOUNT_NAME --auth-mode login
Then create a file named backend.tf in the terraform directory:
terraform {
backend "azurerm" {
resource_group_name = "tfstate-rg"
storage_account_name = "tfstate12345" # Use the name from above
container_name = "tfstate"
key = "ai-landing-zone.tfstate"
}
}
Step 4: Configure Variables (Optional)
The code has default values, but you can customize them by creating a terraform.tfvars file:
# terraform.tfvars
enable_telemetry = true # Set to false if you don't want to send telemetry to Microsoft
Note: The location and resource names are hardcoded in main.tf. If you want to change them, edit main.tf directly:
- Look for location = "swedencentral" (line ~17)
- Look for resource_group_name = "ai-lz-rg-01" (line ~19)
Step 5: Initialize Terraform
This downloads the required providers and modules:
terraform init
Expected output:
Initializing the backend...
Initializing provider plugins...
- Finding hashicorp/azurerm versions matching "~> 4.21"...
- Finding hashicorp/http versions matching "~> 3.4"...
- Installing hashicorp/azurerm v4.21.x...
- Installed hashicorp/azurerm v4.21.x
Terraform has been successfully initialized!
If you see errors:
- Check your internet connection
- Ensure you have Terraform 1.9+ installed
- Try running terraform init -upgrade
Step 6: Review the Deployment Plan
See what Terraform will create without actually creating it:
terraform plan
Tip: To capture the full output for easier review, redirect it to a file:
terraform plan > plan_output.txt
Expected output:
- A long list of resources to be created (50+ resources)
- At the end: Plan: XX to add, 0 to change, 0 to destroy.
Review the output carefully: - Check that the location is correct - Check that resource names make sense - Ensure no unexpected resources are being created
Step 7: Deploy the Infrastructure
Apply the configuration to create the resources:
terraform apply
What happens:
1. Terraform will show the plan again
2. You'll be prompted: Do you want to perform these actions?
3. Type yes and press Enter
Expected duration: 15-30 minutes
You'll see progress output like:
module.test.azurerm_resource_group.this: Creating...
module.test.azurerm_virtual_network.this: Creating...
module.test.azurerm_resource_group.this: Creation complete after 2s
...
Apply complete! Resources: XX added, 0 changed, 0 destroyed.
If deployment fails:
- Read the error message carefully
- Common issues:
- Quota limits (request quota increase in Azure Portal)
- Region capacity (try a different region)
- Permission issues (ensure you have Contributor role)
- Provider not registered (see Prerequisites Step 3)
- Key Vault public access is disabled.
- In production, GitHub workflows run within the network so this is not an issue.
- For local runs, set access to “Allow public access from specific virtual networks and IP addresses” and whitelist workstation’s public IP.
Step 8: Verify the Deployment
Check that resources were created:
# List resource groups
az group list --output table
# List resources in the AI Landing Zone resource group
az resource list --resource-group ai-lz-rg-01 --output table
You should see resources like: - Virtual Network (ai-lz-vnet-01) - AI Hub - Storage accounts - Key Vault - Container Registry - And more...
Step 9: Access Your Resources
Azure Portal:
1. Go to portal.azure.com
2. Navigate to Resource Groups
3. Click on ai-lz-rg-01
4. Explore the deployed resources
Azure AI Foundry: 1. Go to ai.azure.com 2. You should see your AI Hub and Project
Making Changes
If you need to modify the configuration:
- Edit the relevant
.tffiles (usuallymain.tf) - Run
terraform planto preview changes - Run
terraform applyto apply changes
Example: To change the location:
- Open main.tf
- Find location = "swedencentral"
- Change to your desired region (e.g., location = "eastus")
- Run terraform plan and terraform apply
Clean Up / Destroy Resources
WARNING: This will delete ALL resources created by this Terraform configuration!
terraform destroy
What happens:
1. Terraform shows what will be destroyed
2. You'll be prompted: Do you really want to destroy all resources?
3. Type yes and press Enter
4. All resources will be deleted (takes 10-15 minutes)
Note: If destroy fails due to lingering dependencies, wait a few minutes and try again.
Troubleshooting Common Issues
Issue: "Error: building account" or authentication errors
Solution:
# Re-authenticate
az login
az account set --subscription "<YOUR_SUBSCRIPTION_ID>"
Issue: "Error: insufficient quota"
Solution: - Go to Azure Portal → Subscriptions → Usage + quotas - Request a quota increase for the required resource - Or try deploying to a different region
Issue: "Error: Resource provider not registered"
Solution:
# Register the provider mentioned in the error
az provider register --namespace <NAMESPACE_FROM_ERROR>
# Wait a few minutes and try again
Issue: "Error: A resource with the ID already exists"
Solution:
- Another resource with the same name exists
- Either delete the existing resource or change the name in main.tf
Issue: Terraform init fails with module download errors
Solution:
# Clear the cache and re-initialize
Remove-Item -Recurse -Force .terraform
terraform init
Issue: "Error: storage account name is not available"
Solution:
- Storage account names must be globally unique
- The code tries to generate a unique name, but if it fails, edit main.tf to use a different storage account name
Understanding the Terraform Files
terraform.tf: Defines Terraform version and required providers (azurerm, http, random)variables.tf: Defines input variables (onlyenable_telemetryin this case)main.tf: The main configuration that defines all Azure resourcesterraform.tfstate: Created after deployment, stores current infrastructure state (DO NOT manually edit)terraform.tfvars: (Optional) Your custom variable valuesbackend.tf: (Optional) Remote backend configuration for state storage
Next Steps After Deployment
-
Explore AI Foundry:
- Go to ai.azure.com
- Open your project: "Project 1 Display Name"
- Start building AI applications
-
Connect to Resources:
- Use Azure Bastion to securely connect to VMs
- Access Container Registry to push/pull container images
- Use Key Vault to manage secrets
-
Deploy AI Models:
- The deployment includes GPT-4o model deployment
- You can deploy additional models through Azure AI Foundry
-
Set Up CI/CD:
- Integrate with Azure DevOps or GitHub Actions
- Automate model deployment and application updates
Additional Resources
Getting Help
If you encounter issues:
1. Check the error message carefully
2. Review the Troubleshooting section above
3. Check Azure Portal for resource status
4. Review Terraform state: terraform show
5. Open an issue on the GitHub repository
Summary Checklist
- [ ] Install Terraform (1.9+)
- [ ] Install Azure CLI
- [ ] Run
az loginand set correct subscription - [ ] Register required Azure resource providers
- [ ] Navigate to terraform directory
- [ ] (Optional) Create backend storage and
backend.tf - [ ] (Optional) Create
terraform.tfvarswith custom values - [ ] Run
terraform init - [ ] Run
terraform planand review - [ ] Run
terraform applyand typeyes - [ ] Wait 15-30 minutes for deployment
- [ ] Verify resources in Azure Portal
- [ ] Access AI Foundry at ai.azure.com
You're now ready to build AI applications on your new AI Landing Zone!