NL2SQL Quick Start Guide

Get natural language querying of your Azure SQL Database working over a public endpoint in 30 minutes, using automated blob storage ingestion.

This quickstart enables a public-access setup and is intended for testing purposes only!

Prerequisites

You must have:

  • GPT-RAG solution deployed (azd provision and azd deploy completed)
  • Azure subscription with permissions to create SQL resources
  • Access to Azure Portal

What You'll Accomplish

By the end of this guide, you will:

  • Create Azure SQL Database with AdventureWorksLT sample data
  • Configure networking and firewall rules
  • Store database credentials securely in Key Vault
  • Register your SQL database as a datasource
  • Upload table metadata and example queries to blob storage
  • Enable automated ingestion with CRON scheduling
  • Enable NL2SQL strategy
  • Ask questions in natural language and get SQL results

Step 1: Create Azure SQL Database (5-10 min)

A. Create SQL Server

Azure Portal → Create a resource → SQL Database

  1. Basics tab:

    • Subscription: Your subscription
    • Resource group: Same as your GPT-RAG deployment (or create new)
    • Database name: adventureworks-demo
    • Server: Click Create new
  2. Create SQL Database Server:

    • Server name: sql-gptrag-demo-<unique> (must be globally unique)
    • Location: Same region as GPT-RAG (optional - any region works, but same region reduces latency)
    • Authentication method: Use SQL authentication (simpler for this quickstart; Entra ID authentication is also supported)
    • Server admin login: sqladmin
    • Password: Create a strong password (save this!)
    • Click OK
  3. Compute + storage:

    • Click Configure database
    • Select Basic (cheapest for testing - $5/month)
    • Click Apply
  4. Backup storage redundancy:

    • Select Locally-redundant backup storage (cheapest)

B. Configure Networking

  1. Networking tab:
    • Connectivity method: Public endpoint
    • Firewall rules:
    • Allow Azure services and resources to access this server - YES (CRITICAL!)
    • Add current client IP address - YES (for your testing)
    • Connection policy: Default
    • Encrypted connections: TLS 1.2 (default)

C. Add Sample Data

  1. Additional settings tab:

    • Use existing data: Sample (AdventureWorksLT)
    • Collation: Default
    • Enable Microsoft Defender: Not needed for demo
  2. Click Review + create, then Create

Wait 3-5 minutes for deployment to complete.

D. Verify Firewall Configuration (Post-Deployment)

After deployment completes:

⚠️ Important: Go to the SQL Server resource (not the database):

Azure Portal → SQL servers → sql-gptrag-demo-<unique> → Security → Networking

(Note: SQL databases → adventureworks-demo is the wrong place - networking is configured on the server, not the database.)

Verify these settings:

  • Public network access: Selected networks
  • Exceptions: ☑️ Allow Azure services and resources to access this server (checked)

The "Allow Azure services" checkbox creates a special firewall rule (0.0.0.0 - 0.0.0.0) that permits any Azure service in your subscription to connect.

E. Test Connection

Using Azure Portal Query Editor:

  1. Go to your SQL Database → Query editor
  2. Login with SQL authentication (sqladmin / your password)
  3. Run test query:
    SELECT TOP 5 ProductID, Name, ListPrice 
    FROM SalesLT.Product 
    ORDER BY ListPrice DESC
    

If you see results, your database is ready!

F. Gather Connection Info

Save these details (you'll need them later):

Server: <your-sql-server-name>.database.windows.net
Database: adventureworks-demo
Username: sqladmin
Password: <your-password>
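Optionally, if you have the sqlcmd utility installed locally, you can sanity-check these connection details from your own machine (this relies on the client-IP firewall rule you added earlier):

# Quick connectivity check from your workstation
sqlcmd -S <your-sql-server-name>.database.windows.net -d adventureworks-demo -U sqladmin -P "<your-password>" -Q "SELECT COUNT(*) FROM SalesLT.Product"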

Step 2: Find Your GPT-RAG Resources (3 min)

Locate these in your resource group (they have your deployment suffix):

Key Vault: kv-<suffix>
Cosmos DB: cosmos-<suffix>
AI Search: srch-<suffix>
App Config: appcs-<suffix>
Storage Account: st<suffix> (note: no dash in storage account names)

Quick way to find suffix:

# List resource groups
az group list --query "[?contains(name,'gpt-rag')].name" -o table

# List resources in your group
az resource list -g <your-rg> --query "[].name" -o table
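If the group contains many resources, a hedged variation that pulls names by resource type (assuming one instance of each type per group):

# Grab each resource name by type (assumes one of each in the group)
$rg = "<your-rg>"
az resource list -g $rg --resource-type "Microsoft.KeyVault/vaults" --query "[0].name" -o tsv
az resource list -g $rg --resource-type "Microsoft.AppConfiguration/configurationStores" --query "[0].name" -o tsv
az resource list -g $rg --resource-type "Microsoft.Storage/storageAccounts" --query "[0].name" -o tsv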


Step 3: Store Password in Key Vault (3 min)

CRITICAL: Secret name MUST be {datasource-id}-secret

# Using the password you created in Step 1
# If your datasource id will be "adventureworks"
# Then secret name must be "adventureworks-secret"

az keyvault secret set `
  --vault-name "kv-<your-suffix>" `
  --name "adventureworks-secret" `
  --value "<your-sql-password-from-step-1>"

Verify it worked:

az keyvault secret show `
  --vault-name "kv-<your-suffix>" `
  --name "adventureworks-secret" `
  --query "value" -o tsv


Step 4: Grant Container App Access to Key Vault (2 min)

# Get orchestrator's managed identity
$principalId = az containerapp show `
  --name "ca-<your-suffix>-orchestrator" `
  --resource-group <your-rg> `
  --query "identity.principalId" -o tsv

# Grant access
az role assignment create `
  --assignee $principalId `
  --role "Key Vault Secrets User" `
  --scope "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.KeyVault/vaults/kv-<your-suffix>"

Step 5: Register Database in Cosmos DB (3 min)

Azure Portal → Cosmos DB → cosmos-<suffix> → Data Explorer → ragdata database → datasources container → New Item

Paste this JSON, replacing <your-sql-server-name> with the server name you created in Step 1 (e.g., sql-gptrag-demo-xyz123):

{
    "id": "adventureworks",
    "type": "sql_database",
    "description": "AdventureWorksLT sample database with products and customers",
    "server": "<your-sql-server-name>.database.windows.net",
    "database": "adventureworks-demo",
    "uid": "sqladmin",
    "metadata": {
        "created_date": "2025-11-17"
    }
}

💡 How to find your server name:

  • Azure Portal → SQL servers → Look for the server you just created
  • Copy the name (e.g., sql-gptrag-demo-ragpace)
  • Add .database.windows.net to the end

⚠️ CRITICAL RULES:

  • Use uid (NOT username)
  • DO NOT include password field
  • DO NOT include connection_info field
  • The id must match the Key Vault secret name prefix (id adventureworks → secret adventureworks-secret)

Click Save.


Step 6: Create Metadata JSON Files (5 min)

What you'll do in this step:

  • Create JSON files on your local machine (in your working directory)
  • These files describe your database tables and example queries
  • In Step 7, you'll upload these files to Azure Blob Storage
  • The automated ingestion system will then index them into AI Search

A. Create Local Folder Structure

On your local machine, create folders to organize the metadata files:

mkdir blob-upload
mkdir blob-upload\queries
mkdir blob-upload\tables

B. Create Example Query Files

Why example queries?

They help the AI understand your database patterns and generate better SQL. The system uses these as few-shot examples when translating natural language to SQL.

📝 Note: We're only adding 3 queries here for quick setup. Ideally, add 10-20 diverse examples covering:

  • Simple queries (counts, filters)
  • Complex joins across multiple tables
  • Aggregations (SUM, AVG, GROUP BY)
  • Date/time filtering
  • Common business questions your users ask

More examples = better SQL generation accuracy!

Create these files on your local machine:

Create blob-upload\queries\how_many_products.json:

{
    "question": "How many products are in the database?",
    "query": "SELECT COUNT(*) as product_count FROM SalesLT.Product",
    "reasoning": "Simple count of all products",
    "datasource": "adventureworks"
}

Create blob-upload\queries\top_expensive_products.json:

{
    "question": "Show me the top 5 most expensive products",
    "query": "SELECT TOP 5 ProductID, Name, ListPrice FROM SalesLT.Product ORDER BY ListPrice DESC",
    "reasoning": "Get highest priced products",
    "datasource": "adventureworks"
}

Create blob-upload\queries\product_categories.json:

{
    "question": "What product categories exist?",
    "query": "SELECT DISTINCT Name FROM SalesLT.ProductCategory ORDER BY Name",
    "reasoning": "List all unique categories",
    "datasource": "adventureworks"
}
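As noted above, more diverse examples improve SQL generation. A hedged sketch of a richer join-plus-aggregation example, using a hypothetical file name blob-upload\queries\revenue_by_category.json and the same file format as the three files above:

{
    "question": "What are the top 5 product categories by total revenue?",
    "query": "SELECT TOP 5 pc.Name AS Category, SUM(sod.LineTotal) AS TotalRevenue FROM SalesLT.SalesOrderDetail sod JOIN SalesLT.Product p ON sod.ProductID = p.ProductID JOIN SalesLT.ProductCategory pc ON p.ProductCategoryID = pc.ProductCategoryID GROUP BY pc.Name ORDER BY TotalRevenue DESC",
    "reasoning": "Joins order details to products and categories, then aggregates revenue per category",
    "datasource": "adventureworks"
}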

C. Create Table Metadata Files

Why table metadata?

The system uses AI Search for semantic table discovery instead of live database introspection. When users ask questions, the system:

  1. Searches your table metadata using embeddings (vector search)
  2. Finds the most relevant tables based on descriptions
  3. Then calls GetSchemaInfo to retrieve detailed column information

Benefits of this approach:

  • Fast semantic search - Find relevant tables using natural language ("revenue data" matches "SalesOrderHeader")
  • Control what's exposed - Only include tables relevant to end users (exclude admin/audit tables)
  • Add business context - Descriptions help the AI understand table purpose beyond raw schema
  • Avoid token limits - Don't send 500 table schemas to GPT-4 every query

⚠️ Schema updates:

  • If you add/drop columns or tables later, create new JSON files and upload them
  • The automated ingestion will detect changes and re-index automatically

Note

Column descriptions are not specified here because the column names are sufficiently descriptive for the LLM.

Create these files on your local machine:

Create blob-upload\tables\saleslt_product.json:

{
    "table": "SalesLT.Product",
    "description": "Product catalog with names, prices, and descriptions",
    "datasource": "adventureworks"
}

Create blob-upload\tables\saleslt_productcategory.json:

{
    "table": "SalesLT.ProductCategory",
    "description": "Product categories for organizing products",
    "datasource": "adventureworks"
}

Create blob-upload\tables\saleslt_customer.json:

{
    "table": "SalesLT.Customer",
    "description": "Customer information including names and contact details",
    "datasource": "adventureworks"
}
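Before uploading, an optional PowerShell sanity check that every file parses as valid JSON (a malformed file would otherwise fail during ingestion):

# Parse every JSON file under blob-upload; an error here means the file is malformed
Get-ChildItem -Path blob-upload -Recurse -Filter *.json | ForEach-Object {
    Get-Content $_.FullName -Raw | ConvertFrom-Json | Out-Null
    Write-Host "OK: $($_.Name)"
}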


Step 7: Upload Files to Blob Storage (3 min)

What this step does:

  • Uploads the local JSON files you created in Step 6 to Azure Blob Storage
  • Files go into the nl2sql container (this container already exists from your deployment)
  • The folder structure (queries/ and tables/) is created automatically during upload

Upload command:

# Upload all files (queries and tables)
az storage blob upload-batch `
  --account-name "st<your-suffix>" `
  --destination nl2sql `
  --source blob-upload `
  --auth-mode login

Verify uploads:

# List all blobs in nl2sql container
az storage blob list `
  --account-name "st<your-suffix>" `
  --container-name nl2sql `
  --auth-mode login `
  --query "[].name" -o table

You should see:

queries/how_many_products.json
queries/top_expensive_products.json
queries/product_categories.json
tables/saleslt_product.json
tables/saleslt_productcategory.json
tables/saleslt_customer.json


Step 8: Enable Automated Ingestion (3 min)

Azure Portal → App Configuration → appcs-<suffix> → Configuration explorer

Click + Create to add a new key-value:

Key: CRON_RUN_NL2SQL_INDEX
Value: */15 * * * *
Label: gpt-rag-ingestion
Content type: text/plain

Click Apply.
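If you prefer the CLI, a roughly equivalent command (assumes you have data-plane access to the store, e.g. the App Configuration Data Owner role, when using --auth-mode login):

# Create the CRON key-value from the CLI instead of the portal
az appconfig kv set `
  --name "appcs-<your-suffix>" `
  --key "CRON_RUN_NL2SQL_INDEX" `
  --value "*/15 * * * *" `
  --label "gpt-rag-ingestion" `
  --content-type "text/plain" `
  --auth-mode login `
  --yes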

What this does:

  • Runs the NL2SQL indexer job every 15 minutes
  • Scans the nl2sql container for new/changed files
  • Automatically indexes them into AI Search
  • Generates embeddings for semantic search
  • Skips unchanged files (smart change detection)

Alternative schedules:

  • */5 * * * * - Every 5 minutes (faster updates)
  • */2 * * * * - Every 2 minutes (testing/development only)
  • 0 * * * * - Every hour on the hour
  • 0 0 * * * - Once daily at midnight
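For reference, the five CRON fields read left to right as minute, hour, day of month, month, and day of week:

# field order: minute  hour  day-of-month  month  day-of-week
# */15 * * * *  → at minutes 0, 15, 30, and 45 of every hour, every day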

Cost considerations:

  • CRON schedules are free - they're just timers, not separate compute
  • The Container App runs 24/7 regardless of schedule (~$30-50/month)
  • Each run generates embeddings via Azure OpenAI (~$0.001 per run)
  • Smart change detection skips unchanged files (saves costs)
  • Recommended for production: */15 * * * * balances responsiveness with costs (~$3-5/month in embeddings)
  • For testing: Use */2 * * * * to see results faster, then change to */15 * * * *

Step 9: Enable NL2SQL Strategy (2 min)

Azure Portal → App Configuration → appcs-<suffix> → Configuration explorer

  1. Search for key: AGENT_STRATEGY
  2. Click the key → Click Edit
  3. Change value to: nl2sql
  4. Click Apply
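The same change can be made from the CLI (a hedged equivalent; if the existing key carries a label in your store, add a matching --label):

# Switch the agent strategy from the CLI instead of the portal
az appconfig kv set `
  --name "appcs-<your-suffix>" `
  --key "AGENT_STRATEGY" `
  --value "nl2sql" `
  --auth-mode login `
  --yes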

Step 10: Wait for Initial Ingestion (2-3 min)

The data ingestion Container App runs the indexer job:

  • On startup (happens once when container starts)
  • Every 15 minutes (based on CRON schedule)

Option 1: Wait for next CRON run (up to 15 minutes, or 2 minutes if you used */2 * * * * for testing)

How it works:

  • The Container App runs the indexer on startup (immediate first run)
  • Then runs on the CRON schedule you configured
  • CRON runs at the next matching time (e.g., if schedule is */15 * * * *, next run is at :00, :15, :30, or :45)

Option 2: Force immediate ingestion by restarting the container:

az containerapp revision restart `
  --name "ca-<your-suffix>-dataingest" `
  --resource-group <your-rg> `
  --revision $(az containerapp revision list --name "ca-<your-suffix>-dataingest" --resource-group <your-rg> --query "[0].name" -o tsv)

Verify indexing completed:

Check the logs in blob storage:

# List ingestion run logs
az storage blob list `
  --account-name "st<your-suffix>" `
  --container-name jobs `
  --prefix "nl2sql-indexer/runs/" `
  --auth-mode login `
  --query "[].{name:name, modified:properties.lastModified}" -o table

Download the most recent log:

# Get the latest run log
$latestLog = az storage blob list `
  --account-name "st<your-suffix>" `
  --container-name jobs `
  --prefix "nl2sql-indexer/runs/" `
  --auth-mode login `
  --query "[-1].name" -o tsv

# Download and view it
az storage blob download `
  --account-name "st<your-suffix>" `
  --container-name jobs `
  --name $latestLog `
  --file run-log.json `
  --auth-mode login

Get-Content run-log.json | ConvertFrom-Json | ConvertTo-Json -Depth 10

Look for:

  • "success": 3 (for queries)
  • "success": 3 (for tables)
  • "failed": 0
  • "skipped": 0 (first run - nothing skipped yet)
  • "vectorsGenerated": 6 (embeddings created)

On subsequent runs:

  • Files with no changes will show "skipped": 6 and "candidates": 0
  • Only new or modified files will be re-indexed
  • This smart change detection saves costs and time

Step 11: Test It! (2 min)

Navigate to your UI: https://ca-<suffix>-frontend.<unique-env>.<region>.azurecontainerapps.io
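If you're unsure of the exact URL, you can look up the frontend's FQDN (assuming the app is named ca-<your-suffix>-frontend):

# Get the frontend's public FQDN
az containerapp show `
  --name "ca-<your-suffix>-frontend" `
  --resource-group <your-rg> `
  --query "properties.configuration.ingress.fqdn" -o tsv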

Try these questions:

  1. "How many products are in the database?"
  2. "Show me the top 5 most expensive products"
  3. "What product categories exist?"

Expected response:

  • Natural language answer with data
  • Shows SQL query that was executed
  • Cites the datasource

Congratulations! You've set up an automated NL2SQL ingestion pipeline.

What You Just Enabled 🪄

🧠 Automated Metadata Management

Change detection - Only re-indexes modified files (checks ETag and lastModified)

Smart scheduling - Runs every 15 minutes, can be adjusted

Startup sync - Runs once on container startup for immediate availability

Detailed logging - Per-file and per-run logs in blob storage

Scalable - Handles hundreds of tables and queries efficiently

🧩 Advanced Query Capabilities (Already working!)

Complex JOINs - "Show me orders with customer names and product details" automatically generates multi-table joins

Aggregations - "What's the average order value by product category?" generates GROUP BY with AVG/SUM/COUNT

Date filtering - "Show orders from last 30 days" uses DATEADD and date functions

Subqueries - Handles nested queries when needed for complex business logic

Pattern matching - "Find customers whose email contains 'adventure'" uses LIKE operators

🔐 Enabling Zero Trust Architecture

The guide used SQL authentication for simplicity, but production deployments support:

Microsoft Entra ID authentication - No passwords, just managed identities

Private Endpoints - Database never exposed to internet

VNet integration - Container Apps and SQL in same private network

