# AI Data Flow and Privacy
This page documents how Container Assist uses AI models, what data from your project is sent to cloud AI services, what data stays local, and the security protections in place.
## Architecture: Local Analysis + Cloud AI Generation
Container Assist uses a two-phase architecture:
- Phase 1 – Local analysis (no network calls): The `containerization-assist-mcp/sdk` runs entirely on your machine. It scans your project filesystem to detect languages, frameworks, dependencies, ports, and entry points. No data leaves your machine during this phase.
- Phase 2 – Cloud AI generation: The SDK’s analysis results are formatted into prompts and sent to a VS Code Language Model (via the `vscode.lm` API) to generate Dockerfiles and Kubernetes manifests. This phase involves cloud AI calls.
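To make the hand-off between the two phases concrete, the sketch below shows roughly what the Phase 1 analysis produces before it is turned into prompts. The interface and field names are illustrative assumptions, not the SDK's actual type definitions.

```typescript
// Hypothetical shape of a Phase 1 (local) analysis result for one module.
// Field names are illustrative; the SDK's real types may differ.
interface ModuleAnalysis {
  language: string;                                  // e.g. "typescript"
  frameworks: { name: string; version?: string }[];  // e.g. Express v4.18.0
  entryPoint?: string;                               // e.g. "src/index.ts"
  ports: number[];                                   // e.g. [3000, 8080]
  dependencies: string[];                            // only the first 15 names are sent
  recommendations: {
    buildStrategy?: string;
    baseImages?: string[];
    securityConsiderations?: string[];
    optimizations?: string[];
  };
  existingDockerfile?: string;                       // included only if one exists
}
```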
## What AI Models Are Used
| Setting | Default | Description |
|---|---|---|
| `aks.containerAssist.modelFamily` | `gpt-5.2-codex` | The model family to use |
| `aks.containerAssist.modelVendor` | `copilot` | The model vendor (provider) |
- Container Assist uses the VS Code Language Model API (`vscode.lm`), which routes requests through GitHub Copilot’s infrastructure.
- On launch, you can choose “Use Default Model” or “Select Model…” to pick from any available VS Code language model.
- If the configured default model is not found, the first available model is used as a fallback.
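A minimal sketch of how these two settings could map onto the `vscode.lm` API, including the fallback to the first available model, is shown below; the extension's actual implementation may differ.

```typescript
import * as vscode from "vscode";

// Sketch: resolve the configured model, falling back to the first available one.
async function resolveModel(): Promise<vscode.LanguageModelChat | undefined> {
  const cfg = vscode.workspace.getConfiguration("aks.containerAssist");
  const family = cfg.get<string>("modelFamily"); // e.g. "gpt-5.2-codex"
  const vendor = cfg.get<string>("modelVendor"); // e.g. "copilot"

  // Ask VS Code for models matching the configured vendor/family.
  const preferred = await vscode.lm.selectChatModels({ vendor, family });
  if (preferred.length > 0) {
    return preferred[0];
  }

  // Fallback: the first available model of any family.
  const available = await vscode.lm.selectChatModels();
  return available[0];
}
```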
## What Data Is Sent to the AI Model
### Per-Interaction Overview
Container Assist makes two AI calls per module in your project:
- Dockerfile generation – one AI call
- Kubernetes manifest generation – one AI call
Each call includes a system prompt, a user prompt, and tool definitions. The AI may then invoke tools to read additional files from your project (up to 20 rounds of tool calls per interaction).
### System Prompts (Static, Hardcoded)
The system prompts are fixed strings that describe the AI’s role and workflow. They do not contain any of your project data. They instruct the AI to:
- Act as an expert at creating Dockerfiles or Kubernetes manifests
- Use tools (`readProjectFile`, `listDirectory`) to verify project details before generating
- Output content in a specific `<content>` marker format
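As an illustration of that marker format, the generated output is expected to arrive wrapped in `<content>` tags, which the extension can then extract. The helper below is a hypothetical sketch (including the assumed `</content>` closing tag), not the extension's actual parsing code.

```typescript
// Hypothetical helper: pull the generated Dockerfile or manifest out of the
// <content> markers. The real marker handling may be more involved.
function extractContent(response: string): string | undefined {
  const match = response.match(/<content>([\s\S]*?)<\/content>/);
  return match?.[1].trim();
}
```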
### User Prompts (Contain Project Data)
The user prompt is built from the local SDK analysis and includes:
For Dockerfile generation:
- Detected programming language (e.g., “typescript”, “python”, “java”)
- Framework names and versions (e.g., “Express v4.18.0”)
- Entry point path (e.g., “src/index.ts”)
- Detected ports (e.g., “3000, 8080”)
- First 15 dependency names (e.g., “express, pg, redis, …”)
- SDK recommendations: build strategy, base image suggestions, security considerations, optimizations
- Existing Dockerfile content (if present), including analysis and enhancement guidance
- Language-specific verification hints (what config files to check)
For Kubernetes manifest generation:
- All of the above, plus:
- Application name (e.g., “my-app”)
- Target Kubernetes namespace
- Full image repository URL (e.g., `myacr.azurecr.io/my-app`)
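The sketch below illustrates how such a user prompt might be assembled from the analysis data, reusing the hypothetical `ModuleAnalysis` interface from the architecture sketch above. The prompt wording and field names are assumptions, not the extension's actual template.

```typescript
// Illustrative prompt assembly for Dockerfile generation; wording is assumed.
function buildDockerfilePrompt(a: ModuleAnalysis): string {
  const lines = [
    `Language: ${a.language}`,
    `Frameworks: ${a.frameworks.map(f => [f.name, f.version].filter(Boolean).join(" v")).join(", ")}`,
    `Entry point: ${a.entryPoint ?? "unknown"}`,
    `Ports: ${a.ports.join(", ")}`,
    `Dependencies: ${a.dependencies.slice(0, 15).join(", ")}`, // first 15 names only
  ];
  if (a.existingDockerfile) {
    lines.push(`Existing Dockerfile:\n${a.existingDockerfile}`); // for enhancement guidance
  }
  return lines.join("\n");
}
```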
### Tool Calls (AI Reads Your Files)
During generation, the AI can invoke two tools to inspect your project:
#### `readProjectFile`
Reads a file from your project. The AI decides which files to read based on the analysis.
- Input: Relative file path, optional line limit
- Output: File content (up to 200 lines)
- Typical files read: `package.json`, `tsconfig.json`, `pom.xml`, `Dockerfile`, `go.mod`, source entry points, configuration files
#### `listDirectory`
Lists files and subdirectories in your project.
- Input: Relative directory path, optional max depth
- Output: Tree listing of files and directories (up to 200 entries, max 3 levels deep)
- Excluded from listings: `node_modules`, `.git`, `dist`, `build`, `target`, `bin`, `obj`, `__pycache__`, `venv`, `.next`, `.nuxt`
The AI can make up to 20 rounds of tool calls per interaction. In each round, it may call multiple tools concurrently. After 20 rounds, a final request is sent without tools to force the AI to produce its output.
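The loop below sketches that bounded tool-calling flow. `callModel` and `runLocalTool` are hypothetical stand-ins for the `vscode.lm` request and the local `readProjectFile`/`listDirectory` implementations; they are not real extension APIs.

```typescript
// Minimal stand-in types for this sketch; the extension's real types differ.
type ToolCall = { name: "readProjectFile" | "listDirectory"; input: unknown };
type ModelReply = { text: string; toolCalls: ToolCall[] };
type ChatMessage = { role: "system" | "user" | "assistant" | "tool"; content: string };

// Hypothetical helpers standing in for the vscode.lm request and the local tools.
declare function callModel(messages: ChatMessage[], toolsEnabled: boolean): Promise<ModelReply>;
declare function runLocalTool(call: ToolCall): Promise<string>;

const MAX_TOOL_ROUNDS = 20;

async function generateWithTools(messages: ChatMessage[]): Promise<string> {
  for (let round = 0; round < MAX_TOOL_ROUNDS; round++) {
    const reply = await callModel(messages, true);
    if (reply.toolCalls.length === 0) {
      return reply.text; // The model produced its final output this round.
    }
    // Execute each requested tool locally and feed the results back to the model.
    for (const call of reply.toolCalls) {
      const result = await runLocalTool(call); // readProjectFile / listDirectory
      messages.push({ role: "tool", content: result });
    }
  }
  // After 20 rounds, send one final request without tools to force an answer.
  const final = await callModel(messages, false);
  return final.text;
}
```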
## What Data Is NOT Sent
### Blocked Sensitive Files
The following files are blocked from being read by the AI, even if it requests them:
| Pattern | Description |
|---|---|
| `.env`, `.env.local`, `.env.production`, `.env.staging` | Environment variable files |
| `*.pem`, `*.key`, `*.pfx`, `*.p12` | TLS/SSL certificates and private keys |
| `credentials*` | Credential files (any extension) |
| `secret.*`, `secrets.*`, `.secrets` | Secret configuration files |
| `id_rsa`, `id_ed25519` | SSH private keys |
| `*.secret` | Any file with the `.secret` extension |
If the AI requests a blocked file, the tool returns an error and the AI must proceed without that file’s content.
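A rough sketch of what such a blocklist check can look like is shown below; the extension's actual pattern matching may be implemented differently.

```typescript
import * as path from "path";

// Illustrative check against the blocklist above (exact names, name prefixes,
// and extensions); not the extension's real implementation.
const BLOCKED_EXACT = new Set([".env", ".secrets", "id_rsa", "id_ed25519"]);
const BLOCKED_PREFIXES = [".env.", "credentials", "secret.", "secrets."];
const BLOCKED_EXTENSIONS = [".pem", ".key", ".pfx", ".p12", ".secret"];

function isBlockedFile(relativePath: string): boolean {
  const name = path.basename(relativePath).toLowerCase();
  return (
    BLOCKED_EXACT.has(name) ||
    BLOCKED_PREFIXES.some(prefix => name.startsWith(prefix)) ||
    BLOCKED_EXTENSIONS.some(ext => name.endsWith(ext))
  );
}
```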
### Path Traversal Protection
All file access tools enforce strict path boundaries:
- `..` path segments are rejected (no escaping the project root)
- Absolute paths are rejected
- Windows drive paths and UNC paths are rejected
- The resolved path is verified to remain within the workspace root
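The sketch below shows how those checks can be combined before any file or directory is read; it is illustrative, not the extension's exact code.

```typescript
import * as path from "path";

// Sketch of the path-boundary checks listed above. Returns the resolved path
// only if the request stays inside the workspace root.
function resolveInsideWorkspace(workspaceRoot: string, requested: string): string | undefined {
  // Reject absolute paths, Windows drive paths, and UNC paths outright.
  if (path.isAbsolute(requested) || /^[a-zA-Z]:[\\/]/.test(requested) || requested.startsWith("\\\\")) {
    return undefined;
  }
  // Reject any ".." segment so the path cannot escape the project root.
  if (requested.split(/[\\/]/).includes("..")) {
    return undefined;
  }
  // Verify the resolved path still lies within the workspace root.
  const resolved = path.resolve(workspaceRoot, requested);
  const root = path.resolve(workspaceRoot) + path.sep;
  return resolved.startsWith(root) ? resolved : undefined;
}
```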
## What Stays Entirely Local
The following data is processed locally and never sent to any AI model:
- Your Azure subscription, cluster, ACR, and namespace selections
- Managed identity details, role assignments, federated credentials
- GitHub repository secrets
- Git history and commit data
- The workflow YAML template (generated from a local template, not AI)
- File write operations (Dockerfile, manifests, workflow files)
## Summary: Data Flow by Destination
| Destination | Data |
|---|---|
| Local only (no network) | Full project filesystem scan, Azure resource operations, role assignments, GitHub secrets, workflow template rendering, file writes |
| VS Code Language Model API (cloud) | SDK analysis summaries (language, framework, ports, dependencies, entry point), project file contents requested by AI tools, system prompts |
| Not sent (blocked) | .env files, private keys, certificates, credential files, SSH keys, secret files |