agentops doctor — checks reference¶
This page is the canonical inventory of every check the AgentOps
Doctor runs. Each row tells you the finding id that surfaces in
report.md / the cockpit, the WAF pillar the check belongs to,
the data source it reads, and the detection signal (the rule
itself, in plain English).
Tip: the same catalog is browsable from the terminal:
agentops doctor explain # paged manual: sources + flow agentops doctor explain --open # browser-friendly copy for printing agentops doctor explain -f markdown -o doctor.mdUse
agentops doctor --helpfor terse syntax and options. Useagentops doctor explainfor the Linux-style command manual, including sources, execution flow, examples, and the check catalog. Add--openwhen you want a printable browser view, or--format markdown --outwhen you want a shareable document. Use this page for the deeper "how each rule detects the signal" reference.
Two conventions across the table:
- Mechanism =
programmatic— the rule is deterministic: an SDK call, a KQL query, a regex, a timestamp comparison, or a statistic. Fast, reproducible, free to run, and always on. - Mechanism =
llm-judged— the rule asks a judge model to look at semantic signals a regex can't see (jailbreak surface, evaluator coverage, etc.). Gated bychecks.llm_assist.enabled, capped atwarningseverity, never affects CI exit codes.
Data sources¶
The Doctor reaches Azure through five sources, all configured in
.agentops/agent.yaml:
| Source | Reads |
|---|---|
workspace_files |
.agentops/, agentops.yaml, .github/workflows/, .azuredevops/pipelines/, AI Landing Zone signals, CHANGELOG.md, spec-kit / AGENTS.md |
results_history |
Local .agentops/results/*/results.json first; Foundry cloud evaluation runs as fallback when local history is missing or too short |
azure_monitor |
Application Insights / Log Analytics via REST (KQL) |
foundry_control |
Foundry project / agents / evaluation rules via azure-ai-projects |
azure_resources |
Cognitive Services account properties via azure-mgmt-cognitiveservices; inferred from explicit config, AZD .azure/<env>/.env when present, or Foundry endpoint/account matching |
The LLM-judged rules additionally use the Foundry project's OpenAI client (auto-discovered) as the judge model.
Azure sources fail open. When Doctor cannot authenticate, infer an AZD environment, match a single Azure AI account, or read a resource, it records a diagnostic with the reason and suggested setup step instead of stopping the whole run.
Catalogue¶
🛡️ Security¶
| Finding id | Severity | Source | Mechanism | Detection |
|---|---|---|---|---|
waf.security.local_auth_disabled |
warning | azure_resources |
programmatic | account.disable_local_auth == true |
waf.security.managed_identity |
warning | azure_resources |
programmatic | account.identity.type in {SystemAssigned, UserAssigned} |
waf.security.public_network_access |
warning | azure_resources |
programmatic | account.publicNetworkAccess == Disabled (or private endpoint present) |
waf.security.diagnostic_settings |
warning | azure_resources |
programmatic | account has ≥1 diagnostic setting with a workspace_id |
safety.runtime.content_filter |
warning | azure_monitor |
programmatic | KQL hits on gen_ai.response.finish_reasons contains content_filter |
responsible_ai.llm.prompt_jailbreak_surface |
info / warning | foundry_control |
llm-judged | judge model scans system prompt for override-phrasing, embedded secrets, unbounded role-play |
⚙️ Operational Excellence¶
| Finding id | Severity | Source | Mechanism | Detection |
|---|---|---|---|---|
opex.unpinned_agent |
warning | workspace_files |
programmatic | agentops.yaml has agent: without a :version suffix |
opex.no_thresholds |
warning | workspace_files |
programmatic | agentops.yaml has no thresholds: block |
opex.no_pr_gate |
warning | workspace_files |
programmatic | .github/workflows/agentops-pr.yml missing |
opex.no_deploy_workflow |
warning | workspace_files |
programmatic | no .github/workflows/agentops-deploy-*.yml |
opex.results_not_gitignored |
warning | workspace_files |
programmatic | .agentops/results/ not in any reachable .gitignore |
opex.unversioned_dataset |
warning | workspace_files |
programmatic | dataset YAMLs lack a top-level version: |
opex.unversioned_bundle |
warning | workspace_files |
programmatic | bundle YAMLs lack a top-level version: |
opex.results_dir_bloat |
warning | workspace_files |
programmatic | .agentops/results/ holds > 50 run folders |
opex.workflow_concurrency_lock |
warning | workspace_files |
programmatic | AgentOps workflows missing a top-level concurrency: block |
opex.workflow_action_sha_pinning |
warning | workspace_files |
programmatic | uses: pinned to tag instead of 40-char SHA |
opex.ailz_readiness |
info | workspace_files |
programmatic | canonical AI Landing Zone signals detected; reports deployment readiness dimensions |
opex.ailz_gaps |
warning | workspace_files |
programmatic | AI Landing Zone signals detected but preflight, eval config, azd workflow, or private-runner path needs attention |
opex.release.no_eval_evidence |
warning | workspace_files + results_history |
programmatic | AgentOps workspace detected but no completed eval run is available for release evidence |
opex.release.latest_eval_failed |
critical | results_history |
programmatic | latest eval run exists but did not pass |
opex.release.no_baseline |
warning | workspace_files + results_history |
programmatic | eval evidence exists but no baseline/comparison is available for regression decisions |
opex.release.no_trace_regression_dataset |
info | workspace_files + results_history |
programmatic | no .agentops/data/trace-regression-manifest.json exists yet |
opex.release.no_continuous_eval |
warning | foundry_control |
programmatic | Foundry control plane is reachable but no enabled continuous evaluation rule was detected |
opex.no_foundry_control_configured |
warning | foundry_control |
programmatic | Foundry control plane unreachable |
opex.stale_evaluation |
warning / critical | results_history |
programmatic | latest run older than stale_after_days (critical at 2×) |
opex.flaky_metric.<metric> |
warning | results_history |
programmatic | coefficient of variation across last N runs > flaky_cv_threshold |
opex.no_token_telemetry |
warning | azure_monitor |
programmatic | request_count > 0 but gen_ai.usage.input_tokens + output_tokens == 0 |
opex.max_tokens_undefined |
warning | workspace_files |
programmatic | no max_tokens: or max_completion_tokens: declared in any agentops.yaml / bundle YAML that configures a model |
opex.llm.bundle_coverage |
info / warning | workspace_files |
llm-judged | judge compares bundle YAML against agent description and flags missing built-ins |
opex.spec_conformance.spec_missing |
warning | workspace_files |
programmatic | spec-driven setup detected (.specify/, AGENTS.md, or Copilot instructions) but no readable spec body, so Doctor cannot verify bundles / datasets / tasks against intended agent behavior |
opex.spec_conformance.tasks_stale |
warning | workspace_files |
programmatic | unchecked task-list items in the spec have remained open past stale_after_days, which suggests the implementation plan may be stale or the task list was not maintained |
opex.spec_conformance.tasks_orphaned |
warning | workspace_files |
programmatic | checked task references a path missing from workspace |
opex.spec_conformance.evaluator_drift |
warning | workspace_files |
programmatic | evaluator mentioned in spec absent from agentops.yaml |
opex.spec_conformance.dataset_drift |
warning | workspace_files |
programmatic | dataset mentioned in spec absent from .agentops/data/ |
opex.spec_conformance.agent_drift |
warning | workspace_files |
programmatic | spec agent_id doesn't match agentops.yaml |
opex.spec_conformance.llm.implementation_gap |
info / warning | workspace_files |
llm-judged | judge compares spec capabilities to workspace fingerprint |
🛟 Reliability¶
| Finding id | Severity | Source | Mechanism | Detection |
|---|---|---|---|---|
errors.production_rate |
warning / critical | azure_monitor |
programmatic | KQL errors / requests > rate_threshold (critical at 2×) |
errors.foundry_runs |
warning | foundry_control |
programmatic | foundry.failure_rate > rate_threshold |
errors.no_runtime_telemetry |
warning | azure_monitor |
programmatic | monitor.status == ok AND request_count == 0 over lookback |
errors.rate_limit_pressure |
warning / critical | azure_monitor |
programmatic | KQL counts dependency calls with resultCode == 429; escalates at 2× the floor |
⚡ Performance¶
| Finding id | Severity | Source | Mechanism | Detection |
|---|---|---|---|---|
latency |
warning / critical | azure_monitor + results_history |
programmatic | p95_seconds > p95_threshold_seconds (critical at 2×) |
🎯 Quality¶
| Finding id | Severity | Source | Mechanism | Detection |
|---|---|---|---|---|
regression.<metric> |
warning / critical | results_history |
programmatic | latest_metric - baseline_metric > threshold_drop per metric |
🪞 Responsible AI¶
| Finding id | Severity | Source | Mechanism | Detection |
|---|---|---|---|---|
safety |
warning | results_history |
programmatic | row-level content-safety metric at or above severity_floor in the latest eval |
safety.config.continuous_eval_missing |
warning | foundry_control |
programmatic | foundry.evaluation_rules empty while agents exist |
safety.config.continuous_eval_disabled |
warning | foundry_control |
programmatic | any evaluation_rule.enabled == false |
responsible_ai.llm.prompt_transparency |
info / warning | foundry_control |
llm-judged | judge scans agent instructions for AI-disclosure / source-citation / role-scope |
responsible_ai.llm.prompt_safety_guardrails |
info / warning | foundry_control |
llm-judged | judge looks for refusal patterns across the four harm categories |
responsible_ai.llm.dataset_pii_risk |
info / warning | workspace_files |
llm-judged | judge scans .agentops/data/*.jsonl sample for PII |
responsible_ai.llm.dataset_bias_signals |
info / warning | workspace_files |
llm-judged | judge scores dataset sample for demographic / role / domain / tone skew |
Configuring the checks¶
Every threshold lives in .agentops/agent.yaml under checks.<rule>.
For example, to make the regression check more sensitive:
checks:
regression:
threshold_drop: 0.05 # 5 percentage points instead of the default 10
min_runs: 5 # need at least 5 runs for a baseline
To suppress a specific rule entirely, add it to checks.<rule>.skip
(where supported) or use the CLI flag:
agentops doctor --categories quality,operational_excellence
agentops doctor --exclude-rules waf.security.diagnostic_settings
Roadmap¶
The following items are on the AI Landing Zones Checklist but do not ship a rule today. They appear here so users know what's coming and can prioritize:
| AI.X | Pillar | Plan |
|---|---|---|
| AI.10 | Reliability | TPM/RPM quota adequacy via azure-mgmt-monitor metrics (OpenAI/Quota/UtilizationPercentage) |
| AI.155 | Reliability | Provisioned-throughput utilization via azureOpenAIProvisionedManagedUtilization metric |
| AI.158 | Reliability | AI Search replica count via azure-mgmt-search (searchService.replicaCount >= 2) |
| AI.140 | Performance | Token-consumption benchmark vs configured target |
| AI.25 | Cost | Cost-per-model tracking via Azure Cost Management API |
Each requires a new Azure SDK dependency; we will land them incrementally as workloads adopt them.