Kusto-to-Metrics Integration
Problem Statement
ADX-Mon currently provides two distinct capabilities:
1. SummaryRules - Execute KQL queries on a schedule and ingest results into ADX tables (api/v1/summaryrule_types.go)
2. OTLP Metrics Exporters - Export metrics to external observability systems (collector/export/metric_otlp.go)
However, there is no direct mechanism to execute KQL queries and export the results as metrics to observability platforms. Organizations often need to:
- Execute KQL queries on ADX data and export the results as standardized metrics
- Export these metrics to external systems (Prometheus, DataDog, etc.) at regular intervals
- Create derived metrics from complex KQL aggregations for dashboards and alerting
Currently, users must either:
- Use SummaryRules to materialize data in ADX tables, then build custom exporters to transform and export it
- Implement entirely separate infrastructure outside of ADX-Mon
This leads to:
- Duplicated infrastructure for metric transformation and export
- Inconsistent metric schemas across different teams
- Additional storage costs for intermediate table materialization
- Complex multi-step pipelines that are difficult to maintain
Solution Overview
We propose implementing a new adxexporter component that processes MetricsExporter CRDs to execute KQL queries and export results as metrics. This provides a streamlined, declarative way to create KQL-to-metrics pipelines with two complementary output modes:
- Prometheus Scraping Mode (Phase 1): Execute queries and expose results as Prometheus metrics on a /metrics endpoint for the Collector to scrape
- Direct OTLP Push Mode (Phase 2): Execute queries and push results directly to OTLP endpoints, with backlog and retry capabilities
Key Requirements
- Direct KQL Execution: Execute KQL queries directly without requiring intermediate table storage
- SummaryRule-like Behavior: Share time management, scheduling, and criteria-based execution patterns
- Criteria-Based Deployment: Support cluster-label based filtering for secure, distributed execution
- Resilient Operation: Provide backlog and retry capabilities for reliable metric delivery
Architecture Overview
The solution introduces a new adxexporter component that operates as a Kubernetes controller, watching MetricsExporter CRDs and executing their configured KQL queries on schedule.
Core Components
- adxexporter - New standalone component (cmd/adxexporter) that:
  - Watches MetricsExporter CRDs via the Kubernetes API
  - Executes KQL queries against ADX on specified intervals
  - Transforms results to metrics format
  - Outputs metrics via Prometheus scraping or direct OTLP push
- MetricsExporter CRD - Kubernetes custom resource defining:
  - The KQL query to execute
  - Transform configuration for metric conversion
  - Execution criteria and scheduling
  - Output mode configuration
- Integration with Existing Components:
  - The Collector discovers and scrapes adxexporter instances via pod annotations
  - The Operator manages MetricsExporter CRD lifecycle (optional integration)
Deployment Architecture
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Collector │ │ adxexporter │ │ ADX │
│ │ │ │ │ Clusters │
│ ┌─────────────┐ │ │ ┌─────────────┐ │ │ │
│ │Pod Discovery│◄├────┤ │/metrics │ │ │ │
│ │& Scraping │ │ │ │endpoint │ │ │ │
│ └─────────────┘ │ │ └─────────────┘ │ │ │
│ ┌─────────────┐ │ │ ┌─────────────┐ │ │ │
│ │OTLP Export │ │ │ │KQL Query │◄├────┤ │
│ │Targets │ │ │ │Execution │ │ │ │
│ └─────────────┘ │ │ └─────────────┘ │ │ │
└─────────────────┘ │ ┌─────────────┐ │ └─────────────────┘
│ │CRD Watch & │ │
┌───────────────┤ │Reconciliation│ │
│ │ └─────────────┘ │
│ └─────────────────┘
│
▼
┌─────────────────┐
│ Kubernetes │
│ API Server │
│ │
│ MetricsExporter │
│ CRDs │
└─────────────────┘
Output Modes
Primary: Direct OTLP Push Mode (Implemented)
- adxexporter pushes metrics directly to OTLP endpoints using PromToOtlpExporter
- Leverages pkg/prompb for memory-efficient metric serialization with object pooling
- Preserves actual timestamps from KQL query results
- Requires the --otlp-endpoint CLI flag
- Reuses battle-tested OTLP export infrastructure from the Collector
Alternative: Prometheus Scraping Mode (Future Enhancement)
- adxexporter could expose a /metrics endpoint using the OpenTelemetry metrics library
- The Collector would discover it via pod annotations: adx-mon/scrape: "true"
- Deprioritized in favor of direct OTLP push for better timestamp fidelity
Design Approach
adxexporter Component
The adxexporter is a new standalone Kubernetes component with its own binary at cmd/adxexporter. It functions as a Kubernetes controller that watches MetricsExporter CRDs and executes their KQL queries on schedule.
Command Line Interface
adxexporter \
--cluster-labels="region=eastus,environment=production,team=platform" \
--kusto-endpoint="MetricsDB=https://cluster.kusto.windows.net" \
--otlp-endpoint="http://otel-collector:4318/v1/metrics"
Parameters:
- --cluster-labels: Comma-separated key=value pairs defining this instance's cluster identity. Used for criteria-based filtering of MetricsExporter CRDs. This follows the same pattern as the Ingestor component documented in ADX-Mon Configuration.
- --kusto-endpoint: ADX endpoint in the format <database>=<endpoint>. Multiple endpoints can be specified for multi-database support.
- --otlp-endpoint: (Required) OTLP HTTP endpoint URL for pushing metrics. The adxexporter converts KQL results to prompb.WriteRequest format and pushes them to this endpoint.
- --health-probe-port: Port for health probe endpoints (default: 8081). Exposes /healthz and /readyz endpoints.
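To make the flag formats concrete, here is a minimal Go sketch of how --cluster-labels and --kusto-endpoint values could be parsed. The helper names are hypothetical and not taken from cmd/adxexporter.
package main

import (
    "fmt"
    "strings"
)

// parseClusterLabels turns "region=eastus,environment=production" into a map.
// Hypothetical helper for illustration; the real cmd/adxexporter may differ.
func parseClusterLabels(s string) (map[string]string, error) {
    labels := map[string]string{}
    for _, pair := range strings.Split(s, ",") {
        if pair == "" {
            continue
        }
        k, v, ok := strings.Cut(pair, "=")
        if !ok {
            return nil, fmt.Errorf("invalid cluster label %q, expected key=value", pair)
        }
        labels[strings.TrimSpace(k)] = strings.TrimSpace(v)
    }
    return labels, nil
}

// parseKustoEndpoint splits "MetricsDB=https://cluster.kusto.windows.net" into
// its database name and endpoint URL.
func parseKustoEndpoint(s string) (database, endpoint string, err error) {
    database, endpoint, ok := strings.Cut(s, "=")
    if !ok {
        return "", "", fmt.Errorf("invalid kusto endpoint %q, expected <database>=<endpoint>", s)
    }
    return database, endpoint, nil
}

func main() {
    labels, _ := parseClusterLabels("region=eastus,environment=production,team=platform")
    db, ep, _ := parseKustoEndpoint("MetricsDB=https://cluster.kusto.windows.net")
    fmt.Println(labels, db, ep)
}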
Criteria-Based Execution
Similar to the Ingestor component, adxexporter uses cluster labels to determine which MetricsExporter CRDs it should process. This enables:
- Security Boundaries: Only process MetricsExporters appropriate for this cluster's data classification
- Geographic Distribution: Deploy region-specific adxexporter instances
- Team Isolation: Separate processing by team ownership
- Resource Optimization: Distribute load across appropriate instances
Example Criteria Matching:
# MetricsExporter CRD
spec:
criteria:
region: ["eastus", "westus"]
environment: ["production"]
# adxexporter instance
--cluster-labels="region=eastus,environment=production,team=sre"
# ✅ Matches: region=eastus AND environment=production
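A minimal Go sketch of this matching logic, assuming every criteria key must be satisfied by one of its allowed values and that comparisons are case-insensitive; the helpers below are illustrative, not the actual adxexporter code.
package adxexporter

import "strings"

// matchesCriteria reports whether an adxexporter instance with the given
// cluster labels should process a MetricsExporter with the given criteria.
// Empty criteria match every instance. Illustrative helper only.
func matchesCriteria(criteria map[string][]string, clusterLabels map[string]string) bool {
    for key, allowedValues := range criteria {
        value, ok := lookupLabel(clusterLabels, key)
        if !ok {
            return false
        }
        matched := false
        for _, allowed := range allowedValues {
            if strings.EqualFold(allowed, value) {
                matched = true
                break
            }
        }
        if !matched {
            return false
        }
    }
    return true
}

// lookupLabel finds a cluster label by key, ignoring case.
func lookupLabel(labels map[string]string, key string) (string, bool) {
    for k, v := range labels {
        if strings.EqualFold(k, key) {
            return v, true
        }
    }
    return "", false
}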
MetricsExporter CRD
The MetricsExporter CRD defines KQL queries and their transformation to metrics format. It shares core patterns with SummaryRule but targets metrics output instead of ADX table ingestion.
// MetricsExporterSpec defines the desired state of MetricsExporter
type MetricsExporterSpec struct {
// Database is the name of the database to query
Database string `json:"database"`
// Body is the KQL query to execute
Body string `json:"body"`
// Interval defines how often to execute the query and refresh metrics
Interval metav1.Duration `json:"interval"`
// Transform defines how to convert query results to metrics
Transform TransformConfig `json:"transform"`
// Criteria for cluster-based execution selection (same pattern as SummaryRule)
Criteria map[string][]string `json:"criteria,omitempty"`
}
type TransformConfig struct {
// MetricNameColumn specifies which column contains the metric name
MetricNameColumn string `json:"metricNameColumn,omitempty"`
// ValueColumn specifies which column contains the metric value
ValueColumn string `json:"valueColumn"`
// TimestampColumn specifies which column contains the timestamp
TimestampColumn string `json:"timestampColumn"`
// LabelColumns specifies columns to use as metric labels
LabelColumns []string `json:"labelColumns,omitempty"`
// DefaultMetricName provides a fallback if MetricNameColumn is not specified
DefaultMetricName string `json:"defaultMetricName,omitempty"`
}
Key Design Principles
- Standalone Operation: adxexporter operates independently from existing ingestor/operator infrastructure, providing clear separation of concerns and deployment flexibility.
- Dual Output Strategy:
  - Phase 1 (Prometheus): Fast implementation using existing Collector scraping capabilities
  - Phase 2 (Direct OTLP): Enhanced resilience with backlog and retry mechanisms
- Criteria-Based Filtering: Leverages the same cluster-label approach as the Ingestor for secure, distributed execution across different environments and teams.
- SummaryRule Consistency: Shares core behavioral patterns, including time management, scheduling logic, and KQL query execution.
- Cloud-Native Integration: Seamless discovery via Kubernetes pod annotations and integration with existing Collector infrastructure.
Detailed Implementation
Phase 1: Prometheus Scraping Mode
In the initial implementation, adxexporter exposes transformed KQL query results as Prometheus metrics on a /metrics endpoint.
Metrics Exposure Workflow
- CRD Discovery: adxexporter watches the Kubernetes API for MetricsExporter CRDs matching its cluster labels
- Query Execution: Execute KQL queries on specified intervals using the _startTime and _endTime parameters
- Metrics Transformation: Convert query results to Prometheus metrics using the OpenTelemetry metrics library
- Metrics Registration: Register/update metrics in the OpenTelemetry metrics registry
- HTTP Exposure: Serve metrics via an HTTP endpoint for Collector scraping
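The scheduling and query-execution steps can be sketched in Go as follows. This is illustrative only: it assumes the time window is derived from Interval and the last execution time, and that _startTime/_endTime are injected by textual substitution, which may differ from the real implementation.
package adxexporter

import (
    "context"
    "strings"
    "time"
)

// executionWindow computes the next [start, end) window from the CRD's
// Interval and the last execution time. Illustrative only.
func executionWindow(interval time.Duration, lastExecution *time.Time, now time.Time) (start, end time.Time, due bool) {
    if lastExecution == nil {
        // First execution: look back one interval from now.
        return now.Add(-interval), now, true
    }
    start = *lastExecution
    end = start.Add(interval)
    return start, end, !end.After(now)
}

// renderQuery substitutes the window into the KQL body. The real component
// may pass the window as query parameters instead of textual substitution.
func renderQuery(body string, start, end time.Time) string {
    body = strings.ReplaceAll(body, "_startTime", "datetime("+start.UTC().Format(time.RFC3339)+")")
    return strings.ReplaceAll(body, "_endTime", "datetime("+end.UTC().Format(time.RFC3339)+")")
}

// queryExecutor abstracts the ADX client that runs the rendered query and
// returns rows as column-name-to-value maps.
type queryExecutor interface {
    Query(ctx context.Context, database, query string) ([]map[string]any, error)
}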
Collector Integration
The existing Collector component discovers adxexporter instances via Kubernetes pod annotations:
# adxexporter Deployment/Pod
metadata:
annotations:
adx-mon/scrape: "true"
adx-mon/port: "8080"
adx-mon/path: "/metrics"
The Collector's pod discovery mechanism automatically detects these annotations and adds the adxexporter instances to its scraping targets.
Limitations of Phase 1
- Point-in-Time Metrics: Prometheus metrics represent current state; no historical backfill capability
- Scraping Dependency: Relies on Collector's scraping schedule, not direct control over export timing
- No Retry Logic: Failed queries result in stale metrics until next successful execution
Phase 2: Direct OTLP Push Mode
In the enhanced implementation, adxexporter can push metrics directly to OTLP endpoints with full backlog and retry capabilities.
Enhanced Workflow
- CRD Discovery: Same as Phase 1
- Query Execution: Same as Phase 1
- OTLP Transformation: Convert query results directly to OTLP metrics format
- Direct Push: Send metrics to configured OTLP endpoint
- Backlog Management: Queue failed exports in CRD status for retry
- Historical Backfill: Process backlogged time windows on successful reconnection
Backlog Strategy
Unlike Prometheus scraping, direct OTLP push enables sophisticated backlog management:
type MetricsExporterStatus struct {
Conditions []metav1.Condition `json:"conditions,omitempty"`
// LastSuccessfulExecution tracks the last successfully exported time window
LastSuccessfulExecution *metav1.Time `json:"lastSuccessfulExecution,omitempty"`
// Backlog contains failed export attempts pending retry
Backlog []BacklogEntry `json:"backlog,omitempty"`
}
type BacklogEntry struct {
StartTime metav1.Time `json:"startTime"`
EndTime metav1.Time `json:"endTime"`
Attempts int `json:"attempts"`
LastAttempt metav1.Time `json:"lastAttempt"`
Error string `json:"error,omitempty"`
}
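A possible shape for the retry logic over these backlog entries is sketched below. The backoff policy and the local backlogEntry type are assumptions for illustration; the CRD only defines the status fields shown above.
package adxexporter

import "time"

// backlogEntry mirrors the BacklogEntry status type above, using plain
// time.Time for illustration.
type backlogEntry struct {
    StartTime   time.Time
    EndTime     time.Time
    Attempts    int
    LastAttempt time.Time
    Error       string
}

// nextRetryAt applies exponential backoff (1m, 2m, 4m, ... capped at 1h).
// The concrete policy is an assumption, not something the CRD defines.
func nextRetryAt(e backlogEntry) time.Time {
    backoff := time.Minute
    for i := 0; i < e.Attempts && backoff < time.Hour; i++ {
        backoff *= 2
    }
    if backoff > time.Hour {
        backoff = time.Hour
    }
    return e.LastAttempt.Add(backoff)
}

// dueEntries returns the backlogged windows ready for another export attempt.
func dueEntries(backlog []backlogEntry, now time.Time) []backlogEntry {
    var due []backlogEntry
    for _, e := range backlog {
        if !now.Before(nextRetryAt(e)) {
            due = append(due, e)
        }
    }
    return due
}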
Shared Infrastructure Patterns
adxexporter leverages proven patterns from SummaryRule implementation while operating as an independent component:
| Component | SummaryRule | adxexporter |
|---|---|---|
| Time Management | NextExecutionWindow(), interval-based scheduling | Same patterns, independent implementation |
| Criteria Matching | Cluster label filtering | Same logic, different component |
| KQL Execution | ADX query with time parameters | Same patterns for query execution |
| Status Tracking | CRD conditions and backlog | Similar condition management |
| Backlog Handling | Async operation queues | Export retry queues (Phase 2) |
Component Independence
Unlike the original design that integrated with ingestor infrastructure, adxexporter operates independently:
- Separate Binary: Own entrypoint at cmd/adxexporter
- Independent Deployment: Deployed as a separate Kubernetes workload
- Dedicated Configuration: Own command-line parameters and configuration
- Isolated Dependencies: Direct ADX connectivity without shared connection pools
Standard Schema Requirements
For MetricsExporter KQL queries to produce valid metrics, the result table must contain columns that can be mapped to the metrics format. The transformation is highly flexible and supports both simple and complex schemas.
Core Metrics Mapping
The adxexporter transforms KQL query results to metrics using this mapping:
| KQL Column | Prometheus/OTLP Field | Purpose |
|---|---|---|
| Configured via valueColumn | Metric value | The numeric metric value |
| Configured via timestampColumn | Metric timestamp | Temporal alignment (OTLP mode) |
| Configured via metricNameColumn | Metric name | Metric name identifier |
| Any columns in labelColumns | Metric labels/attributes | Dimensional metadata |
Required Columns
- Value Column: Must contain numeric data (real/double/int)
- Timestamp Column: Must contain datetime data (used in OTLP mode for temporal accuracy)
Optional Columns
- Metric Name Column: If not specified, Transform.DefaultMetricName is used
- Label Columns: Any additional columns become metric labels/attributes
Simple Example - Generic Use Case
KQL Query:
MyTelemetryTable
| where Timestamp between (_startTime .. _endTime)
| summarize
    metric_value = avg(ResponseTime)
    by timestamp = bin(Timestamp, 5m), ServiceName, Region
| extend metric_name = "service_response_time_avg"
Transform Configuration:
transform:
metricNameColumn: "metric_name"
valueColumn: "metric_value"
timestampColumn: "timestamp"
labelColumns: ["ServiceName", "Region"]
Resulting Prometheus Metric (Phase 1):
# HELP service_response_time_avg Average response time by service and region
# TYPE service_response_time_avg gauge
service_response_time_avg{ServiceName="api-gateway",Region="us-east-1"} 245.7
service_response_time_avg{ServiceName="user-service",Region="us-west-2"} 189.3
Resulting OTLP Metric (Phase 2):
{
"name": "service_response_time_avg",
"gauge": {
"dataPoints": [{
"value": 245.7,
"timeUnixNano": "1640995200000000000",
"attributes": [
{"key": "ServiceName", "value": "api-gateway"},
{"key": "Region", "value": "us-east-1"}
]
}]
}
}
Complex Example - Advanced Analytics Use Case
For more complex schemas with additional metadata and calculated metrics:
KQL Query:
AnalyticsData
| where EventTime between (_startTime .. _endTime)
| summarize
Value = avg(SuccessRate),
Numerator = sum(SuccessCount),
Denominator = sum(TotalCount),
StartTimeUTC = min(EventTime),
EndTimeUTC = max(EventTime)
by LocationId, CustomerResourceId
| extend metric_name = "success_rate_analytics"
Transform Configuration:
transform:
metricNameColumn: "metric_name"
valueColumn: "Value"
timestampColumn: "StartTimeUTC"
labelColumns: ["LocationId", "CustomerResourceId", "Numerator", "Denominator", "EndTimeUTC"]
Resulting Prometheus Metrics (Phase 1):
# HELP success_rate_analytics Success rate analytics by location and customer
# TYPE success_rate_analytics gauge
success_rate_analytics{LocationId="datacenter-01",CustomerResourceId="customer-12345",Numerator="1974",Denominator="2000",EndTimeUTC="2022-01-01T10:05:00Z"} 0.987
Resulting OTLP Metric (Phase 2):
{
"name": "success_rate_analytics",
"gauge": {
"dataPoints": [{
"value": 0.987,
"timeUnixNano": "1640995200000000000",
"attributes": [
{"key": "LocationId", "value": "datacenter-01"},
{"key": "CustomerResourceId", "value": "customer-12345"},
{"key": "Numerator", "value": "1974"},
{"key": "Denominator", "value": "2000"},
{"key": "EndTimeUTC", "value": "2022-01-01T10:05:00Z"}
]
}]
}
}
This approach allows any KQL query result to be transformed into metrics by:
1. Selecting which column contains the primary metric value
2. Choosing the timestamp column for temporal alignment (OTLP mode)
3. Mapping all other relevant columns as dimensional labels
4. Optionally specifying a dynamic or static metric name
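Assuming each query row arrives as a map from column name to value, this mapping could be sketched in Go as follows; metricPoint and mapRow are illustrative stand-ins rather than the actual transform package API.
package transform

import (
    "fmt"
    "time"
)

// metricPoint is an illustrative output type, not the actual MetricData
// representation used by the transform package.
type metricPoint struct {
    Name      string
    Value     float64
    Timestamp time.Time
    Labels    map[string]string
}

// mapRow applies a TransformConfig-style mapping to a single query row.
func mapRow(row map[string]any, valueCol, timestampCol, nameCol, defaultName string, labelCols []string) (metricPoint, error) {
    var p metricPoint

    value, ok := row[valueCol].(float64)
    if !ok {
        return p, fmt.Errorf("column %q is missing or not numeric", valueCol)
    }
    ts, ok := row[timestampCol].(time.Time)
    if !ok {
        return p, fmt.Errorf("column %q is missing or not a datetime", timestampCol)
    }

    p.Value, p.Timestamp = value, ts
    p.Name = defaultName
    if nameCol != "" {
        if name, ok := row[nameCol].(string); ok {
            p.Name = name
        }
    }
    p.Labels = make(map[string]string, len(labelCols))
    for _, col := range labelCols {
        if v, ok := row[col]; ok {
            p.Labels[col] = fmt.Sprint(v)
        }
    }
    return p, nil
}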
MetricsExporter CRD Example
apiVersion: adx-mon.azure.com/v1
kind: MetricsExporter
metadata:
name: service-response-times
namespace: monitoring
spec:
database: TelemetryDB
interval: 5m
criteria:
region: ["eastus", "westus"]
environment: ["production"]
body: |
ServiceTelemetry
| where Timestamp between (_startTime .. _endTime)
| summarize
        metric_value = avg(ResponseTimeMs)
        by timestamp = bin(Timestamp, 1m), ServiceName, Environment
| extend metric_name = "service_response_time_avg"
transform:
metricNameColumn: "metric_name"
valueColumn: "metric_value"
timestampColumn: "timestamp"
labelColumns: ["ServiceName", "Environment"]
This example demonstrates:
1. Direct KQL Execution: Query executes directly without intermediate table storage
2. Criteria-Based Selection: Only processed by adxexporter instances with matching cluster labels
3. Flexible Output: Works with both Prometheus scraping (Phase 1) and OTLP push (Phase 2) modes
4. Time Window Parameters: KQL query uses _startTime and _endTime parameters (same as SummaryRule)
adxexporter Deployment Example
apiVersion: apps/v1
kind: Deployment
metadata:
name: adxexporter
namespace: adx-mon-system
spec:
replicas: 2
selector:
matchLabels:
app: adxexporter
template:
metadata:
labels:
app: adxexporter
spec:
containers:
- name: adxexporter
image: adx-mon/adxexporter:latest
args:
- --cluster-labels=region=eastus,environment=production,team=platform
- --kusto-endpoint=MetricsDB=https://cluster.kusto.windows.net
- --otlp-endpoint=http://otel-collector:4318/v1/metrics
ports:
- containerPort: 8081
name: health
livenessProbe:
httpGet:
path: /healthz
port: 8081
readinessProbe:
httpGet:
path: /readyz
port: 8081
resources:
requests:
memory: "256Mi"
cpu: "100m"
limits:
memory: "512Mi"
cpu: "500m"
Integration with Existing Components
The adxexporter component integrates with existing ADX-Mon infrastructure while maintaining independence:
- OTLP Push Integration:
  - adxexporter pushes metrics directly to any OTLP-compatible endpoint
  - Uses the same PromToOtlpExporter infrastructure as the Collector
  - No scraping configuration required
- Kubernetes API Integration:
  - adxexporter watches MetricsExporter CRDs via standard Kubernetes client-go
  - Leverages existing RBAC and authentication mechanisms
  - Operates within standard Kubernetes security boundaries
- ADX Connectivity:
  - Direct ADX connection using the same authentication patterns as other components
  - Independent connection management and pooling
  - Reuses existing ADX client libraries and connection patterns
Execution Flow
Phase 1: Prometheus Scraping Mode
- CRD Discovery: adxexporter discovers MetricsExporter CRDs matching its cluster labels
- Scheduling: Determines execution windows based on Interval and the last execution time
- Query Execution: Executes the KQL query with _startTime and _endTime parameters
- Metrics Transformation: Converts query results to Prometheus metrics format
- Registry Update: Updates the OpenTelemetry metrics registry with new values
- HTTP Exposure: Serves updated metrics on the /metrics endpoint
- Collector Scraping: The Collector discovers and scrapes the metrics endpoint
- Status Update: Updates CRD status with execution results
Phase 2: Direct OTLP Push Mode
- CRD Discovery: Same as Phase 1
- Scheduling: Same as Phase 1, plus backlog processing
- Query Execution: Same as Phase 1
- OTLP Transformation: Convert query results directly to OTLP format
- Direct Push: Send metrics to configured OTLP endpoint
- Backlog Management: Queue failed exports for retry
- Status Update: Update CRD status with execution and backlog state
Validation and Error Handling
The adxexporter controller validates:
- KQL query syntax and database accessibility
- Transform configuration matches query result schema
- Cluster label criteria for CRD processing eligibility
- Required columns (value, timestamp) are present in query results
- OTLP endpoint connectivity (Phase 2)
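For example, the required-column check could look roughly like the sketch below (a hypothetical helper; the real validation lives in the transform package and also checks data types):
package transform

import "fmt"

// validateColumns checks that the columns referenced by the transform
// configuration exist in the query result schema.
func validateColumns(resultColumns map[string]bool, valueCol, timestampCol, nameCol string) error {
    if !resultColumns[valueCol] {
        return fmt.Errorf("valueColumn %q not found in query results", valueCol)
    }
    if !resultColumns[timestampCol] {
        return fmt.Errorf("timestampColumn %q not found in query results", timestampCol)
    }
    if nameCol != "" && !resultColumns[nameCol] {
        return fmt.Errorf("metricNameColumn %q not found in query results", nameCol)
    }
    return nil
}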
Use Cases
Use Case 1: Service Performance Metrics
# MetricsExporter for service response time monitoring
apiVersion: adx-mon.azure.com/v1
kind: MetricsExporter
metadata:
name: service-response-times
namespace: monitoring
spec:
database: TelemetryDB
interval: 1m
criteria:
environment: ["production"]
team: ["platform"]
body: |
ServiceTelemetry
| where Timestamp between (_startTime .. _endTime)
| summarize
        metric_value = avg(ResponseTimeMs)
        by timestamp = bin(Timestamp, 1m), ServiceName, Environment
| extend metric_name = "service_response_time_avg"
transform:
metricNameColumn: "metric_name"
valueColumn: "metric_value"
timestampColumn: "timestamp"
labelColumns: ["ServiceName", "Environment"]
Deployment Configuration:
# adxexporter instance matching criteria
adxexporter \
--cluster-labels="environment=production,team=platform,region=eastus" \
--kusto-endpoint="TelemetryDB=https://cluster.kusto.windows.net" \
--otlp-endpoint="http://otel-collector:4318/v1/metrics"
Key Benefits:
- Direct Execution: No intermediate table storage required
- Real-time Metrics: Fresh data pushed to OTLP endpoint on each interval
- Timestamp Fidelity: Preserves actual KQL query timestamps
- Environment Isolation: Only processed by adxexporter instances with matching criteria
- Memory Efficient: Uses pkg/prompb object pooling for high cardinality metrics
Use Case 2: Advanced Analytics with Rich Metadata
# MetricsExporter for complex customer analytics
apiVersion: adx-mon.azure.com/v1
kind: MetricsExporter
metadata:
name: customer-analytics
namespace: analytics
spec:
database: AnalyticsDB
interval: 15m
criteria:
team: ["analytics"]
data-classification: ["customer-approved"]
body: |
CustomerEvents
| where EventTime between (_startTime .. _endTime)
| summarize
Value = avg(SuccessRate),
Numerator = sum(SuccessfulRequests),
Denominator = sum(TotalRequests),
StartTimeUTC = min(EventTime),
EndTimeUTC = max(EventTime),
AvgLatency = avg(LatencyMs)
by LocationId, CustomerResourceId, ServiceTier
| extend metric_name = strcat("customer_success_rate_", tolower(ServiceTier))
transform:
metricNameColumn: "metric_name"
valueColumn: "Value"
timestampColumn: "StartTimeUTC"
labelColumns: ["LocationId", "CustomerResourceId", "ServiceTier", "Numerator", "Denominator", "EndTimeUTC", "AvgLatency"]
Deployment Configuration:
# adxexporter instance for analytics team
adxexporter \
--cluster-labels="team=analytics,data-classification=customer-approved,region=westus" \
--kusto-endpoint="AnalyticsDB=https://analytics.kusto.windows.net" \
--otlp-endpoint="http://analytics-otel-collector:4318/v1/metrics"
Resulting Metrics:
- Primary value: Success rate percentage
- Rich labels: Location, customer, service tier, raw counts, time ranges, and auxiliary metrics
- Flexible naming: Dynamic metric names based on service tier
- Data Governance: Only processed by appropriately classified adxexporter instances
Use Case 3: Multi-Region Infrastructure Monitoring
# MetricsExporter for infrastructure metrics across regions
apiVersion: adx-mon.azure.com/v1
kind: MetricsExporter
metadata:
name: infrastructure-monitoring
namespace: sre
spec:
database: InfrastructureDB
interval: 30s
criteria:
role: ["infrastructure"]
region: ["eastus", "westus", "europe"]
body: |
SystemMetrics
| where Timestamp between (_startTime .. _endTime)
| summarize
        metric_value = avg(CpuUtilization)
        by timestamp = bin(Timestamp, 30s), NodeName, ClusterName, Region
| extend metric_name = "node_cpu_utilization"
transform:
metricNameColumn: "metric_name"
valueColumn: "metric_value"
timestampColumn: "timestamp"
labelColumns: ["NodeName", "ClusterName", "Region"]
Multi-Region Deployment:
# East US adxexporter
adxexporter \
--cluster-labels="role=infrastructure,region=eastus" \
--kusto-endpoint="InfrastructureDB=https://eastus.kusto.windows.net" \
--otlp-endpoint="http://eastus-collector:4318/v1/metrics"
# West US adxexporter
adxexporter \
--cluster-labels="role=infrastructure,region=westus" \
--kusto-endpoint="InfrastructureDB=https://westus.kusto.windows.net" \
--otlp-endpoint="http://westus-collector:4318/v1/metrics"
# Europe adxexporter
adxexporter \
--cluster-labels="role=infrastructure,region=europe" \
--kusto-endpoint="InfrastructureDB=https://europe.kusto.windows.net" \
--otlp-endpoint="http://europe-collector:4318/v1/metrics"
Key Benefits:
- High-Frequency Monitoring: 30-second metric refresh intervals
- Geographic Distribution: Each region processes the same MetricsExporter with regional data
- Centralized Collection: All regional adxexporter instances scraped by their respective Collectors
- SRE Team Focus: Clear ownership through criteria-based filtering
Use Case 4: Cross-Cluster Error Rate Monitoring with Direct Push
# MetricsExporter for global error rate aggregation
apiVersion: adx-mon.azure.com/v1
kind: MetricsExporter
metadata:
name: global-error-rates
namespace: sre
spec:
database: GlobalMetrics
interval: 2m
criteria:
scope: ["global"]
priority: ["high", "critical"]
body: |
union
    cluster('eastus-cluster').database('TelemetryDB').ErrorEvents,
    cluster('westus-cluster').database('TelemetryDB').ErrorEvents,
    cluster('europe-cluster').database('TelemetryDB').ErrorEvents
| where Timestamp between (_startTime .. _endTime)
| summarize
    metric_value = count() * 1.0,
    error_rate = count() * 100.0 / countif(isnotempty(SuccessEvent))
    by timestamp = bin(Timestamp, 1m), Region = tostring(split(ClusterName, '-')[0]), ServiceName
| extend metric_name = "global_error_count"
transform:
metricNameColumn: "metric_name"
valueColumn: "metric_value"
timestampColumn: "timestamp"
labelColumns: ["Region", "ServiceName", "error_rate"]
Deployment with Direct OTLP Push:
# Global monitoring adxexporter with OTLP push
adxexporter \
--cluster-labels="scope=global,priority=high" \
--kusto-endpoint="GlobalMetrics=https://global.kusto.windows.net" \
--otlp-endpoint="http://central-prometheus-gateway:4318/v1/metrics"
Global Monitoring Benefits:
- Cross-Cluster Aggregation: Single query across multiple ADX clusters
- Priority-Based Processing: Only runs on high/critical priority adxexporter instances
- Direct OTLP Push: Metrics pushed directly to central endpoint with pooled serialization
- Rich Context: Includes both raw counts and calculated error rates in labels
- Memory Efficient: Uses pkg/prompb pooling for high cardinality cross-cluster metrics
Configuration Strategy and Best Practices
adxexporter Configuration
The adxexporter component uses direct OTLP push for exporting metrics:
Standard OTLP Push Configuration
adxexporter \
--cluster-labels="team=platform,environment=production,region=eastus" \
--kusto-endpoint="MetricsDB=https://cluster.kusto.windows.net" \
--otlp-endpoint="http://otel-collector:4318/v1/metrics"
Multi-Database Configuration
adxexporter \
--cluster-labels="team=analytics,data-classification=approved" \
--kusto-endpoint="MetricsDB=https://metrics.kusto.windows.net" \
--kusto-endpoint="LogsDB=https://logs.kusto.windows.net" \
--otlp-endpoint="http://otel-collector:4318/v1/metrics"
Criteria-Based Deployment Strategy
Use the criteria field in MetricsExporter CRDs to control which adxexporter instances process them. This follows the same pattern as documented in ADX-Mon Ingestor Configuration.
Example: Environment-Based Processing
Result: Only adxexporter instances with environment=production cluster labels process this MetricsExporter.
Example: Team and Region-Based Processing
Result: The MetricsExporter is processed because team=analytics AND region=eastus both match.
Example: Data Classification Controls
Result: The MetricsExporter is processed because data-classification=internal AND priority=high both match.
Benefits of Criteria-Based Architecture
- Security Boundaries: Control data access based on adxexporter deployment classification
- Performance Isolation: Deploy separate adxexporter instances for high-frequency vs. low-frequency metrics
- Geographic Distribution: Regional adxexporter instances process region-appropriate MetricsExporters
- Team Autonomy: Teams deploy their own adxexporter instances with appropriate cluster labels
- Resource Optimization: Distribute MetricsExporter processing load across appropriate instances
Collector Integration (Optional)
With the direct OTLP push implementation, Collector scraping is no longer required. The adxexporter pushes metrics directly to any OTLP-compatible endpoint (OpenTelemetry Collector, Prometheus with remote-write receiver, etc.).
If you want to use Collector scraping as an alternative to OTLP push, you can configure pod annotations:
# Pod annotations for Collector discovery (optional)
metadata:
annotations:
adx-mon/scrape: "true" # Enable scraping
adx-mon/port: "8080" # Metrics port
adx-mon/path: "/metrics" # Metrics path
However, the primary recommended configuration is direct OTLP push for better timestamp fidelity and simpler deployment.
Implementation Roadmap
This section provides a methodical breakdown of implementing the adxexporter component and MetricsExporter CRD across multiple PRs, with Phase 1 focusing on Prometheus scraping and Phase 2 adding direct OTLP push capabilities.
📊 Implementation Status Summary
Overall Progress: Phase 2 OTLP Push Mode Implemented
- ✅ Phase 1 Complete: Foundation, Scaffolding, Query Execution, Transform Engine
- ✅ Phase 2 OTLP Push Mode: Direct OTLP push using the existing PromToOtlpExporter
- ⏸️ Prometheus Scraping Mode: Deprioritized in favor of direct OTLP push
Key Achievements:
- Complete MetricsExporter CRD with time window management and criteria matching
- Functional adxexporter component with Kubernetes controller framework
- Working KQL query execution with ADX integration and time window management
- Full transform engine for KQL to metrics conversion with validation
- Direct OTLP push using collector/export/PromToOtlpExporter with pkg/prompb pooling
- ToWriteRequest() function converts []MetricData to prompb.WriteRequest with object pooling
- Comprehensive unit tests and benchmarks for all core functionality
Architecture Decision: OTLP Push Over Prometheus Scraping
The implementation prioritizes Phase 2's direct OTLP push mode over Phase 1's Prometheus scraping for several reasons:
- Timestamp Fidelity: Preserves actual KQL query timestamps instead of scrape-time timestamps
- Memory Efficiency: Leverages pkg/prompb object pooling (WriteRequestPool, TimeSeriesPool)
- Existing Infrastructure: Reuses battle-tested PromToOtlpExporter from the Collector
- Simpler Deployment: No need for Collector discovery annotations or scraping configuration
- High Cardinality Support: Optimized for high-volume metric export scenarios
Code Quality:
- ✅ Extensive unit test coverage
- ✅ Follows ADX-Mon patterns (SummaryRule consistency)
- ✅ Proper error handling and logging
- ✅ Memory-efficient prompb pooling
- ✅ Benchmarks for ToWriteRequest transformation
🔍 Implementation Details
Files Implemented:
- api/v1/metricsexporter_types.go - Complete CRD definition with time management methods
- cmd/adxexporter/main.go - Main component with CLI parsing and OTLP exporter initialization
- adxexporter/metricsexporter.go - Controller reconciler with criteria matching, query execution, and OTLP push
- adxexporter/kusto.go - KQL query executor with ADX client integration
- transform/kusto_to_metrics.go - Transform engine with ToWriteRequest() for prompb conversion
- Generated CRD manifests in kustomize/bases/ and operator/manifests/crds/
Key Features Working:
- ✅ MetricsExporter CRD with complete spec (Database, Body, Interval, Transform, Criteria)
- ✅ Time window calculation (ShouldExecuteQuery, NextExecutionWindow)
- ✅ Cluster-label based criteria matching (case-insensitive)
- ✅ KQL query execution with _startTime/_endTime substitution
- ✅ Transform validation and column mapping (value, labels, timestamps, metric names)
- ✅ ToWriteRequest() conversion with pkg/prompb object pooling
- ✅ Direct OTLP push via PromToOtlpExporter
- ✅ Controller-runtime manager with graceful shutdown
- ✅ Health checks (readyz/healthz endpoints)
CLI Configuration:
adxexporter \
--cluster-labels="region=eastus,environment=production" \
--kusto-endpoint="MetricsDB=https://cluster.kusto.windows.net" \
--otlp-endpoint="http://otel-collector:4318/v1/metrics" # Required
Phase 1: Prometheus Scraping Implementation
1. Foundation: MetricsExporter CRD Definition ✅ COMPLETE
Goal: Establish the core data structures and API types
- Deliverables:
- ✅ Create api/v1/metricsexporter_types.go with complete CRD spec
- ✅ Define MetricsExporterSpec, TransformConfig, and status types
- ✅ Add deepcopy generation markers and JSON tags
- ✅ Update api/v1/groupversion_info.go to register new types
- ✅ Generate CRD manifests using make generate-crd CMD=update
- Testing: ✅ Unit tests for struct validation and JSON marshaling/unmarshaling
- Acceptance Criteria: ✅ CRD can be applied to cluster and kubectl can describe the schema
2. adxexporter Component Scaffolding ✅ COMPLETE
Goal: Create the standalone adxexporter component infrastructure
- Deliverables:
- ✅ Create cmd/adxexporter/main.go with command-line argument parsing
- ✅ Implement cluster-labels parsing and criteria matching logic
- ✅ Add Kubernetes client-go setup for CRD watching
- ✅ Create basic controller framework for MetricsExporter reconciliation
- ✅ Add graceful shutdown and signal handling
- Testing: ✅ Integration tests for component startup and CRD discovery
- Acceptance Criteria: ✅ adxexporter starts successfully and can discover MetricsExporter CRDs
3. KQL Query Execution Engine ✅ COMPLETE
Goal: Implement KQL query execution with time window management
- Deliverables:
- ✅ Create ADX client connection management in adxexporter
- ✅ Implement time window calculation logic (adapted from SummaryRule patterns)
- ✅ Add KQL query execution with _startTime/_endTime parameter injection
- ✅ Implement scheduling logic based on Interval and last execution tracking
- ✅ Add comprehensive error handling for query failures
- Testing: ✅ Unit tests with mock ADX responses and integration tests with real ADX
- Acceptance Criteria: ✅ Can execute KQL queries on schedule with proper time window management
4. Transform Engine: KQL to Prometheus Metrics ✅ COMPLETE
Goal: Transform KQL query results to Prometheus metrics format
- Deliverables:
- ✅ Create transform/kusto_to_metrics.go with transformation engine (Note: uses generic name, not prometheus-specific)
- ✅ Implement column mapping (value, metric name, labels) for Prometheus format
- ✅ Add data type validation and conversion (numeric values, string labels)
- ✅ Handle missing columns and default metric names
- ✅ Integrate with OpenTelemetry metrics library for Prometheus exposition
- Testing: ✅ Extensive unit tests with various KQL result schemas and edge cases
- Acceptance Criteria: ✅ Can transform any valid KQL result to Prometheus metrics
5. Prometheus Metrics Server ✅ COMPLETE
Goal: Expose transformed metrics via HTTP endpoint for Collector scraping
- Deliverables:
- ✅ Implement HTTP server with configurable port and path (uses controller-runtime's shared metrics server)
- ✅ Integrate OpenTelemetry Prometheus exporter library
- ✅ Add metrics registry management and lifecycle handling
- ✅ Implement graceful shutdown of HTTP server (handled by controller-runtime)
- ✅ Add health check endpoints for liveness/readiness probes
- Testing: ✅ HTTP endpoint tests and Prometheus format validation
- Acceptance Criteria: ✅ Serves valid Prometheus metrics on /metrics endpoint via controller-runtime's shared registry
6. Collector Discovery Integration ❌ NOT COMPLETE
Goal: Enable automatic discovery by existing Collector infrastructure
- Deliverables:
- ❌ Add pod annotation configuration to adxexporter deployment manifests
- ❌ Document Collector integration patterns and discovery mechanism
- ❌ Create example Kubernetes manifests with proper annotations
- ❌ Validate end-to-end scraping workflow with Collector
- Testing: ❌ End-to-end tests with real Collector scraping adxexporter metrics
- Acceptance Criteria: ❌ Collector automatically discovers and scrapes adxexporter metrics
7. Status Management and Error Handling 🔄 PARTIALLY COMPLETE
Goal: Implement comprehensive status tracking and error recovery
- Deliverables:
- 🔄 Add MetricsExporter CRD status updates with condition management (methods implemented, cluster updates missing)
- ✅ Implement retry logic for transient query failures
- ✅ Add structured logging with correlation IDs and trace information
- ✅ Create error classification (transient vs permanent failures)
- ❌ Add metrics for adxexporter operational monitoring
- Testing: ❌ Chaos engineering tests with various failure scenarios
- Acceptance Criteria: 🔄 Graceful error handling with proper status reporting (logic implemented, needs cluster status updates)
Phase 2: Direct OTLP Push Implementation
8. OTLP Client Integration and Prometheus Remote Write Support
Goal: Add direct push capabilities with multiple protocol support
- Deliverables:
- Integrate OpenTelemetry OTLP exporter client
- Leverage pkg/prompb for Prometheus remote write support
- Add OTLP endpoint configuration and connection management
- Implement OTLP metrics format transformation (separate from Prometheus)
- Add Prometheus remote write transformation using pkg/prompb.TimeSeries
- Add connection health checking and circuit breaker patterns
- Support both HTTP and gRPC OTLP protocols
- Support Prometheus remote write protocol via pkg/prompb.WriteRequest
- Testing: Integration tests with mock and real OTLP endpoints and Prometheus remote write
- Acceptance Criteria: Can push metrics directly to OTLP endpoints and Prometheus remote write endpoints
9. Backlog and Retry Infrastructure
Goal: Implement sophisticated backlog management for reliable delivery
- Deliverables:
- Extend MetricsExporter CRD status with backlog tracking
- Implement failed export queuing in CRD status
- Add exponential backoff retry logic with configurable limits
- Create backlog processing scheduler for historical data
- Leverage pkg/prompb pooling mechanisms (WriteRequestPool, TimeSeriesPool) for memory efficiency
- Add dead letter queue for permanently failed exports
- Testing: Reliability tests with network partitions and endpoint failures
- Acceptance Criteria: Reliable metric delivery with historical backfill capabilities
9.1. Leveraging pkg/prompb for Enhanced Performance and Timestamp Fidelity
Goal: Utilize existing high-performance protobuf implementation for Phase 2
- Deliverables:
- Transform Engine Enhancement: Create transform/kusto_to_prompb.go to convert KQL results directly to pkg/prompb.TimeSeries
- Timestamp Preservation: Use pkg/prompb.Sample to preserve actual timestamps from TimestampColumn (unlike Phase 1 gauges)
- Memory Optimization: Implement object pooling using pkg/prompb.WriteRequestPool and pkg/prompb.TimeSeriesPool
- Historical Data Support: Enable proper temporal ordering for backfill scenarios using pkg/prompb.Sample.Timestamp
- Efficient Batching: Group multiple time series into pkg/prompb.WriteRequest for batch processing
- Label Optimization: Use pkg/prompb.Sort() for proper label ordering and efficient serialization
- Key Benefits:
- Reduced GC Pressure: Object pooling minimizes memory allocations during high-frequency processing
- Timestamp Fidelity: Preserve actual query result timestamps instead of current time
- Prometheus Compatibility: Native support for Prometheus remote write protocol
- Performance: Optimized protobuf marshaling for large result sets
- Backfill Capability: Support historical data with proper temporal alignment
- Testing: Performance benchmarks comparing pooled vs non-pooled implementations
- Acceptance Criteria: Significantly reduced memory allocation and improved timestamp accuracy
10. Hybrid Mode Operation
Goal: Support multiple output modes simultaneously with shared query execution
- Deliverables:
- Enable concurrent operation of Prometheus scraping, OTLP push, and Prometheus remote write
- Add configuration options for selective output mode per MetricsExporter
- Implement shared query execution with multiple output transformations:
- OpenTelemetry metrics (Phase 1) for /metrics endpoint scraping
- OTLP format for direct OTLP push
- pkg/prompb.TimeSeries for Prometheus remote write
- Dual Transform Architecture: Create separate transform paths while sharing KQL execution
- Add performance optimization for multi-mode operation
- Testing: Load tests with all output modes active simultaneously
- Acceptance Criteria: Efficient operation in hybrid mode without performance degradation
Phase 2 Architecture Enhancement: pkg/prompb Integration
Motivation for pkg/prompb Integration
The existing pkg/prompb package provides significant advantages for Phase 2 implementation:
- Timestamp Fidelity: Unlike Phase 1 OpenTelemetry gauges (which represent current state), pkg/prompb.Sample preserves actual timestamps from the KQL TimestampColumn
- Memory Efficiency: Object pooling (WriteRequestPool, TimeSeriesPool) reduces GC pressure during high-frequency processing
- Historical Data Support: Proper temporal ordering enables backfill scenarios with accurate timestamps
- Prometheus Compatibility: Native support for Prometheus remote write protocol
- Performance: Optimized protobuf marshaling for large result sets
Implementation Strategy
Dual Transform Architecture:
// Phase 1: OpenTelemetry metrics (current)
func (r *MetricsExporterReconciler) transformToOTelMetrics(rows []map[string]any) ([]transform.MetricData, error)
// Phase 2: Add prompb transformation
func (r *MetricsExporterReconciler) transformToPromTimeSeries(rows []map[string]any) ([]*prompb.TimeSeries, error)
Key Integration Points:
- Transform Engine: Create transform/kusto_to_prompb.go alongside the existing transform/kusto_to_metrics.go
- Memory Management: Use prompb.WriteRequestPool.Get() and prompb.TimeSeriesPool.Get() for efficient object reuse
- Timestamp Handling: Extract timestamps from TimestampColumn and convert to int64 for prompb.Sample.Timestamp
- Label Processing: Use prompb.Sort() for proper label ordering and efficient serialization
- Batching: Group multiple time series into prompb.WriteRequest for batch transmission
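The pooling pattern itself can be shown with a small, self-contained sketch. Note that this is not the pkg/prompb API: it uses a local stand-in type and sync.Pool directly to illustrate how pooled objects and preserved timestamps fit together.
package transform

import (
    "sync"
    "time"
)

// pooledSeries stands in for prompb.TimeSeries purely for illustration.
type pooledSeries struct {
    Labels map[string]string
    Value  float64
    // TimestampMs carries the KQL TimestampColumn value rather than the
    // time of export.
    TimestampMs int64
}

var seriesPool = sync.Pool{New: func() any { return &pooledSeries{} }}

// buildSeries reuses pooled objects to avoid a fresh allocation per row.
func buildSeries(name string, labels map[string]string, value float64, ts time.Time) *pooledSeries {
    s := seriesPool.Get().(*pooledSeries)
    s.Labels = map[string]string{"__name__": name}
    for k, v := range labels {
        s.Labels[k] = v
    }
    s.Value = value
    s.TimestampMs = ts.UnixMilli()
    return s
}

// releaseSeries resets the object and returns it to the pool once it has
// been serialized and pushed.
func releaseSeries(s *pooledSeries) {
    s.Labels, s.Value, s.TimestampMs = nil, 0, 0
    seriesPool.Put(s)
}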
Configuration Extensions:
type MetricsExporterReconciler struct {
// ... existing fields ...
PrometheusRemoteWriteEndpoint string
EnablePrometheusRemoteWrite bool
EnableOTLP bool
}
Output Mode Selection:
- Phase 1 Only: OpenTelemetry metrics for /metrics scraping
- Phase 2 Hybrid: OpenTelemetry + Prometheus remote write + OTLP push
- Phase 2 Direct: Skip OpenTelemetry, use only push modes for better performance
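Mode selection could then reduce to a simple dispatch over these flags. The sketch below uses a hypothetical exporter interface rather than the real client types:
package adxexporter

import (
    "context"
    "fmt"
)

// exporter is a stand-in for whichever push client is enabled; the real
// reconciler holds concrete OTLP and remote write clients.
type exporter interface {
    Export(ctx context.Context, rows []map[string]any) error
}

// dispatch sends transformed query results to every enabled output mode.
func dispatch(ctx context.Context, rows []map[string]any, enableOTLP, enableRemoteWrite bool, otlp, remoteWrite exporter) error {
    if enableOTLP {
        if err := otlp.Export(ctx, rows); err != nil {
            return fmt.Errorf("otlp push: %w", err)
        }
    }
    if enableRemoteWrite {
        if err := remoteWrite.Export(ctx, rows); err != nil {
            return fmt.Errorf("prometheus remote write: %w", err)
        }
    }
    return nil
}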
Benefits Over Current Implementation
| Aspect | Phase 1 (OpenTelemetry) | Phase 2 (with prompb) |
|---|---|---|
| Timestamp Handling | Current time only | Preserves actual query timestamps |
| Memory Usage | Standard allocation | Pooled objects, reduced GC pressure |
| Historical Data | Not supported | Full backfill capability |
| Protocol Support | Prometheus scraping only | Prometheus remote write + OTLP |
| Performance | Good for scraping | Optimized for high-volume push |
Quality and Operations
11. Performance Optimization and Scalability
Goal: Optimize for production workloads and multi-tenancy
- Deliverables:
- Add connection pooling and query optimization
- Implement parallel processing for multiple MetricsExporter CRDs
- Leverage pkg/prompb pooling for memory-efficient metric processing
- Add resource usage monitoring and throttling mechanisms
- Optimize memory usage for large result sets using pooled objects
- Implement efficient label sorting and deduplication using pkg/prompb.Sort()
- Add configurable resource limits and circuit breakers
- Testing: Load testing with high-volume data and many MetricsExporter CRDs
- Acceptance Criteria: Handles production-scale workloads within resource constraints
12. Comprehensive Test Suite
Goal: Ensure complete test coverage across all scenarios
- Deliverables:
- Unit tests for all packages with >90% coverage
- Integration tests for ADX connectivity and metrics output
- End-to-end tests covering full workflow scenarios
- Performance benchmarks and scalability tests
- Chaos engineering tests for resilience validation
- Testing: Automated test execution in CI/CD pipeline
- Acceptance Criteria: All tests pass consistently in CI environment
13. Documentation and Examples
Goal: Provide comprehensive documentation for users and operators
- Deliverables:
- Update CRD documentation in docs/crds.md
- Create detailed configuration guide with deployment examples
- Add troubleshooting guide for common issues and debugging
- Document best practices for criteria-based deployment
- Create operational runbooks for production deployment
- Testing: Documentation review and validation of all examples
- Acceptance Criteria: Users can successfully deploy and operate adxexporter using documentation
14. Observability and Monitoring
Goal: Add comprehensive observability for operational excellence
- Deliverables:
- Add Prometheus metrics for adxexporter operational metrics (query rates, errors, latency)
- Implement structured logging with correlation IDs and distributed tracing
- Create Grafana dashboards for adxexporter monitoring
- Add alerting rules for common failure scenarios
- Add health check endpoints for load balancer integration
- Testing: Validate all metrics and observability in staging environment
- Acceptance Criteria: Operations team can monitor and troubleshoot adxexporter effectively
Dependencies and Sequencing
Phase 1 Critical Path: Steps 1-7 must be completed sequentially for basic functionality
Phase 2 Critical Path: Steps 8-10 build on Phase 1 for enhanced capabilities
Parallel Development: Steps 11-14 can be developed in parallel with Phase 2
Milestone Reviews: Technical review after steps 3, 7, 10, and 12
Key Dependencies:
- Step 2 enables independent adxexporter development
- Step 4 provides foundation for both Phase 1 and Phase 2 output modes
- Step 7 completes Phase 1 for production readiness
- Step 10 completes Phase 2 for enhanced reliability
This roadmap ensures incremental delivery with Phase 1 providing immediate value through Prometheus integration, while Phase 2 adds sophisticated reliability features for enterprise deployments.
Conclusion
The adxexporter component and MetricsExporter CRD provide a comprehensive solution for transforming ADX data into standardized metrics for observability platforms. The implementation prioritizes direct OTLP push for enterprise-grade reliability:
Implementation Benefits:
- Direct OTLP Push: Push metrics directly to any OTLP-compatible endpoint
- Timestamp Fidelity: Preserves actual KQL query result timestamps instead of export time
- Memory Efficiency: Leverages pkg/prompb object pooling (WriteRequestPool, TimeSeriesPool) for high cardinality metrics
- Existing Infrastructure: Reuses battle-tested PromToOtlpExporter from the Collector
- Criteria-Based Security: Secure, distributed processing with team and environment isolation
- No Intermediate Storage: Enables KQL-to-metrics transformation without ADX table materialization
Key Technical Advantage: pkg/prompb Integration
The implementation leverages the existing pkg/prompb package for:
- Memory Efficiency: Object pooling reduces allocation overhead during high-frequency processing
- Timestamp Accuracy: Preserve temporal fidelity from KQL query results for proper historical analysis
- Protocol Compatibility: Native Prometheus remote write support via OTLP endpoints
- Performance: Optimized protobuf serialization for large-scale deployments
Transform Pipeline:
KQL Query Results → KustoToMetricsTransformer → []MetricData → ToWriteRequest() → prompb.WriteRequest → PromToOtlpExporter → OTLP Endpoint
This design provides a scalable, secure, and maintainable foundation for organizations to operationalize their ADX analytics data across their observability infrastructure.