PostReconciliationChecker
Description
PostReconciliationChecker allows resources to perform validation or checks after ARM reconciliation completes successfully. This extension is invoked after the Azure operations succeed but before the Ready condition is set to True, so a resource can defer its Ready status until additional conditions are met.
The interface is called at the end of the reconciliation process, after the resource has been successfully created or updated in Azure. It provides a final opportunity to verify that the resource is truly ready for use.
Interface Definition
See the PostReconciliationChecker interface definition in the source code.
Motivation
The PostReconciliationChecker extension exists to handle cases where:
- Async operations: Azure resource is created but still initializing or provisioning
- Manual approval required: Resources that require external approval before being considered ready
- Dependent state: Resource readiness depends on the state of other Azure resources
- Complex readiness criteria: Determining if a resource is “ready” requires more than just ARM success
- Validation: Additional checks needed to ensure resource is in expected state
- Gradual rollout: Resource created but waiting for deployment to complete
The default behavior marks resources as Ready immediately after successful ARM operations. Some resources need to wait for additional conditions before being truly ready.
When to Use
Implement PostReconciliationChecker when:
- ✅ Resource continues initializing after ARM operations complete
- ✅ Manual approval or external processes must complete first
- ✅ Dependent resources need to reach certain states
- ✅ Complex validation is needed to determine readiness
- ✅ Resource provisioning happens asynchronously in Azure
- ✅ Users should wait before using the resource
Do not use PostReconciliationChecker when:
- ❌ ARM success means the resource is ready
- ❌ The check should happen before reconciliation (use PreReconciliationChecker)
- ❌ You’re validating the spec (do that in webhooks)
- ❌ You’re modifying the resource (use other extensions)
Example: Private Endpoint Approval Check
See the full implementation in private_endpoints_extensions.go.
Key aspects of this implementation:
- Type assertions: For both resource type and hub version
- Status inspection: Examines resource status to determine readiness
- Clear failure messages: Provides actionable information to users
- Uses factory methods: Always uses the factory methods for PostReconcileCheckResult to ensure consistency
- No ARM calls: Uses existing status data
- Conditional result: Returns success or failure based on state
- No error return: The check itself succeeded, even though the resource is not ready yet
Common Patterns
Pattern 1: Check Resource State
func (ex *ResourceExtension) PostReconcileCheck(
	ctx context.Context,
	obj genruntime.MetaObject,
	owner genruntime.MetaObject,
	resourceResolver *resolver.Resolver,
	armClient *genericarmclient.GenericClient,
	log logr.Logger,
	next extensions.PostReconcileCheckFunc,
) (extensions.PostReconcileCheckResult, error) {
	// Checked type assertion: an unexpected type returns an error instead of panicking
	resource, ok := obj.(*myservice.MyResource)
	if !ok {
		return extensions.PostReconcileCheckResult{},
			fmt.Errorf("cannot run on unknown resource type %T, expected *myservice.MyResource", obj)
	}
	// Status fields may not be populated yet, so guard against nil
	if resource.Status.ProvisioningState == nil {
		return extensions.PostReconcileCheckResultFailure(
			"provisioning state not yet available"), nil
	}
	// Check if resource is in expected state
	state := *resource.Status.ProvisioningState
	if state != "Succeeded" {
		return extensions.PostReconcileCheckResultFailure(
			fmt.Sprintf("resource still provisioning: %s", state)), nil
	}
	return extensions.PostReconcileCheckResultSuccess(), nil
}
Pattern 2: Query Azure for Additional State
func (ex *ResourceExtension) PostReconcileCheck(
	ctx context.Context,
	obj genruntime.MetaObject,
	owner genruntime.MetaObject,
	resourceResolver *resolver.Resolver,
	armClient *genericarmclient.GenericClient,
	log logr.Logger,
	next extensions.PostReconcileCheckFunc,
) (extensions.PostReconcileCheckResult, error) {
	resource, ok := obj.(*myservice.MyResource)
	if !ok {
		return extensions.PostReconcileCheckResult{},
			fmt.Errorf("cannot run on unknown resource type %T, expected *myservice.MyResource", obj)
	}
	// Get the resource ID for additional queries
	resourceID, hasID := genruntime.GetResourceID(resource)
	if !hasID {
		return extensions.PostReconcileCheckResultFailure(
			"resource ID not available"), nil
	}
	// Query Azure for deployment status (checkDeploymentStatus is a resource-specific helper)
	deploymentReady, err := ex.checkDeploymentStatus(ctx, resourceID, armClient)
	if err != nil {
		// The check itself failed, which is not the same as the check returning a Failure result
		return extensions.PostReconcileCheckResult{}, fmt.Errorf("checking deployment status: %w", err)
	}
	if !deploymentReady {
		return extensions.PostReconcileCheckResultFailure(
			"deployment still in progress"), nil
	}
	return extensions.PostReconcileCheckResultSuccess(), nil
}
Pattern 3: Check Dependent Resources
func (ex *ResourceExtension) PostReconcileCheck(
	ctx context.Context,
	obj genruntime.MetaObject,
	owner genruntime.MetaObject,
	resourceResolver *resolver.Resolver,
	armClient *genericarmclient.GenericClient,
	log logr.Logger,
	next extensions.PostReconcileCheckFunc,
) (extensions.PostReconcileCheckResult, error) {
	resource, ok := obj.(*myservice.MyResource)
	if !ok {
		return extensions.PostReconcileCheckResult{},
			fmt.Errorf("cannot run on unknown resource type %T, expected *myservice.MyResource", obj)
	}
	// Check if dependent resources are ready
	if resource.Spec.DependencyReference != nil {
		ready, err := ex.isDependencyReady(ctx, resource.Spec.DependencyReference, resourceResolver)
		if err != nil {
			return extensions.PostReconcileCheckResult{}, fmt.Errorf("checking dependency: %w", err)
		}
		if !ready {
			return extensions.PostReconcileCheckResultFailure(
				"waiting for dependent resource to be ready"), nil
		}
	}
	return extensions.PostReconcileCheckResultSuccess(), nil
}
Check Results
The extension returns one of two results, or an error.
Success
return extensions.PostReconcileCheckResultSuccess(), nil
- Resource is ready
- Ready condition will be marked True
- Reconciliation completes successfully
Failure
return extensions.PostReconcileCheckResultFailure("reason for not ready"), nil
- Resource is not yet ready
- Warning condition set with the provided reason
- Reconciliation will retry later
- Resource requeued for another check
Error
return extensions.PostReconcileCheckResult{}, fmt.Errorf("check failed: %w", err)
- The check itself failed (couldn’t determine readiness)
- Error condition set on resource
- Reconciliation will retry later
Failure vs. Error
It’s critical to distinguish between check failure and check error:
- Failure: “The resource is not ready yet” (expected, will retry)
  - Example: “waiting for approval”, “deployment in progress”
  - Returns PostReconcileCheckResultFailure(reason), nil
- Error: “I couldn’t determine if the resource is ready” (unexpected)
  - Example: “failed to query Azure”, “invalid state”
  - Returns PostReconcileCheckResult{}, error
Reconciliation Impact
When a post-reconciliation check fails:
- Condition set: Warning condition added to resource status
- Reconciliation requeued: Controller will try again later
- No Ready condition: Resource not marked as Ready
- User visibility: Users see the warning condition with reason
This continues until the check succeeds or the resource is deleted.
Testing
When testing PostReconciliationChecker extensions:
- Test success case: Verify check passes when resource is ready
- Test failure cases: Cover all scenarios that should defer readiness
- Test error handling: Verify proper error returns
- Test with real status: Use realistic status values
- Test retry behavior: Verify requeue happens correctly
Important Notes
- Call next() if appropriate: Allows for check chaining (rarely needed)
- Don’t modify the resource: This is for validation only
- Be patient: Checks may run many times before succeeding
- Use factory methods: Always use the factory methods for PostReconcileCheckResult to ensure consistency
- Provide clear reasons: Failure messages are shown to users
- Log appropriately: Help debugging without noise
- Handle nil values: Status fields may not be populated yet
- Consider performance: This runs on every reconciliation
Common Readiness Scenarios
Here are typical reasons to use post-reconciliation checks:
- Async provisioning: Azure resource created but still initializing
- Manual approval: External human approval required (you can check whether this has been done and provide a better message to the user than “not ready”)
- Deployment rollout: Waiting for deployment to all regions/instances
- Certificate generation: Waiting for certs to be issued
- DNS propagation: Waiting for DNS changes to propagate
- Dependent services: Waiting for related services to initialize
Related Extension Points
- PreReconciliationChecker: Check before reconciliation
Best Practices
- Keep checks fast: They run frequently, so avoid expensive operations
- Be idempotent: Check may run multiple times
- Use status fields: Prefer examining status over ARM calls
- Provide context: Clear failure messages help users
- Consider timeouts: Don’t wait forever for readiness
- Log decisions: Help debugging by logging why checks fail/succeed