2025-01: Improving ARM References

Context

Many Azure Resource Manager (ARM) resources have references to other resources. When working with ARM Templates or Bison, these are almost universally handled as simple strings containing a fully qualified ARM reference.

As a part of importing resources into Azure Service Operator (ASO), we modify these properties into formal resource references allowing them to contain either an ARM resource Id, or the GVK of a resource within the cluster.

We modify a referencing property as follows:

  • The suffix Reference is added to the name (selected suffixes such as Id are also removed).
  • The type of the property is changed from string to genruntime.ResourceReference

For example, keyVaultId: string becomes keyVaultReference: genruntime.ResourceReference.

Not every such property is unambiguously annotated in the OpenAPI specifications we read, so our code generator applies a number of heuristics to identify these properties. We end up with three groups of properties:

  • Those that are definitely ARM references.
  • Those that our heuristics suggest are ARM references.
  • All other properties.

For those properties in the second group, we require manual configuration to indicate whether they are ARM references or not. This is done by adding an $armReference modifier into our configuration file. The code generator will fail with an error any property identified by our heuristics is not explicitly configured.

We can also use the configuration for any property in the third group that’s actually an ARM reference too. This is useful when we have a property that’s not identified by our heuristics, but we know it’s an ARM reference.

For example, in this configuration fragment we can see confirmation that ActionGroupsInformation.GroupIds is an ARM Reference, while Detector.Id is not:

  alertsmanagement:
    2021-04-01:
      SmartDetectorAlertRule:
        $export: true
        $supportedFrom: v2.11.0
      AlertRuleProperties:
        Scope:
          $armReference: true
      ActionGroupsInformation:
        GroupIds:
          $armReference: true
      Detector:
        Id:
          $armReference: false

Missed ARM references

The major failure mode of this approach is that we can sometimes (fortunately rarely) miss identifying a property as an ARM reference. When this happens, we end up with a property that is a string when it should be a genruntime.ResourceReference. This isn’t a problem if we catch the problem before releasing the resource, but if the resource is already released, changing the type and name of the property is a breaking change.

In #3772 (Bug: MySQL Flexible Server property sourceServerResourceId has wrong type), @theunrepentantgeek made this comment:

Trying out some improvements to our heuristic and finding other properties we missed:

Group Version Kind Property
authorization v1api20200801preview RoleAssignmentProperties DelegatedManagedIdentityResourceId
authorization v1api20220401 RoleAssignmentProperties DelegatedManagedIdentityResourceId
dbformysql v1api20210501 ServerProperties SourceServerResourceId
machinelearningservices v1api20210701 IdentityForCmk UserAssignedIdentity
machinelearningservices v1api20210701 KeyVaultProperties KeyVaultArmId

For some of these, we were able to patch the problem by importing a newer version of the resource and adding conversion code to translate between the old and new versions. Users of the newer version of the resource would then see the correct property types, while users of the old version are stuck with the incorrect types.

However, we cannot apply this workaround when there is no newer version of the resource available - as has occurred with the authorization group, where the latest version is v1api20220401.

Other examples

  • MetricAlert.ActionGroupId - #3643 Improvement: support action group and metric linking
  • ServerProperties.SourceServeResourceId - see #3829 Update dbformysql to latest version

Aliases

There are rare properties that accept more than just ARM IDs - meaning that a direct transformation to a ResourceReference isn’t sufficient.

One case of this is discussed in #4531 (ResourceReference should support Alias) - in the case of PrivateLinkServiceConnection, the value privateLinkServiceId that’s used to identify a PrivateLinkService may be either a regular ARM ID or an alias.

We currently have no way to represent this in ASO, as validation on the armId property requires values to confirm to the ARM ID format. (Loosening validation to allow aliases would delay discovery of invalid values for the majority of cases where aliases are not permitted.)

Friendly names

In #3642 (Improvement: support for the friendly names of the builtin roles) we have a request to allow friendly names like Contributor instead of requiring users to look up the specific GUID needed and constructing the ARM ID by hand.

This is similar to Alias support above, except that we would be resolving the name to an ARM ID within ASO itself, instead of leaving that to ARM.

Again, we currently have no way to represent this in ASO.

Option 1: Break existing users

When we identify a property that should be treated as an ARM reference, but for which we missed setting $armReference: true in our configuration, update the configuration and run the code generator to apply the change even if that resource has already been released.

  • PRO: Simple to do
  • PRO: Familiar, as we already do this with unreleased resources
  • PRO: Solves the Missed ARM References problem
  • CON: Breaking change for any existing users, as both the name and type of the property will change.
  • CON: Does not solve the Alias support or Friendly names problems.

The breaking change is nasty, as an upgrade of the ASO operator would result in a cluster resource that’s silently updated incorrectly, plus any attempt to repair the resource would fail until the YAML was modified to the new structure.

Option 2: Allow preserving the original property

Under normal operation ($armReference: true) the existing property is removed, and replaced with a new property for the reference.

For example, we transform

type PrometheusRuleGroupAction struct {
    ActionGroupId    string
    ActionProperties map[string]string
}

to

type PrometheusRuleGroupAction struct {
    ActionGroupReference *genruntime.ResourceReference
    ActionProperties     map[string]string
}

When configured with $armReference: false, we leave the property as is.

We could add a third option to specify the existing property is to be retained, making the result of the transformation look like this:

type PrometheusRuleGroupAction struct {
    ActionGroupId        string
    ActionGroupReference *genruntime.ResourceReference
    ActionProperties     map[string]string
}

We already have precendent for this, with the *FromConfig properties that we inject for selected properties.

Validation would need to ensure the two properties are mutually exclusive.

When constructing the payload to send to ARM, we would need to select between the two properties, using whichever is set. We have an exisiting resource extension point we can use to implement this.

Adding a third option to a boolean flag is a very bad idea (though we’ve seen it done), so we’d need to rename $armReference to something else for clarity. Fortunately, we control azure_arm.yaml so this is achievable. (We don’t believe any users have their own configuration file, but we should call this out as a breaking in our release notes anyway.)

One possible name for flag would be $isReference with values arm, side-by-side and no, though this phrasing is quite awkward.

  • PRO: Relatively straightforward
  • PRO: We already have precedent for side-by-side properties
  • PRO: Solves all three scenarios.
  • CON: Problematic to have two properties that handle three use-cases (e.g. use privateLinkServiceReference for GVK or ARMId, and privateLinkServiceId for alias).
  • CON: No control over the naming of things

Option 3: Custom reference types

Instead of always replacing the property with a genruntime.ResourceReference, allow selection between a set of standard reference types.

For the vast majority of reference properties, we’d continue to use the existing genruntime.ResourceReference.

To support the Alias and Friendly names cases, introduce a new reference type, say genruntime.AliasResourceReference, with the following structure:

type AliasResourceReference struct {
   Group     string
   Kind      string
   Namespace string
   Name      string

    ARMID string

    Alias string
}

The code generator would need to be updated to support this new value.

Finally, when the property isn’t an reference at all, but just a simple string, we’d have a value that means leave this alone. This is the case where we’d currently us $armReference: false.

Again, our current $armReference flag would need to be renamed to something more general. Perhaps $referenceType with possible values arm, armOrAlias, and other.

  • PRO: Great YAML structure for end users of resources.
  • PRO: Extensible for the (admittedly unlikely) scenario where we need another different kind.
  • PRO: Cleanly handles both the Alias and Friendly names scenarios.
  • CON: Slightly greater implementation complexity.
  • CON: Does not solve the Missed ARM References problem.

Option 4: Multivalued custom reference types

Similar to Option 3, but allowing set combinations of values to be specified for the reference type.

Possible values would be arm, alias and other, plus the combinations arm+alias and arm+other. Other combinations would be invalid.

MetricAlertAction:
    ActionGroupId:
        $referenceType: arm+other

Returning to the PrometheusRuleGroupAction example from above, we’d specify $referenceType: arm+other to get:

type PrometheusRuleGroupAction struct {
    ActionGroupId        string
    ActionGroupReference *genruntime.ResourceReference
    ActionProperties     map[string]string
}

This handles the Missed ARM References problem by allowing us to retain the existing property while introducing the corrected property side-by-side.

In the RoleAssignment case, we’d specify $referenceType: arm+alias to get:

type RoleAssignment_Spec struct {
    // ... elided ...
    roleDefinitionReference *genruntime.AliasResourceReference
}
  • PRO: Great YAML structure for end users.
  • PRO: Good experience for configuration of our code generator.
  • PRO: Extensible for the (admittedly unlikely) scenario where we need another different kind.
  • PRO: Solves all three scenarios.
  • CON: Slightly greater implementation complexity.

Decision

Reccommendation: Option 4

Variation: We may want to YAML list syntax for clarity:

MetricAlertAction:
    ActionGroupId:
        $referenceType: 
          - arm
          - other

or

MetricAlertAction:
    ActionGroupId:
        $referenceType: [arm, other]

The major downside of this is that we’d make the usual case (with just a single option) more verbose, but we could mitigate that by having two mutually exclusive options, $referenceType (a string) and $referenceTypes (a list).

  • We’ll want to look into where we resolve IDs and make sure that implementing this as a per-resource extension is right.
  • The value other is a bit vague, we should try to find a better name that covers both not a reference and a reference we don’t provide special support for. Suggestions so far include raw and unmodified.

Status

Proposed.

Consequences

TBC

Experience Report

TBC

References

None