Managing dataplane secrets
What secrets are we talking about?
The secrets discussed in this document are associated with accessing the data plane of various services.
Think: Accessing a StorageAccount
via Shared Key, accessing a MySQLServer
by admin Username
and Password
, or accessing a VM
by SSHKey
.
Sometimes these secrets may be used by other CRDs managed by ASO, as would be the case for a MySQLUser
CRD, but often the consumers of these
secrets are the users applications.
Goals
ASO should not be generating secrets on the users behalf
ASO v1 generated secrets on the users behalf in some cases. The users had access to the generated secret as ASO wrote it into a Kubernetes (or KeyVault) secret.
There are problems with this approach:
- It makes performing secret rollover more difficult. If the user had specified the secret, rollover is as easy as modifying the specified secret, which triggers a reconcile, which triggers a PUT to the Azure Resource updating the secret. If the operator generated the secret then the user must somehow issue an instruction to us to perform an action to roll the secret, which doesn’t fit as well into the goal-seeking paradigm Kubernetes prefers.
- It takes control away from the user. What if the secret the operator generates doesn’t comply with a particular organizations complexity requirements?
- It requires that we do a very good job generating cryptographically secure passwords/secrets. This could easily become a can of worms.
- It hinders adoption of existing resources. If the operator expects to always generate the secret for a SQL DB, but the user wants to import a SQL DB they’ve already created through some other mechanism, then they are at an impasse as there is no (easy) way for them to provide ASO with the secret.
- It doesn’t work well with GitOps. A number of customers have expressed the desire to move resources between namespaces. In theory this is easy -
just create the exact same resources in a different namespace (pointing at the same Azure resources), mark the old resources as
skip-deletion
(so that the backing Azure resources are not deleted), and delete the original namespace. If the definition of the secret isn’t part of the users GitOps flow this becomes more difficult as the first namespace has the secret (which was created by ASO) and that secret must be cloned manually to the second namespace, otherwise the creation of the resource in the second namespace will attempt to generate a new secret.
We could investigate allowing both automatically generated secrets (the default) and user specified secrets (opt-in). That avenue results in the operator shouldering all of the complexity burden of both though.
Given ASO’s approach as a low level toolkit providing the ability to create Azure resources, at least at this time we should avoid the added complexity of generating user secrets.
Prefer Managed Identity
Managing secrets is hard. We should prefer AAD Authentication and Managed Identity where possible.
What this means in practice is that secrets should not be retrieved unless the user has specifically asked for them to be.
This has a number of advantages:
- If the user is using Managed Identity, they don’t have a secret leak waiting to happen created in their namespace which they didn’t ask about (and might not even know about).
- We can make a failing
ListKeys
(or equivalent) call fatal. Many services support ways lock down or forbid access to theListKeys
(or equivalent) API. This allows users to block access if they would like to require AAD authentication with their resource. As long as the operator is not calling those APIs unless asked (in theSpec
of the resource) we can make their failure fatal. This gives users an easy way to drive towards full AAD usage in ASO and elsewhere.
We’ll dig more into how we might accomplish this a bit later.
Work well with GitOps
Resources should strive to play well with GitOps. This means that when a new resource is deployed it is capable of adopting an already existing ARM resource if one exists. This also must apply to secrets. As mentioned above one of the reasons that we do not want to be generating secrets on the users behalf is that it makes redeploy and resource adoption hard because the user has no configuration representing the generated secret.
Kinds of secrets
There are two main types of secrets we need to consider, differing primarily by the origin of the secret.
User provided secrets: Secrets provided by the user at resource creation time.
Azure generated secrets: Secrets created by Azure, and returned by a special GetMeTheSecrets
API call.
Sample resources that have secrets
Below is a table containing a sampling of resources with secrets that ASO already supports or has a plan to support in the near future.
CRD | User provided secrets | Azure generated secrets | AAD/Managed Identity Support | Notes |
---|---|---|---|---|
VirtualMachineScaleSet | ✔️ | ❌ | ❌ | Username and Password . Can be modified by subsequent PUT. |
VirtualMachine | ✔️ | ❌ | ❌ | Username and Password . Can be modified by subsequent PUT. |
PostgreSQL FlexibleServer | ✔️ | ❌ | ✔️ | AdministratorLogin and AdministratorLoginPassword . Must have even if using AAD. Can be modified by subsequent PUT. |
MySQL FlexibleServer | ✔️ | ❌ | ✔️ | AdministratorLogin and AdministratorLoginPassword . Must have even if using AAD. Can be modified by subsequent PUT. |
StorageAccount | ❌ | ✔️ | ✔️ | List Keys API and Regenerate Keys API. AAD+RBAC (blob/table only?) Authorizing Access with Active Directory. |
CosmosDB DatabaseAccount | ❌ | ✔️ | ✔️ | List Keys API, List Read Only Keys and Regenerate Key API. For AAD+RBAC (supported by SQL only?), see Disabling Local Auth, Create Role Assignment API, Create Role Definition API. Built-in Role Definitions. |
EventHubAuthorizationRules | ❌ | ✔️ | ❌ | List Keys API. There are default authorization rules created, such as RootManageSharedAccessKey . Supports regeneration. |
Redis | ❌ | ✔️ | ❌ | List Keys API. Regenerate Key API. |
Other kinds of secrets in Azure:
There are a few other types of secrets in Azure in addition to the two main ones discussed above.
- Get once Azure generated secrets: These are secrets created by Azure and only returned once, usually as the response to a POST.
These don’t fit cleanly into the table above because they are a POST action on a parent resource these are not in fact resources themselves.
- Application Insights Component APIKey: This is a POST to the Component/ApiKey URL.
- KeyVault Key: Create Key API.
- “Secrets” created by Azure, and returned in the GET: In almost all cases (all that I have seen), a secret in Azure is not returned by a GET.
“Secrets” returned in a GET are not really secrets per-se, but ASOv1 classifies them as secrets.
ApplicationInsights
InstrumentationKey
orConnectionString
? See here.
- Short-lived “tokens”
StorageAccount
SASCosmosDB
ResourceToken.
Other Operators
A quick look at what other operators are doing with regard to secrets.
Crossplane
Azure generated secrets: A Connection secret stores information needed to connect to the resource, including keys generated by Azure (see for example Storage Account). This is a standard pattern used across all of the providers.
The destination of the writeConnectionSecretToRef
is currently always a Kubernetes secret, but there is an open issue
requesting pluggable secret stores.
User provided secrets seem to be automatically generated by Crossplane.
AWS Controller for Kubernetes (ACK)
User provided secrets are provided via a SecretKeyRef.
This allows cross-namespace references to a secret. The specific Key
of the secret is selected with a Key string
parameter.
It looks like there may not currently be support for key updates/rollover.
AWS generated secrets I can’t find any examples of. They do not seem to classify “endpoints” as a secret, as shown by opensearchservice endpoint and cluster endpoint.
Proposal
User specified secrets
User specified secrets will be detected and transformed from string
to a SecretReference
:
type SecretReference struct {
// Name is the name of the secret. The secret must be in the same namespace as the resource.
Name string
// Key is the key in the secret to use.
Key string
}
Detection will be done with a combination of:
- Additions to ASO’s configuration file to flag particular fields as secret (probably in the
ObjectModelConfiguration
section). - Using the
"format": "password"
data from Swagger, such as that used by MySQL Flexible Server. Note that not all specs have this, for example VMSS does not. - Using the
x-ms-secret
annotation, which I assume has the same meaning as"format": "password"
although it’s not actually exactly documented anywhere.
This is a place where we can push changes upstream to flag things as passwords if they’re not being flagged.
Lifecycle
Since these secrets are created by the user, the user owns the lifecycle of these secrets. They could be using the same secret across many resources, or intending to use this secret on a resource they have not created yet. As such, the lifecycle of these secrets must be controlled by the user.
Rollover will be supported by triggering events on the associated resource when the secret is modified. Since multiple custom resources might be using the same secret, this could trigger reconciles on multiple resources. An existing pattern has been established for this in mysqlserver_controller.go of ASO v1.
When secrets are rolled over, there is a risk that applications using the secret will fail because the secret they are using is no longer valid. We don’t need to worry about coordination or timing though. If the user asks us to update the password (by changing it in the Kubernetes secret), we should just do it. We aren’t going to be able to guarantee 100% uptime unless the service in question supports multiple keys/passwords/secrets. If the service does support that, we should be able to do what we’ve designed here and we will automatically get those same 100% uptime guarantees for the services that support it.
Open questions
Should we support KeyVault inputs?
Kubernetes has a proposal out for unions. In this proposal they
suggest using a discriminator
field, but allow for either of the shapes proposed above.
On the discriminator, they say:
The value of the discriminator is going to be set automatically by the apiserver when a new field is changed in the union. It will be set to the value of the fields-to-discriminateBy for that specific field. When the value of the discriminator is explicitly changed by the client, it will be interpreted as an intention to clear all the other fields. See section below.
See more details about what they see the discriminator
doing in
normalizing on updates.
Since their proposal suggests that discriminator
is optional, and as far as I know it is not supported by kubebuilder
yet, I suggest we don’t add one
for now. In any case it’s not totally clear to me that we really need the value that it is adding, and it seems to add a significant amount of update complexity.
We need to decide which of the above shapes we like more, and also if we do or do not want a secretType
discriminator.
Conclusion: Not needed for v1 of the feature, but we need to have a shape ready that makes sense for when we do.
...
secret:
type: Secret # Or KeyVault
name: foo
key: bar
keyVaultReference: # This is the standard Reference type we use elsewhere
armId: ...
Is an AccountName
a secret?
If the service is returning it in the resource GET
, then strictly speaking it is not a secret. Since the field isn’t secret, we will not transform it
to be a SecretReference
. This is for two reasons:
- We won’t be able to automate it, since as far as the OpenAPI specification is concerned it isn’t a secret.
- The primary reason to classify this as a secret would be so that users could then inject the value from their secret into a pod. There is a workaround for this though since the user could (using Kustomize or similar) do this already.
Conclusion: We will not classify these as a secret for inputs unless there’s some requirement forcing us to do so.
Do we allow reading a secret from a namespace where the resource isn’t?
There were some requests for this in ASOv1, see this for example.
This has security implications, so initially at least the answer should be no. See: https://github.com/kubernetes/community/pull/5455.
Conclusion: No
What’s the difference between x-ms-secret
and format: password
?
Unclear currently. I have a question out to the Swagger team. Current plan is to just treat them the same.
Conclusion: They are not the same, x-ms-secret
implies that the secret is not returned in the GET
, whereas format: password
is just an annotation that
as far as we can tell is not actually consumed by anything. For our case, we can treat them the same.
Azure generated secrets
Azure generated secrets will be optionally downloaded to a Kubernetes or KeyVault secret. Users instruct the operator to download
the secrets associated with a resource by supplying a SecretDestination
in the Spec
.
The optionality of this step is key for preferring managed identity, as it allows those in AAD/Managed identity cases to avoid secrets they don’t want/won’t use being retrieved into their namespace.
// Note: This is the same type that is used for specifying references to user created secrets/
// reproduced here for clarity
type SecretReference struct {
Name string
Key string
}
// Using a different type for the destination as there are some things that are destination specific (such as adding annotations or labels,
// which don't make sense on )
type SecretDestination struct {
SecretReference
}
Example: Explicit secrets w/ Azure Storage (Simple)
spec:
# Other spec fields elided...
operatorSpec:
secrets:
primaryKey:
name: my-secret
key: PRIMARY_KEY
secondaryKey:
name: my-secret
key: SECONDARY_KEY
endpoint:
name: my-secret
key: ENDPOINT
Example: Explicit secrets w/ CosmosDB and multiple secret destinations (Complex)
Here is what a more complex resource might look like if we also supported KeyVault
as a secret type.
Note: We are not planning to support KeyVault
initially.
spec:
# Other spec fields elided...
operatorSpec:
secrets:
primaryKey:
type: KeyVault
reference:
armId: /subscriptions/.../resourceGroups/.../providers/Microsoft.KeyVault/vaults/asokeyvault
name: my-primary-key
secondaryKey:
type: KeyVault
reference:
armId: /subscriptions/.../resourceGroups/.../providers/Microsoft.KeyVault/vaults/asokeyvault
name: my-secondary-key
readOnlyPrimaryKey:
type: Secret
name: my-readonly-secret
key: PRIMARY_KEY
readOnlySecondaryKey:
type: Secret
name: my-readonly-secret
key: SECONDARY_KEY
endpoint:
type: Secret
name: my-secret
key: ENDPOINT
Note that some resources (like CosmosDB DatabaseAccount
) have multiple kinds of secrets. There might be a PrimaryKey
, SecondaryKey
, ReadOnlyPrimaryKey
,
and ReadOnlySecondaryKey
. At the very least we need to support putting the main keys and the readonly keys into two different secrets. To accomplish this,
we can define multiple logical secret groupings in the ASO config and a corresponding destination
property will be created for each of them. You can see this
done above in the sample.
Some fields such as Endpoint
(or AccountName
for other databases) are not really secret, but should be included in these logical secret groups anyway so
that they’re easier for users to inject into pods.
Azure generated secrets are more difficult to automatically detect. Often there is a ListKeys
or GetKey
API for the resource in question, but nothing on
the resource itself indicates that it has secrets automatically generated by Azure. For resources like these we will add a flag in the ASO configuration to
generate the appropriate structures in the resource. In the future we can investigate detecting this from the Swagger (probably after we move to Swagger
as the single source of truth) by introspecting “other API calls” on the resource in question.
Hooks required
In addition to the configuration required to generate types with the right shape, we will also need a way to hook into the reconcile process and actually make
the right ListKeys
or GetKeys
call. To support rollover we would need a different hook as well.
This is a relatively involved topic so not designing it all here. As a starting point, resources manually implementing the following interface would get us what we need. Issue #1978 is tracking this request in more detail.
See also the design of reconciler extensions.
type ARMDetails struct {
Endpoint string
SubscriptionID string
Creds azcore.TokenCredential
HttpClient *http.Client
}
type ReconcileDetails struct {
log logr.Logger
recorder record.EventRecorder
ARMDetails *ARMDetails
KubeClient *kubeclient.Client
ResourceResolver *genruntime.Resolver
PositiveConditions *conditions.PositiveConditionBuilder
}
type BeforeReconcileOverride interface {
// BeforeCreateOrUpdate runs before sending the resource to Azure
BeforeCreateOrUpdate(ctx context.Context, details *ReconcileDetails) (ctrl.Result, error)
}
type AfterReconcileOverride interface {
// AfterCreateOrUpdate runs after the resource has been sent to Azure and finished creating, either successfully or with a fatal error.
// You can determine the current state by examining the Ready condition on obj.
AfterCreateOrUpdate(ctx context.Context, details *ReconcileDetails) (ctrl.Result, error)
}
Resources with Azure generated secrets would manually implement AfterCreateOrUpdate
, which would then:
- Check that the resource is in a
Ready
state. - Cast the provided
obj
to the expected type. This should be guaranteed safe because the method won’t have been called unless its implemented. - Check the
forOperator
section of the spec to determine if any keys should be written. This includes determining the destination secret name and any annotations or other properties that should be written or updated on that secret. Also includes ensuring that the secret in question (if it exists) is owned by the resource in question. - Make the required
GetKeys
orListKeys
call to Azure. - Create or update the secret.
Lifecycle
These are secrets created by the operator (usually after calling some ListKeys
type API). Since the these secrets are by definition specific to the resource
that created them, their ownership in Kubernetes will be set to the resource that created them. When the owning resource is deleted, the created secret will
also be deleted.
Open questions
Should Endpoint
/AccountName
type “secrets” really be put into the secret, or no?
Putting them into the secret makes injecting them into pods easier, which is what people are going to want to do with these values. ASOv1 classifies all of these sorts of things as secrets.
I think we should put these into the generated secrets… although this does somewhat conflict with my stance on user specified accountName
, etc where
I had said to not classify them as secrets.
Conclusion: Yes, we will put them into the secret.
How does key rollover work for these types of secrets?
We don’t support this at all initially. This is a somewhat advanced scenario that doesn’t fit well into the Kubernetes resource model anyway
because rollover is more of an action and less of an actual resource. The user can roll their secrets using the az cli
(or other tooling) and
then either wait for a reconcile to occur naturally or force one to refresh the secrets locally in the cluster.
When we do decide to support this, we should be able to do so as a Job
-esque resource that runs to completion (and somehow triggers a re-reconcile
on the parent resource type).
Conclusion: This will not be supported in the initial implementation.
How do deal with soft-delete and purge protection?
If we support KeyVault, we have to deal with the soft-delete + purge awkwardness that deleting and recreating brings. Some of this we dodge by giving control
to the user and expecting them to provide us with the name of a secret that’s going to work. We may also need a flag per-secret (or global at the operator
level?) for if we should purge the secret when we delete. Something like purgeOnDelete
.
The main downside of a global flag controlling the behavior of purge or delete is that it doesn’t give fine-grain control, yet we are proposing to give fine-grained control over what vault to put secrets in. This seems like a discrepancy. This seems to suggest we just put this option on every KeyVault reference:
spec:
# Other spec fields elided...
operatorSpec:
secrets:
# Save the read-write keys (and endpoints) into a kubernetes secret called "my-secret" and a KeyVault secret called "my-secret".
keyDestination:
keyVault:
reference:
armId: /subscriptions/.../resourceGroups/.../providers/Microsoft.KeyVault/vaults/asokeyvault
name: my-secret
purge: false # Optional: defaults to false
delete: false # Optional: defaults to true
Conclusion: When we add KeyVault support, we will add the option to avoid deletion/purging of secrets.
How do we ensure that we aren’t overwriting secrets that we don’t own?
The operator must support updating the secret as keys can change or be rolled over. On the other hand we should present a clear error if the user has accidentally pointed two resources at the same secret. The first resource should be unaffected and work normally while the second resource should encounter a reconcile error stating that the secret in question already exists.
This applies to either Kubernetes or KeyVault secrets.
Proposed solution for Kubernetes secrets is to issue a GET
and check the Owner
field for Kubernetes secrets before issuing an update. resourceVersion
should ensure that we’re protected from any races where some external entity deletes the secret and another resource creates it between our GET
and PUT
.
Proposed solution for KeyVault secrets is to label the secret with a key uniquely identifying the operator and resource (GVK + namespace + name) which it
corresponds to. The operator will then issue a GET
prior to attempting to update the secret to ensure that it owns the secret. Unfortunately, as far as I can
tell KeyVault secrets don’t support Etag
so there’s a possible data race here that we can’t avoid…
Conclusion: We will use the above design to ensure we’re not overwriting secrets.
What happens when a resource is updated to remove the secrets
entry?
If a resource is created and given a forOperator.secrets
field that instructs it to create a secret, and then is updated to remove this secrets entry,
should we delete the secret we previously created? It feels like we should, but when the operator is presented with the updated object during that second
reconcile it doesn’t know that it ever had a secret.
Some possible solutions:
- We just don’t delete the secrets when you do this and expect users to do it themselves if they want things cleaned up.
- For k8s secrets, we could probably look through all of the secrets in the resources namespace and see if any of them are owned by us. If there are some
and we don’t have any
forOperator.secrets
we delete them. The downside here is that’s a lot of overhead to do for all reconciles. This also assumes that the operator has SecretRead
permissions in the namespaces in question - which it’s possible that it doesn’t (it may never have created any secrets and the users may know that it won’t and so have denied it permission). This also obviously doesn’t work for KeyVault secrets as we have no idea which KeyVault the secrets might even be in. - When we create a secret we store information about it in
Status
orAnnotations
and then use that information to clean up after ourselves. It’s a bit cleaner to store it inStatus
but technically that can be lost, whereasAnnotations
won’t be. This should work for both KeyVault and k8s secrets and while it’s a bit icky I think gives the best experience.
Conclusion: We will do nothing, the secret will remain. If this becomes a problem for users we can add support in the future for an annotation on the resource
that says azureGeneratedSecretBehavior: DeleteIfNotSpecified. We feel it’s unlikely people are going to complain though, and it errs on the side of caution for deleting
secrets that may be critical to the users application operation. Note that secrets WILL be deleted when you delete the actual resource (ex: CosmosDB
).
Should we drop the Destination
suffix in the field names?
For example, the proposal for CosmosDB DatabaseAccount
was:
spec:
# Other spec fields elided...
forOperator:
operatorSpec:
keyDestination:
# stuff elided...
readOnlyKeyDestination:
# stuff elided...
Without the Destination
suffix that would be:
spec:
# Other spec fields elided...
operatorSpec:
secrets:
key:
# stuff elided...
readOnlyKey:
# stuff elided...
Conclusion: Yes, drop it
Do we allow writing a secret to a namespace where the resource isn’t?
Conclusion: No, see: https://github.com/kubernetes/community/pull/5455
Do we support writing the same secret to multiple destinations?
Initially we had decided to not support this, and instead just write each secret to a maximum of 1 destination. This has one unfortunate side effect. It’s tricky for customers to break apart different “parts” of the secret, such as the primary and secondary keys, into different secrets as the endpoint can only be written to one of them.
There is a workaround for this though: when users want to split their primary/secondary keys into different secrets, they can write the endpoint to its own secret as well:
spec:
# Other spec fields elided...
operatorSpec:
primaryKey:
type: Secret
name: my-secret
key: PRIMARY_KEY
secondaryKey:
type: Secret
name: my-secret
key: SECONDARY_KEY
readOnlyPrimaryKey:
type: Secret
name: my-readonly-secret
key: PRIMARY_KEY
readOnlySecondaryKey:
type: Secret
name: my-readonly-secret
key: SECONDARY_KEY
endpoint:
type: Secret
name: my-endpoint
key: ENDPOINT
Then if they want to use the secondary key for a pod they would mount the endpoint and the primary/secondary keys from
my-readonly-secret
.
While this isn’t quite as clean as it could otherwise be by supporting multiple secret destinations for each
secret kind (endpoint
, readOnlyPrimaryKey
, etc) it’s simpler to implement and has fewer failure modes as we don’t
have to deal with the possibility of the user specifying 10 destinations where we wrote some but failed to write others.
Conclusion: No, we will not support this.
What client should we use for issuing the ListKeys/GetKeys request?
We don’t automatically generate methods for performing these calls. It seems easiest to just use the corresponding Azure SDK for this. This does mean that we’re taking a dependency on the SDK where we didn’t have one before, but the overall footprint of usage is pretty small and it’s going to be easier to do that than it is to write the methods ourselves.
Conclusion: We will use the Azure SDK for this.
What API version should we use for issuing the ListKeys/GetKeys request?
There are a few options here. The custom hooks could be applied to either the customer facing resource version or to the internal storage version. Applying the hooks to the storage version is easier, but means that the same hook will be run regardless of which customer
Other kinds of secrets
See other kinds of secrets in Azure for examples of each of these types of secrets.
Get once Azure generated secrets: These are secrets created by Azure and only returned once, usually as the response to a POST.
This pattern doesn’t seem to be very common. The current plan is to not support resources with this pattern. If we did need to support resources with this pattern there would be no way to avoid violating the work well with GitOps goal, as when deploying this resource there would be no way to adopt secrets associated with it (they’re GET-once). That might be ok for particular kinds of resources provided we document it though.
If there are large user requests for this sort of secret management we can support it similar to the ListKeys
cases except that
the resources won’t be movable without also cloning the secret.
“Secrets” created by Azure, and returned in the GET
: This is things like InstrumentationKey
, or server endpoint URLs.
Since these are returned in a GET
they are not secret and are already being shown on Status. We will also manually classify some of them to be
included in secrets we write. That will include things like endpoint
. This means that endpoint
for a SQL Server would show up in two places,
in the status and possibly in the secret written by the operator (assuming that the user has instructed us to write the endpoint someplace).
The main reason for writing this into two places is:
- We already have it in the
Status
and it doesn’t make sense to remove it as it’s part of the resource payload from Azure. - We want the ability to put it into a Secret so that if users want to inject it into their pods as an environment variable it’s easy to do so. There is no good way to inject environment variables from CRD Status’s.
Short-lived “tokens”: Like Storage SAS.
We will not support these sorts of secrets.
A special note on KeyVault
We have an open issue asking for support to manage KeyVault secrets via ASO.
We have to be very careful about support for KeyVault secrets (or certs, keys, etc), as the whole point of KeyVault is as a secure place to store your secrets. If you’re instead creating those secrets via the operator then you have by definition also located the secret in Kubernetes, which somewhat defeats the purpose from a security perspective.
There might be cases where this makes sense if there are APIs that require a KeyVault secret to be provided to them via ARM ID and so KeyVault isn’t the medium of secure storage so much as it is a medium of secret transfer.
Until we run into such scenarios we should avoid implementing any of the KeyVault key/secret/certificate/etc APIs.
Integrations
If/when we support storing secrets in KeyVault, we can create some demos showing integration with the KeyVault secret store csi driver.
Supported secret stores
P0: Kubernetes Secrets
P1: KeyVault - this seems more interesting for the “secrets from Azure” case, since at least right now you cannot create KeyVault secrets through ASO. That limitation means if you wanted to use KeyVault for input secrets you must have already pre-created the secrets before deploying via the operator.
Implementation plan
There are effectively two parallel features here:
- Input secrets (reading secrets from a store and supplying those secrets to Azure)
- Output secrets (reading secrets from Azure and storing them in a secret store)
Input secrets
SecretReference
implementation.- Code generator changes to detect
format: password
orx-ms-secret
and transform properties appropriately. - Reflector library to generically crawl resources and find secret refs.
SecretReader
library to get secrets from a collection ofSecretReference
’s. This should be expandable to support KeyVault secret refs in the future if we decide to add them. Should probably look something like this. Note that this related to theSecretWriter
below in the output secrets section:// TODO: these strings may need to be []byte // Note: these interfaces are acting on collections only so that they can be more efficient, as multiple SecretReference's or secret values may be read from or written to // a single secret. type SecretReader interface { GetSecrets(ctx context.Context, refs []SecretReference) (map[SecretReference]string, error) }
- ARM conversion changes to take a
ResolvedSecrets
parameter in addition to theResolvedReferences
it takes now. This will probably require a bit of fussing with the types passed to the ARM conversion methods and the conversion methods themselves. - Reconciler changes to perform secret reading in addition to reference resolving.
- Testing: Modify existing tests passing secrets in plain text as part of the spec to use the new mechanism instead.
- [Stretch goal] Support for secret rollover:
- Hook to add field indexers for specific resources (see what we’re doing for MySQL in ASOv1 today).
- Hook to control custom additional watches used for monitoring changes to secrets (see what we’re doing for MySQL in ASOv1 today).
Output secrets
operatorSpec
/operatorStatus
preparatory work. See #1612.- Add hooks to controller allowing handcrafted per-resource customization, see #1978.
- Update azure-arm configuration to allow for additional AzureGeneratedSecret properties to be defined. These properties will be rendered into the
operatorSpec
asSecretDestination
’s and subsequently read by the resource specific hooks in order to determine what (if any)ListKeys
APIs to call and where to store the results. - Implement the
SecretWriter
interface described below (or something like it). Note that this is related to theSecretReader
above in the input secrets section.// TODO: these strings may need to be []byte // Note: these interfaces are acting on collections only so that they can be more efficient, as multiple SecretReference's or secret values may be read from or written to // a single secret. type DestinationValuePair struct { Value string Destination SecretDestination } type SecretWriter interface { // This will perform ownership checks and return an error if an attempt is made to update a secret that is not owned by the operator SetSecrets(ctx context.Context, owner MetaObject, secrets []DestinationValuePair) error }
- Use customization hooks to implement
GetKeys
orListKeys
for each applicable resource, utilizing theSecretWriter
to write the secrets to their destination.
Testing
Unit testing
All of the library code should have unit tests written:
- Reflector to discover
SecretReference
. SecretReader
/SecretWriter
End to end testing
The existing EnvTest tests can be expanded to test secret management. We will need to make sure that we redact the keys returned by ListKeys
. Tests for resources which
take secrets will need the secrets created beforehand and passed to the resource (just as the customer would have to do).
Related issues
- Secrets created by ASO should have configurable annotations
- Path to secrets created by ASO should be in the status - maybe not needed if we’re making the user tell us where to put it?
- Secrets deletion should be required as part of resource deletion
- Add option to create and manage keyvault secrets through ASO