Security Best Practices

Securing ASO in your cluster

ASO has 3 levers that allow you to manage access to your Azure resources:

Controlling which CRDs are installed.
Controlling the Azure identities used by ASO at each scope, including the Azure RBAC permissions assigned to those identities.
Controlling the Kubernetes identities that use the cluster and their Kubernetes RBAC permissions.

We recommend making use of all 3 of these levers to fully secure a cluster running ASO.

Dos and Don’ts

✅ DO adopt this pattern.

⛔ DO NOT adopt this pattern.

General guidance

✅ DO use Azure Workload Identity for all credentials. Other supported identity types are called out in the authentication documentation.

✅ DO use namespace-scoped ASO credentials, rather than global scope. Note that the global scope credential is optional and may be omitted when installing ASO.
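As a rough illustration of a namespace-scoped credential, the sketch below builds the manifest for a Secret that lives in the same namespace as the ASO resources it serves. It assumes the secret name aso-credential and the keys AZURE_SUBSCRIPTION_ID, AZURE_TENANT_ID, AZURE_CLIENT_ID and USE_WORKLOAD_IDENTITY_AUTH; check these against the authentication documentation for the ASO version you run. The function name and all IDs are placeholders invented for this example.

```python
import json


def namespace_credential_secret(namespace, subscription_id, tenant_id, client_id):
    """Build a Secret manifest for a namespace-scoped ASO credential.

    Assumption: ASO looks for a secret named "aso-credential" in the
    namespace of the resource being reconciled.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": "aso-credential", "namespace": namespace},
        "stringData": {
            "AZURE_SUBSCRIPTION_ID": subscription_id,
            "AZURE_TENANT_ID": tenant_id,
            "AZURE_CLIENT_ID": client_id,
            # Workload identity: no client secret is stored in the cluster.
            "USE_WORKLOAD_IDENTITY_AUTH": "true",
        },
    }


# Placeholder IDs, not real values.
secret = namespace_credential_secret(
    namespace="dev",
    subscription_id="00000000-0000-0000-0000-000000000000",
    tenant_id="11111111-1111-1111-1111-111111111111",
    client_id="22222222-2222-2222-2222-222222222222",
)
print(json.dumps(secret, indent=2))
```

Because JSON is valid YAML, the printed manifest can be piped to kubectl apply -f - as it stands.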

✅ DO follow the principle of least privilege when assigning roles to identities which will be used by ASO. Remember that any user with access to the namespace holding an ASO credential can, via ASO, do everything that credential can do. For example, if users in namespace a are supposed to have broad permissions only on resources in resourceGroup a, then the ASO identity for namespace a should have Contributor on resourceGroup a alone, not on the whole subscription. See reducing access for more details on managing Azure access.
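To make the scoping concrete, here is a minimal sketch that builds a resource-group scope and the matching az role assignment create invocation. The helper names and the IDs are hypothetical; the sketch only assembles the command and does not run it.

```python
import shlex


def resource_group_scope(subscription_id, resource_group):
    """Return the Azure scope string for a single resource group."""
    return f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"


def role_assignment_command(principal_object_id, role, scope):
    """Build (but do not execute) an Azure CLI role assignment command."""
    return [
        "az", "role", "assignment", "create",
        "--assignee-object-id", principal_object_id,
        "--assignee-principal-type", "ServicePrincipal",
        "--role", role,
        "--scope", scope,
    ]


# Placeholder IDs: the identity used by namespace "a" gets Contributor
# on resource group "a" only, not on the whole subscription.
scope = resource_group_scope("00000000-0000-0000-0000-000000000000", "a")
command = role_assignment_command(
    "33333333-3333-3333-3333-333333333333", "Contributor", scope
)
print(shlex.join(command))
```

Passing the subscription scope (/subscriptions/<id>) instead would grant the same role across every resource group, which is the broad grant this guidance advises against.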

✅ DO restrict access to sensitive namespaces in the cluster. On AKS, you can use a combination of AAD (now Entra) integration, disabling local accounts, and defining JIT/Conditional access policies. We strongly recommend setting up conditional access policies for sensitive namespaces such as production, especially if the ASO credential for that namespace has broad scope.
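On the Kubernetes side, namespace access is typically granted with a Role and a RoleBinding. The sketch below builds both for a single namespace and binds them to a directory group. It assumes an AKS cluster with Entra integration, where group subjects are identified by object ID; the role name, the list of ASO API groups, and the group ID are examples only.

```python
import json

VERBS = ["get", "list", "watch", "create", "update", "patch", "delete"]


def aso_namespace_role(namespace, api_groups, name="aso-editor"):
    """Role allowing CRUD on the given ASO API groups in one namespace."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        # RBAC has no partial wildcards for apiGroups, so list each group.
        "rules": [{"apiGroups": list(api_groups), "resources": ["*"], "verbs": VERBS}],
    }


def group_role_binding(namespace, role_name, group_object_id):
    """Bind a Role to a directory group, in the Role's namespace only."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{role_name}-binding", "namespace": namespace},
        "subjects": [{
            "kind": "Group",
            "apiGroup": "rbac.authorization.k8s.io",
            "name": group_object_id,  # placeholder group object ID
        }],
        "roleRef": {
            "kind": "Role",
            "apiGroup": "rbac.authorization.k8s.io",
            "name": role_name,
        },
    }


role = aso_namespace_role("dev", ["resources.azure.com", "storage.azure.com"])
binding = group_role_binding("dev", "aso-editor", "44444444-4444-4444-4444-444444444444")
print(json.dumps([role, binding], indent=2))
```

Using a namespaced Role rather than a ClusterRoleBinding keeps the grant from leaking into other namespaces such as production.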

✅ DO use GitOps tools like ArgoCD or Flux so that changes to ASO CRs must pass code review before they are merged and applied.

✅ DO only install the ASO CRDs you need, no more.

⛔ DO NOT install the RoleAssignment CRD if you don’t need it. This CRD can enable escalation of privilege if not used carefully. If you do use RoleAssignment, follow the other DOs in this guide to do so safely.
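The two CRD recommendations above can be sketched as a small helper that assembles a CRD pattern and refuses any entry that would pull in RoleAssignment unless that is explicitly requested. It assumes the semicolon-separated group/Kind format used by ASO's crdPattern setting. The wildcard check uses Python's fnmatch as an approximation of ASO's own matching, and the helper names are invented for this example.

```python
from fnmatch import fnmatchcase

ROLE_ASSIGNMENT = "authorization.azure.com/RoleAssignment"


def includes_role_assignment(patterns):
    """Approximate check: would any pattern select the RoleAssignment CRD?"""
    target = ROLE_ASSIGNMENT.lower()
    return any(fnmatchcase(target, p.lower()) for p in patterns)


def build_crd_pattern(patterns, allow_role_assignment=False):
    """Join patterns into one crdPattern value, guarding RoleAssignment."""
    if includes_role_assignment(patterns) and not allow_role_assignment:
        raise ValueError("pattern would install RoleAssignment; opt in explicitly")
    return ";".join(patterns)


# Install only the groups that are actually needed.
pattern = build_crd_pattern(["resources.azure.com/*", "storage.azure.com/*"])
print(pattern)
```

The resulting string is what would be supplied as the crdPattern value at install time. A catch-all such as "*" is rejected by the guard because it would also match RoleAssignment.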

An Example Setup

A possible setup might be a dev namespace set up as a development environment pointing at a development subscription, and a prod namespace set up as a production environment pointing at a production subscription.

The dev namespace has Azure credentials which are Contributor on the dev subscription, and the prod namespace has credentials which are Contributor on the prod subscription. Developers in dev are members of an Azure AD group with roles that allow them to create, read, update, and delete ASO resources and other Kubernetes resources (Pods, etc) in the dev namespace, but not in the prod namespace.

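This means that developers can do basically whatever they want in the dev namespace. If the dev credential also holds a role that permits role assignment (Contributor alone does not; Owner or User Access Administrator does), this includes assigning roles to themselves at the Azure level in the dev subscription.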

The prod namespace also has an Azure AD group with roles that allow creating, reading, updating, and deleting ASO resources and other Kubernetes resources, but that group is empty by default. Users can use JIT/Conditional access policies to escalate into that group. This means that by default, nobody can do anything in the prod namespace to either the Kubernetes resources (Pods, etc) or the Azure resources via ASO or the portal.

When a user needs ad-hoc access to the prod namespace, they can go through the JIT process to get it. Standard deployments to prod should be done through a CI/CD tool like Argo or Flux. This has the advantage of ensuring that proposed changes to prod must first meet the merge bar (pass through code review and other processes) to make it into the repo before Argo/Flux will deploy them to prod. The conditional access/JIT path is a break-glass mechanism that should be used rarely.
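The access model described above can be summarised in a toy sketch: dev has standing members, prod has none, and a JIT grant adds a user to prod only temporarily. The class and method names are invented for illustration and do not correspond to any ASO, Kubernetes, or Entra API.

```python
from dataclasses import dataclass, field


@dataclass
class NamespaceAccess:
    """Toy model of who may edit resources in one namespace."""
    namespace: str
    subscription: str
    standing_members: set = field(default_factory=set)
    jit_members: set = field(default_factory=set)

    def grant_jit(self, user):
        # Stands in for an approved JIT/conditional access elevation.
        self.jit_members.add(user)

    def revoke_jit(self, user):
        # Stands in for the elevation expiring.
        self.jit_members.discard(user)

    def can_edit(self, user):
        return user in self.standing_members or user in self.jit_members


dev = NamespaceAccess("dev", "dev-subscription", standing_members={"alice"})
prod = NamespaceAccess("prod", "prod-subscription")  # group empty by default

before = prod.can_edit("alice")   # no standing access to prod
prod.grant_jit("alice")
during = prod.can_edit("alice")   # break-glass access while elevated
prod.revoke_jit("alice")
after = prod.can_edit("alice")    # back to no access
print(dev.can_edit("alice"), before, during, after)
```

In this model routine prod changes never call grant_jit at all; they arrive through the reviewed repository and the CI/CD tool instead.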

Note that the above is just one way to lay things out. The same ideas can be applied to a dev and prod resource group within a single subscription, and also to other more complex topologies. A test or int environment can be added in the middle with rules that are more locked down than dev but less locked down than prod (or test and prod can share very similar lockdowns to force the same procedures across both).