An exploration of the past, present, and future of Azure Kubernetes Service (AKS).

Posted by Jorge Palma, May 21, 2024, 10-minute read

Introduction

Hi! My name is Jorge Palma, I'm a PM Lead for AKS, and I'm excited to be inaugurating our new AKS Engineering Blog. This blog will complement and extend some of our existing channels. It will provide extra context to announcements, share product tips & tricks that may not fit in our core documentation, and give you a peek behind the curtain at how we build the product.

In this initial post, taking inspiration from Mark Russinovich, who has given so many talks and series with titles like this one, I hope to take you on a (shortened) journey through the history of Azure Kubernetes Service. We will cover the past (how we started and how we got here), the present (where we are today), and some thoughts about what the future holds for AKS.

Past – How we got here

An AKS history recap should start a little before AKS actually came into existence. Circa 2015, containerization was starting to bubble up; containers were no longer just a developer tool or software packaging mechanism, but were poised to become much more: a whole new way to build and deliver software. Before this was possible, however, a key piece was still missing: how could we go from delivering software to running services at scale? This meant addressing the requirements to run containers in production, and beyond that, to run full services, platforms and applications in production. These requirements ranged from container placement and scheduling, and guaranteeing their health and execution, to facilitating communication between containers that could represent different parts of an application or service, or with external services such as a PaaS database. These and many more tasks became the purview of the emerging "container orchestrators" of the time. However, bootstrapping and configuring a container orchestrator was not simple; it could involve cumbersome work, from setting up infrastructure to configuring dozens of components within it, to create your "cluster": the set of physical or virtualized infrastructure that would host your containers. To assist our users, the Azure team at Microsoft decided to create first a tool/project, ACS-engine, and then a service based on it, Azure Container Service (ACS), whose main mission was to help users quickly bootstrap a cluster running one of the most popular container orchestrators of the time. One of those orchestrator options was Kubernetes, which reached General Availability (GA) in ACS in February 2017.

It would be fair to say that the mission of providing a fully configured container orchestrator was in fact achieved by that service, but the journey was only beginning. On one hand, the user community and the market at large clearly chose Kubernetes as the standard container orchestrator. This was due both to its extensible, pluggable architecture, which spurred a whole ecosystem around it, and to its enterprise backing: companies such as Microsoft worked alongside other key contributors to ensure that key enterprise requirements were built in or facilitated from early on. On the other hand, getting a running Kubernetes cluster to orchestrate your containers turned out to be only the beginning of users' needs. As tools continued to advance and improve their UX, creating clusters yourself became achievable for most people. However, you'd still need to maintain and operate those clusters, which often meant understanding many of their internal components, behaviors and inner workings, as well as ensuring the resiliency and business continuity of the control plane and the system components that allowed the cluster to run user applications.

So, it became clear that to better serve our users, our mission and direction had to evolve and focus. We needed a fully managed Kubernetes service that took care of bootstrapping and of control plane and system component management, and that simplified day-2 operations and developer experiences. Azure Kubernetes Service (AKS) was born as a result in October 2017: a fully managed service where the control plane was hosted by Microsoft and the nodes ran in the user's subscription. This let users draw compute from their existing quota, benefit from infrastructure reservations or savings plans, and integrate with existing infrastructure such as hub-and-spoke architectures, while still being fully serviced by Microsoft. As a result, enterprises could meet their most complex requirements and transpose their architectures to AKS while benefiting from a modern cloud native platform.

At that time, the service was still very basic compared to what it is today. It supported only public clusters with a single availability set, had no node pools, autoscaling or Windows containers, and allowed a maximum of 100 nodes per cluster. It also lacked many features that are now considered essential, such as role-based access control, advanced networking, node repair, and cluster upgrades. Despite these limitations, AKS was recognized as a huge leap forward from ACS, and customer adoption grew quickly. Users saw the value of a fully managed Kubernetes service that integrated seamlessly with Azure and Microsoft tools. Plus, we quickly got to work delivering many of the key capabilities our customers depend on today.

Present – Where we are today

Over the last few years, AKS has largely focused on democratizing and commoditizing access to Kubernetes by ensuring the service was fully managed, supported and serviced: not only the control plane and its nodes, but also all system components such as CoreDNS and metrics server. We set out to make AKS a comprehensive application platform, delivering on all the pluggable points that Kubernetes supports and integrating with the best of Azure and Microsoft. These integrations range from Monitoring, Entra ID, Azure Storage, Networking and other fundamental infrastructure pieces to higher-level services like AI Service, Load Testing, Data Services, GitHub, Azure DevOps, and much more, and they helped ensure users could meet their use cases and requirements.

As part of this mission, it became imperative to ensure that AKS was a mission-critical platform from a reliability, security and scalability perspective, where users could run any kind of workload in any Azure region and achieve the highest levels of availability. The service benefited from long-standing partnerships with early adopters like Mercedes, Walmart, ESRI, Adobe, Starbucks and H&M, who worked closely with us from the early days and helped us build AKS into what it is now: a platform serving more than 94% of Fortune 500 companies, running over 4 million mission-critical applications at scale and in a fully managed fashion. Today, we continue to partner with more and more amazing users like OpenAI, GitHub, ClickHouse, GoPuff, InMobi and Microsoft Teams to further the value we can provide. This includes simplified operational excellence capabilities like auto-upgrade, managed service mesh, app-specific scaling with KEDA, and breaking change detection; multi-tiered security controls like Azure RBAC, Workload Identity, and Azure Key Vault as a secrets store; and support for the most demanding compliance requirements, such as FedRAMP or PCI-DSS.

We stay true to our origins and our dedication to Open-Source Software (OSS) by developing all our solutions and activities either as OSS or on top of vendor-neutral software, collaborating closely with the governing OSS foundations and other vendors. This approach ensures we offer standardized options that serve the wider community, give users numerous advantages, and support industry-driven advances. However, for users to truly excel and maximize their success, Azure needs to fully become a Kubernetes-powered cloud, blending the benefits that the cloud and Microsoft can bring with the robust innovation and collaboration found in the open-source space. To achieve this, when AKS implements these OSS-based solutions, it also offers comprehensive, end-to-end support for them with deep integration into the Microsoft ecosystem. Examples include Azure Policy with Gatekeeper/OPA, AKS Backup with Velero, Azure Container Native Storage with OpenEBS, Image Cleaner with Eraser, Azure Monitor with Prometheus and Grafana, Network Observability with Retina, Troubleshooting with Inspektor Gadget, and Image Integrity with cosign or Notation, among others.

AKS users treat OSS solutions as a requirement for all their existing and new projects, ensuring cross-environment compatibility, reuse of skills, and standardization of technology stacks and solutions. As such, they rightly expect nothing less from AKS: all the benefits that Kubernetes brings, including its standardization and flexibility, with the added ease of use, support and scale that the cloud provides. AKS will remain committed to this path.

One of the key challenges with Kubernetes today for some enterprise customers is its rather fast pace of change and evolution. To address this, AKS launched industry-leading Long-Term Support (LTS) for customers who struggle to keep up but still want the same management and support that AKS delivers. At the same time, the service added numerous controls that let companies customize their upgrade experience, move onto current versions more easily, and keep control of their workloads.

The mass adoption of Kubernetes made it the de facto cloud native platform. That, in turn, brought a new wave of adoption, which continues to bring new requirements for the platform to meet. In addition, the experience of using Kubernetes needs to improve iteratively to make users more productive, assist with onboarding, and ease the continuous skilling of users on the technology. So once again, AKS's mission may need an update.

Future – Where we’re headed

Predicting the future is challenging at best, but despite the uncertainty this always carries, a few trends can be spotted with higher confidence.

The mission for AKS remains largely unchanged, but with bolder ambitions: the focus is shifting from the service itself to users' workloads and their next-generation needs. The aim continues to be empowering users to offload even more to the platform and focus on what brings them real value, helping developers be more productive and fulfilled by what they achieve, and shortening the time to value for enterprises. To this end, AKS will increasingly help ensure users' workload SLOs, becoming more Automatic and delivering capabilities like auto-instrumentation of applications, intelligent workload placement for optimized cost and availability, safeguards for environments in support of platform teams, and performance fine-tuning to match any service's needs.

As users continue to expand their usage and footprints, the objective moves from how simple it is to run a reliable application in one cluster to how easy it is to deploy and manage that application across hundreds of clusters throughout the world, and to do so not only in the most effective way but also in the most cost-efficient one. This aligns with the platform engineering trend we've seen over the last couple of years, which enables developers to leverage Kubernetes as a self-service platform for building, testing, and deploying applications at scale.

Kubernetes serves as the engine behind the AI revolution, and AKS aims to enrich this for users in two key ways. First, by ensuring that every user and company can use Copilot in Azure to enhance their work with AKS and get instant support on any issue, speeding up learning and improving efficiency. Second, by striving to remain the optimal and most comprehensive platform for modern applications and ML platforms, making it simple for our users to integrate AI and operate intelligent workloads, with examples like the Kubernetes AI Toolchain Operator (KAITO).

Conclusion

We hope you enjoyed reading this inaugural post and found it helpful for understanding the vision and direction behind AKS. We are excited to share more details and updates on our journey to make AKS the best platform for your workloads. Stay tuned for future posts where we will dive deeper into specific features and scenarios, and make sure you're subscribed to the AKS Community, where you can find all this content and more in video form, as well as recaps from our events. Thank you for your attention and feedback, and happy kubing!
