Documentation
1 - Overview
Azure Kubernetes Service (AKS) is a managed Kubernetes platform on Azure. It provides features that make it easy to get up and running on production-grade Kubernetes clusters. For more information about AKS, check out Introduction to Azure Kubernetes Service.
This project provides several deployment options for running and testing Kubeflow on AKS. To get started, check out our Deployment Options page.
2 - Contribution Guidelines
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
Trademarks
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties’ policies.
3 - Deployment Options
Start by checking out the prerequisites page.
If you want to deploy Kubeflow with minimal changes on AKS, consider the vanilla deployment option. The Kubeflow control plane is installed on Azure Kubernetes Service (AKS), which is a managed container service used to run and scale Kubernetes applications in the cloud.
For a deployment with minimum baseline security, consider the deploy with custom password and TLS deployment option.
3.1 - Deploy Kubeflow with Password, Ingress and TLS
Background
In this lab you will deploy an Azure Kubernetes Service (AKS) cluster and other Azure services (Container Registry, Managed Identity, Key Vault) with Azure CLI and Bicep. You will then install Kubeflow after creating a custom password. This deployment option also uses TLS with a self-signed certificate and an ingress controller. For production workloads, replace the self-signed certificate with your own CA certificates.
Deploy Kubeflow with Password, Ingress and TLS
⚠️ Warning: To complete the deployment, you will need either User Access Admin and Contributor, or Owner, access to the subscription you are deploying into.
Use the Azure CLI and Bicep templates to deploy the infrastructure for your application. We will be using the AKS construction project to rapidly deploy the required Azure resources. The project allows users the flexibility to tweak their AKS environment however they want. Please check out the AKS construction helper for more details about AKS construction.
You can also try out the automated option using the Mage build tool at the Azure Open Source Labs.
Log in to the Azure CLI.
az login
If you have access to multiple subscriptions, select the one you want to deploy into.
az account set --subscription <NAME_OR_ID_OF_SUBSCRIPTION>
Install kubectl using the Azure CLI, if required.
az aks install-cli
Clone this repo, which includes the Azure/AKS-Construction and kubeflow/manifests repos as Git submodules.
git clone --recurse-submodules https://github.com/Azure/kubeflow-aks.git
Change directory into the newly cloned directory
cd kubeflow-aks
Deployment steps
Get the signed-in user ID so that you can get admin access to the cluster you create.
SIGNEDINUSER=$(az ad signed-in-user show --query id --out tsv)
RGNAME=kubeflow
Create the deployment.
az group create -n $RGNAME -l eastus
DEP=$(az deployment group create -g $RGNAME --parameters signedinuser=$SIGNEDINUSER -f main.bicep -o json)
Note: The DEP variable is used in the steps that follow. You can save it by running echo $DEP > test.json and restore it by running export DEP=$(cat test.json).
KVNAME=$(echo $DEP | jq -r '.properties.outputs.kvAppName.value')
AKSCLUSTER=$(echo $DEP | jq -r '.properties.outputs.aksClusterName.value')
TENANTID=$(az account show --query tenantId -o tsv)
ACRNAME=$(az acr list -g $RGNAME --query "[0].name" -o tsv)
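The variables set above feed every later step, so it can help to confirm they were populated before continuing. The following is a minimal sketch, assuming a Bash shell; `require_vars` is a hypothetical helper and not part of this repo.

```shell
# Hypothetical helper (not part of the repo): fail early if any named
# variable is unset, empty, or the literal string "null" (which is what
# jq -r prints when a key is missing).
require_vars() {
  local name status=0
  for name in "$@"; do
    # ${!name} is Bash indirect expansion: the value of the variable named $name.
    if [ -z "${!name:-}" ] || [ "${!name}" = "null" ]; then
      echo "Missing value for $name" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage, after the assignments above:
#   require_vars RGNAME KVNAME AKSCLUSTER TENANTID ACRNAME
```

If the check fails, re-run the deployment or restore DEP from the saved file before moving on.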
Install kubelogin and log into the cluster
Next install kubelogin using the installation instructions appropriate for your computer. From there, you’ll need to run the following commands to download the kubeconfig file and convert it for use with kubelogin.
az aks get-credentials --resource-group $RGNAME \
--name $AKSCLUSTER
kubelogin convert-kubeconfig -l azurecli
Log in to the cluster. Enter your Azure credentials when prompted afterwards to complete the login. If this is successful, kubectl should return a list of nodes.
kubectl get nodes
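If the nodes are listed but not yet Ready, you could wait for them instead of polling by hand. This is a sketch only; `wait_for_nodes` and the default timeout are illustrative choices, not something the repo provides.

```shell
# Sketch: block until every node reports Ready, or give up after a timeout.
# wait_for_nodes and its 300s default are illustrative, not from the repo.
wait_for_nodes() {
  local timeout="${1:-300s}"
  kubectl wait --for=condition=Ready nodes --all --timeout="$timeout"
}

# Usage: wait_for_nodes 120s
```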
Install kustomize
Next install kustomize using the installation instructions appropriate for your computer.
Note: To use the kustomize command below to deploy Kubeflow, you must use Kustomize v3.2.0. More info here.
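Because the version requirement is strict, a quick check before deploying may save a failed run. The sketch below assumes the version string appears somewhere in the output of `kustomize version`, whose exact format varies between releases; `check_kustomize_version` is a hypothetical helper.

```shell
# Sketch: report whether the kustomize on PATH is the required release.
# Assumes the version string (e.g. 3.2.0) appears in the output of
# `kustomize version`; the output format differs between releases.
check_kustomize_version() {
  local want="${1:-3.2.0}"
  # -F treats the version as a fixed string, so the dots match literally.
  if kustomize version 2>/dev/null | grep -qF -- "$want"; then
    echo "kustomize $want found"
  else
    echo "kustomize $want not found on PATH" >&2
    return 1
  fi
}

# Usage: check_kustomize_version 3.2.0
```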
Install Kubeflow with TLS and Unique Password
Please note that a self-signed certificate is used for demonstration purposes. Do not use self-signed certificates for production workloads. You can swap this self-signed certificate for your own CA certificate.
- The first step is to generate a new hash/password combination using bcrypt. There are many ways of doing this, e.g. by generating it using Python. For simplicity we will be using coderstool’s Bcrypt Hash Generator for testing purposes. Do not do this for production workloads. In the plain text field, enter a password for your first user, then click on the “Generate Hash” button. You can generate multiple hashes if you have multiple users.
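As one of the other ways mentioned in this step, the hash can be generated locally so the password never leaves your machine. This is a minimal sketch, assuming `htpasswd` (shipped with apache2-utils or httpd-tools) is installed; the function name and the cost of 12 are illustrative choices.

```shell
# Sketch: print a bcrypt hash for a password without using a website.
# Assumes htpasswd is installed; bcrypt_hash and the default cost of 12
# are illustrative, not part of the repo.
bcrypt_hash() {
  local password="$1" cost="${2:-12}"
  # -n: print to stdout, -i: read the password from stdin, -B: use bcrypt,
  # -C: bcrypt cost. The user name is a throwaway that cut removes.
  printf '%s\n' "$password" | htpasswd -niBC "$cost" user | cut -d: -f2 | tr -d '\n'
  echo
}

# Usage: bcrypt_hash 'my-new-password'
```

Paste the printed value into the hash field described in the next step.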
- Head to the deployments/tls/dex-config-map.yaml file and update the hash value there (around line 22) with the hash you just generated. You can also change the email address, username and userid. In addition, you can set up multiple users by adding more users to the array. If you change the email address from the default, also update the default email address in the manifests/common/user-namespace/base/params.env file.
- Update your auth.md file with the new email address and password (the plain-text password, not the hash), or store the secrets in a more secure way.
- Copy the updated deployments/tls folder into the kubeflow manifests folder so that the deployment includes your config changes.
cp -a deployments/tls manifests/tls
- cd to the manifests folder and install Kubeflow.
cd manifests
Install all of the components via a single command
while ! kustomize build tls | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
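The loop above retries indefinitely, which is usually what you want while custom resource definitions are still registering. If you would rather stop after a fixed number of attempts, a bounded variant could look like the sketch below; `apply_with_retry` and its defaults are illustrative and not part of the repo.

```shell
# Sketch: the same build-and-apply loop, but give up after a fixed number
# of attempts. apply_with_retry and its defaults are illustrative.
apply_with_retry() {
  local overlay="$1" max_attempts="${2:-30}" delay="${3:-10}" attempt=1
  # As in the original loop, the pipeline's exit status is kubectl's.
  until kustomize build "$overlay" | kubectl apply -f -; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after $attempt attempts" >&2
      return 1
    fi
    echo "Retrying to apply resources ($attempt/$max_attempts)"
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Usage, from the manifests folder: apply_with_retry tls
```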
- Once the command has completed, check that the pods are ready.
kubectl get pods -n cert-manager
kubectl get pods -n istio-system
kubectl get pods -n auth
kubectl get pods -n knative-eventing
kubectl get pods -n knative-serving
kubectl get pods -n kubeflow
kubectl get pods -n kubeflow-user-example-com
- Restart dex to ensure dex is using the updated password.
⚠️ Warning: It is important that you restart the dex pod by running the command below. If you don’t, any previous password (including the default password 12341234 if not changed) will be used from the time the Service is exposed via LoadBalancer until the time this command is run or dex is otherwise restarted.
kubectl rollout restart deployment dex -n auth
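The restart command returns immediately, so the old dex pod may still be serving for a short while. A small sketch that restarts dex and then waits for the rollout to finish is shown here; `restart_dex` and the 120s timeout are illustrative choices.

```shell
# Sketch: restart dex and wait until the rollout has completed before
# logging in. restart_dex and the 120s default are illustrative.
restart_dex() {
  kubectl rollout restart deployment dex -n auth &&
    kubectl rollout status deployment dex -n auth --timeout="${1:-120s}"
}

# Usage: restart_dex
```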
- Configure TLS. Start by getting the IP address of the istio gateway.
kubectl -n istio-system get service istio-ingressgateway --output jsonpath='{.status.loadBalancer.ingress[0].ip}'
Replace the IP address in the deployments/tls/certificate.yaml file (line 13) with the IP address of the istio gateway and save the file.
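If you prefer not to edit the file by hand, the replacement can be scripted with sed. The sketch below assumes, as stated in this step, that line 13 of the file holds the IP address; `set_cert_ip` is a hypothetical helper, so check the file afterwards.

```shell
# Sketch: replace the first IPv4 address on a given line of a file.
# Assumes line 13 of certificate.yaml holds the IP address, as the step
# above states; set_cert_ip is a hypothetical helper, not part of the repo.
set_cert_ip() {
  local file="$1" ip="$2" line="${3:-13}" tmp
  tmp=$(mktemp) || return 1
  # Write to a temp file instead of using sed -i, whose syntax differs
  # between GNU and BSD sed.
  sed -E "${line}s/[0-9]+(\.[0-9]+){3}/${ip}/" "$file" > "$tmp" &&
    cat "$tmp" > "$file"
  local status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage:
#   IP=$(kubectl -n istio-system get service istio-ingressgateway \
#        --output jsonpath='{.status.loadBalancer.ingress[0].ip}')
#   set_cert_ip deployments/tls/certificate.yaml "$IP"
```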
- Note that instead of providing the IP address as above, you could give the LoadBalancer an Azure sub-domain (via the annotation in manifests/common/istio-1-16/istio-install/base/patches/service.yaml) and use that instead. Deploy the certificate manifest file.
kubectl apply -f tls-manifest/certificate.yaml
- You have completed the deployment. Access the dashboard by entering the IP address in a browser. You might get a warning saying the connection is unsafe. This is expected since you are using a self-signed certificate. Click on Advanced and proceed to the URL to view your dashboard. Log in using the email address and password in the auth.md file (assuming you updated it with your email address and password in the previous step).
Testing the deployment with a Notebook server
You can test that the deployments worked by creating a new Notebook server using the GUI.
- Click on “Create a new Notebook server”
- Click on “+ New Notebook” in the top right corner of the resulting page
- Enter a name for the server
- Leave the “jupyterlab” option selected
- Feel free to pick one of the available images; in this case we choose the default
- Set Requested CPU to 0.5 and requested memory in Gi to 1
- Under Data Volumes click on “+ Add new volume”
- Expand the resulting section
- Set the name to datavol-1. The default name provided would not work because it has characters that are not allowed
- Set the size in Gi to 1
- Uncheck “Use default class”
- Choose a class from the provided options. In this case we choose “azurefile-premium”
- Choose ReadWriteMany as the Access mode
- Click on “Launch” at the bottom of the page. A successful deployment should have a green checkmark under status after 1-2 minutes.
- Click on “Connect” to access your JupyterLab
- Under Notebook, click on Python 3 to access your Jupyter notebook and start coding
Destroy the resources
Run the command below to destroy the resources you just created after you are done testing
az group delete -n $RGNAME
3.2 - Prerequisites
Kubeflow on AKS Prerequisites
For all Kubeflow on AKS deployment options, you will need the following
- An Azure Subscription (e.g. Free or Student account)
⚠️ Warning: In order to complete the deployments, you will need to have either User Access Admin and Contributor, or Owner, access to the subscription you are deploying into.
- The Azure CLI
- Bash shell (e.g. macOS, Linux, Windows Subsystem for Linux (WSL), Multipass, Azure Cloud Shell, GitHub Codespaces, devcontainers, etc). This repository comes with a .devcontainer folder that allows you to configure your Codespaces or devcontainers environment so that it has all the required Bash tools like kubelogin and the correct version of kustomize
- The following installed in your Bash shell if you are not going with the codespaces or devcontainers option
- Kustomize v3.2.0
- Kubelogin
- git
- Bicep
- Kubectl
- sed (optional)
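If you are not using Codespaces or devcontainers, a quick check that the tools are on your PATH can catch a missing install early. This is a minimal sketch; `check_tools` is a hypothetical helper, and the example tool list is taken from the list above plus jq, which the deployment steps use.

```shell
# Sketch: report which of the named commands are missing from PATH.
# check_tools is a hypothetical helper, not part of the repo.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "Missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_tools az kubectl kubelogin kustomize git jq
```

Note that this only confirms a command exists; it does not check versions.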
3.3 - Vanilla Installation
Background
In this lab you will deploy an Azure Kubernetes Service (AKS) cluster and other Azure services (Container Registry, Managed Identity, Key Vault) with Azure CLI and Bicep. You will then install Kubeflow with the default settings using Kustomize and create a Jupyter notebook server that you can access from your browser.
Instructions for Basic Deployment without TLS and with Default Password
This deployment option is for testing only. To deploy with TLS, and change default password, please click here: Deploy kubeflow with TLS.
Use the Azure CLI and Bicep templates to deploy the infrastructure for your application. We will be using the AKS construction project to rapidly deploy the required Azure resources. The project allows users the flexibility to tweak their AKS environment however they want. Please check out the AKS construction helper for more details about AKS construction.
Log in to the Azure CLI.
az login
If you have access to multiple subscriptions, select the one you want to deploy into.
az account set --subscription <NAME_OR_ID_OF_SUBSCRIPTION>
Install kubectl using the Azure CLI, if required.
az aks install-cli
Clone this repo, which includes the Azure/AKS-Construction and kubeflow/manifests repos as Git submodules.
git clone --recurse-submodules https://github.com/Azure/kubeflow-aks.git
Change directory into the newly cloned directory
cd kubeflow-aks
Deployment steps
⚠️ Warning: To complete the deployment, you will need either User Access Admin and Contributor, or Owner, access to the subscription you are deploying into.
Get the signed-in user ID so that you can get admin access to the cluster you create.
SIGNEDINUSER=$(az ad signed-in-user show --query id --out tsv)
RGNAME=kubeflow
Create the deployment.
az group create -n $RGNAME -l eastus
DEP=$(az deployment group create -g $RGNAME --parameters signedinuser=$SIGNEDINUSER -f main.bicep -o json)
Note: The DEP variable is used in the steps that follow. You can save it by running echo $DEP > test.json and restore it by running export DEP=$(cat test.json).
KVNAME=$(echo $DEP | jq -r '.properties.outputs.kvAppName.value')
AKSCLUSTER=$(echo $DEP | jq -r '.properties.outputs.aksClusterName.value')
TENANTID=$(az account show --query tenantId -o tsv)
ACRNAME=$(az acr list -g $RGNAME --query "[0].name" -o tsv)
Install kubelogin and log into the cluster
Next install kubelogin using the installation instructions appropriate for your computer. From there, you’ll need to run the following commands to download the kubeconfig file and convert it for use with kubelogin.
az aks get-credentials --resource-group $RGNAME \
--name $AKSCLUSTER
kubelogin convert-kubeconfig -l azurecli
Log in to the cluster. Enter your Azure credentials when prompted afterwards to complete the login. If this is successful, kubectl should return a list of nodes.
kubectl get nodes
Install kustomize
Next install kustomize using the installation instructions appropriate for your computer.
Note: To use the kustomize command below to deploy Kubeflow, you must use Kustomize v3.2.0. More info here.
Deploy Kubeflow without TLS using Default Password
This deployment option is for testing only. To deploy with TLS, and change default password, please click here: Deploy kubeflow with TLS.
From the root of the repo, cd into kubeflow’s manifests directory and make sure you are on v1.7-branch.
cd manifests/
git checkout v1.7-branch
cd ..
Copy the vanilla deployment folder into the manifests folder, then install all of the components via a single command.
cp -a deployments/vanilla manifests/vanilla
cd manifests/
while ! kustomize build vanilla | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
Once the command has completed, check the pods are ready
kubectl get pods -n cert-manager
kubectl get pods -n istio-system
kubectl get pods -n auth
kubectl get pods -n knative-eventing
kubectl get pods -n knative-serving
kubectl get pods -n kubeflow
kubectl get pods -n kubeflow-user-example-com
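With this many namespaces, it can be easier to print only the pods that still need attention. The sketch below loops over the same namespaces and filters on the STATUS column; `show_unready_pods` is an illustrative name, and a pod that is Running but not yet fully ready will not be flagged by this filter.

```shell
# Sketch: for each Kubeflow namespace, print only pods whose STATUS is
# neither Running nor Completed. The namespace list mirrors the commands
# above; show_unready_pods is an illustrative name.
show_unready_pods() {
  local ns
  for ns in cert-manager istio-system auth knative-eventing knative-serving \
            kubeflow kubeflow-user-example-com; do
    echo "== $ns =="
    # STATUS is the third column of `kubectl get pods` output.
    kubectl get pods -n "$ns" --no-headers | awk '$3 != "Running" && $3 != "Completed"'
  done
}

# Usage: show_unready_pods
```

An empty section under a namespace heading means no pod there is in another state.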
Run kubectl port-forward to access the Kubeflow dashboard.
kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80
Finally, open http://localhost:8080 and log in with the default user’s credentials. The default email address is user@example.com and the default password is 12341234.
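To confirm from the command line that the gateway answers before opening a browser, you could run the port-forward briefly in the background and probe it with curl. This is a sketch under assumptions: `probe_dashboard` and the 5-second wait are illustrative, and curl reports 000 when nothing answered.

```shell
# Sketch: run the port-forward in the background, probe the dashboard once,
# then stop the port-forward. probe_dashboard and the default 5-second wait
# are illustrative; lengthen the wait if the tunnel is slow to come up.
probe_dashboard() {
  local port="${1:-8080}" pf_pid code
  kubectl port-forward svc/istio-ingressgateway -n istio-system "$port":80 >/dev/null 2>&1 &
  pf_pid=$!
  sleep "${2:-5}"
  # Print only the HTTP status code; 000 means the connection failed.
  code=$(curl -s -o /dev/null -w '%{http_code}' "http://localhost:$port")
  kill "$pf_pid" 2>/dev/null
  echo "HTTP status: $code"
}

# Usage: probe_dashboard 8080
```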
Testing the deployment with a Notebook server
You can test that the deployments worked by creating a new Notebook server using the GUI.
- Click on “Create a new Notebook server”
- Click on “+ New Notebook” in the top right corner of the resulting page
- Enter a name for the server
- Leave the “jupyterlab” option selected
- Feel free to pick one of the available images; in this case we choose the default
- Set Requested CPU to 0.5 and requested memory in Gi to 1
- Under Data Volumes click on “+ Add new volume”
- Expand the resulting section
- Set the name to datavol-1. The default name provided would not work because it has characters that are not allowed
- Set the size in Gi to 1
- Uncheck “Use default class”
- Choose a class from the provided options. In this case we choose “azurefile-premium”
- Choose ReadWriteMany as the Access mode
- Click on “Launch” at the bottom of the page. A successful deployment should have a green checkmark under status after 1-2 minutes.
- Click on “Connect” to access your JupyterLab
- Under Notebook, click on Python 3 to access your Jupyter notebook and start coding
Next steps
Secure your Kubeflow cluster using the TLS and stronger password deployment option.