1 - Overview

Overview of Kubeflow on Azure Kubernetes Service

Azure Kubernetes Service (AKS) is a managed Kubernetes platform on Azure. It provides features that make it easy to get up and running on production-grade Kubernetes clusters. For more information about AKS, check out Introduction to Azure Kubernetes Service.

This project provides several deployment options for running and testing Kubeflow on AKS. To get started, check out our Deployment Options page.

2 - Contribution Guidelines

How to contribute to the docs

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties’ policies.

3 - Deployment Options

Deploy Kubeflow into AKS

Start by checking out the Prerequisites page.

If you want to deploy Kubeflow with minimal changes on AKS, then consider the vanilla deployment option. The Kubeflow control plane is installed on Azure Kubernetes Service (AKS), which is a managed container service used to run and scale Kubernetes applications in the cloud.

For a more secure deployment with a minimum security baseline, consider the custom password and TLS deployment option.

3.1 - Deploy Kubeflow with Password, Ingress and TLS

Deploying Kubeflow on AKS with Custom Password and TLS

Background

In this lab you will deploy an Azure Kubernetes Service (AKS) cluster and other Azure services (Container Registry, Managed Identity, Key Vault) with Azure CLI and Bicep. You will then create a custom password and install Kubeflow. This deployment option also uses an ingress controller and TLS with a self-signed certificate. For production workloads, replace the self-signed certificate with one issued by your own certificate authority (CA).

Deploy Kubeflow with Password, Ingress and TLS

Use the Azure CLI and Bicep templates to deploy the infrastructure for your application. We will be using the AKS Construction project to rapidly deploy the required Azure resources. The project gives you the flexibility to tweak your AKS environment however you want. Check out the AKS Construction helper for more details.

You can also try out the automated option using the Mage build tool at the Azure Open Source Labs.

Log in to the Azure CLI.

az login

Install kubectl using the Azure CLI, if required.

az aks install-cli

Clone this repo, which includes the Azure/AKS-Construction and kubeflow/manifests repos as Git submodules.

git clone --recurse-submodules https://github.com/Azure/kubeflow-aks.git

Change into the newly cloned directory.

cd kubeflow-aks

Deployment steps

Get the signed-in user’s ID so that you get admin access to the cluster you create, and set the resource group name.

SIGNEDINUSER=$(az ad signed-in-user show --query id -o tsv)
RGNAME=kubeflow

Create deployment

az group create -n $RGNAME -l eastus
DEP=$(az deployment group create -g $RGNAME --parameters signedinuser=$SIGNEDINUSER -f main.bicep -o json)
KVNAME=$(echo $DEP | jq -r '.properties.outputs.kvAppName.value')
AKSCLUSTER=$(echo $DEP | jq -r '.properties.outputs.aksClusterName.value')
TENANTID=$(az account show --query tenantId -o tsv)
ACRNAME=$(az acr list -g $RGNAME --query "[0].name"  -o tsv)
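If the Bicep deployment fails or an output name changes, `jq -r` prints the literal string `null` rather than failing, and the later commands break in confusing ways. The sketch below is a hypothetical helper, `require_vars` (not part of this repo), that stops early if any of the variables set above is empty or `null`. It assumes bash, because it uses indirect expansion.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail if any named shell variable is empty or holds the
# literal string "null" (what `jq -r` prints when an output key is missing).
require_vars() {
  local name value status=0
  for name in "$@"; do
    value="${!name}"   # bash indirect expansion: the value of the variable named $name
    if [ -z "$value" ] || [ "$value" = "null" ]; then
      echo "variable $name is empty or null" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, run `require_vars SIGNEDINUSER RGNAME KVNAME AKSCLUSTER TENANTID ACRNAME` before continuing; it reports every problem variable, not just the first.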

Install kubelogin and log into the cluster

Next, install kubelogin using the installation instructions appropriate for your computer. Then run the following commands to download the kubeconfig file and convert it for use with kubelogin.

az aks get-credentials --resource-group $RGNAME \
  --name $AKSCLUSTER

kubelogin convert-kubeconfig -l azurecli

Log in to the cluster by running the command below. If you are prompted for your Azure credentials, enter them to complete the login. If the login succeeds, kubectl returns a list of the cluster’s nodes.

kubectl get nodes
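Because the Kubeflow install below applies manifests to whatever cluster kubectl currently points at, it can be worth confirming the active context first. This is a minimal sketch with a hypothetical helper, `assert_context`. It assumes the context name equals the AKS cluster name, which is the default naming used by `az aks get-credentials` without `--admin`. The `KUBECTL` variable is an assumption added only so the function can be tested with a stub; it defaults to `kubectl`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if kubectl's current context matches $1.
# KUBECTL may be overridden (e.g. with a stub function); it defaults to kubectl.
assert_context() {
  local expected="$1" current
  current="$(${KUBECTL:-kubectl} config current-context 2>/dev/null)" || return 1
  if [ "$current" != "$expected" ]; then
    echo "current context is '$current', expected '$expected'" >&2
    return 1
  fi
}
```

Usage: `assert_context "$AKSCLUSTER" && kubectl get nodes`.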

Install kustomize

Next, install kustomize using the installation instructions appropriate for your computer.
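At this point every command-line tool this page uses should be installed. The sketch below is a hypothetical helper, `check_tools`, that reports any that are missing from your PATH. The default tool list (az, kubectl, git, jq, kubelogin, kustomize) is simply the set of commands invoked on this page; pass other names as arguments to check something else.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report CLI tools that are not on PATH.
# With no arguments, checks the tools invoked on this page.
check_tools() {
  local tools=("$@")
  if [ "${#tools[@]}" -eq 0 ]; then
    tools=(az kubectl git jq kubelogin kustomize)
  fi
  local missing=0 tool
  for tool in "${tools[@]}"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_tools || echo "install the missing tools before continuing"`.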

Install Kubeflow with TLS and Unique Password

Note that a self-signed certificate is used here for demonstration purposes only. Do not use self-signed certificates for production workloads; replace this one with a certificate issued by your own CA.

  1. The first step is to generate a new hash/password combination using bcrypt. There are many ways of doing this, e.g. by generating it with Python. For simplicity, we will use coderstool’s Bcrypt Hash Generator for testing purposes. Do not do this for production workloads. In the plain text field, enter a password for your first user, then click the “Generate Hash” button. If you have multiple users, generate one hash per user.

  2. Open the deployments/tls/dex-config-map.yaml file and replace the hash value there (around line 22) with the hash you just generated. You can also change the email address, username and userID. To set up multiple users, add more entries to the array. If you change the default email address, also update it in the manifests/common/user-namespace/base/params.env file.

  3. Update the auth.md file with the new email address and password (the plain-text password, not the hash), or store these secrets somewhere more secure.

  4. Copy the updated deployments/tls folder into the Kubeflow manifests folder so that the deployment includes your configuration changes.

    cp -a deployments/tls manifests/tls
    
  5. Change to the manifests folder and install Kubeflow.

    cd manifests
    

    Install all of the components via a single command

    while ! kustomize build tls | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
    
  6. Once the command has completed, check that the pods are ready.

    kubectl get pods -n cert-manager
    kubectl get pods -n istio-system
    kubectl get pods -n auth
    kubectl get pods -n knative-eventing
    kubectl get pods -n knative-serving
    kubectl get pods -n kubeflow
    kubectl get pods -n kubeflow-user-example-com
    
  7. Restart Dex so that it picks up the updated password.

    kubectl rollout restart deployment dex -n auth
    
  8. Configure TLS. Start by getting the external IP address of the Istio ingress gateway.

    kubectl -n istio-system get service istio-ingressgateway --output jsonpath='{.status.loadBalancer.ingress[0].ip}'
    

    Replace the IP address in the deployments/tls/certificate.yaml file (line 13) with the IP address of the Istio gateway and save the file.

  9. Deploy the certificate manifest file. Instead of providing the IP address as above, you could also give the LoadBalancer an Azure subdomain (via the annotation in manifests/common/istio-1-16/istio-install/base/patches/service.yaml) and use that.

    kubectl apply -f tls-manifest/certificate.yaml
    
  10. You have completed the deployment. Access the dashboard by entering the IP address in a browser. You might get a warning that the connection is unsafe; this is expected because you are using a self-signed certificate. Click “Advanced” and proceed to the URL to view your dashboard. Log in using the email address and password in the auth.md file (assuming you updated it in step 3).
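A common mistake in steps 1 and 2 is pasting the plain-text password, or a truncated hash, into dex-config-map.yaml. The sketch below is a hypothetical helper, `is_bcrypt_hash`, that checks a string has the shape of a bcrypt hash in modular crypt format: a `$2a$`, `$2b$` or `$2y$` prefix, a two-digit cost, a `$`, then 53 characters of bcrypt’s base64 alphabet, 60 characters in total. It only checks the format; it cannot tell you which password the hash belongs to.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if $1 looks like a bcrypt hash
# ("$2a$"/"$2b$"/"$2y$", two-digit cost, "$", 22-char salt + 31-char hash).
is_bcrypt_hash() {
  local re='^\$2[aby]\$[0-9]{2}\$[./A-Za-z0-9]{53}$'
  [[ "$1" =~ $re ]]
}
```

When you store a hash in a shell variable, use single quotes (for example `HASH='...'`) so the shell does not expand the `$` characters, then run `is_bcrypt_hash "$HASH" || echo "not a bcrypt hash"`.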

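In step 8, the jsonpath query returns an empty string while Azure is still provisioning the load balancer, so copying its output too early leaves you with no IP to put in certificate.yaml. This is a minimal polling sketch with a hypothetical helper, `wait_for_ingress_ip`; the attempt count and delay defaults (30 attempts, 10 seconds) are arbitrary choices, and `KUBECTL` is an assumed override used only for testing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll the istio-ingressgateway Service until it has an
# external IP, then print it. Args: [attempts, default 30] [delay seconds, default 10].
# KUBECTL may be overridden (e.g. with a stub); it defaults to kubectl.
wait_for_ingress_ip() {
  local attempts="${1:-30}" delay="${2:-10}" ip i
  for ((i = 1; i <= attempts; i++)); do
    ip="$(${KUBECTL:-kubectl} -n istio-system get service istio-ingressgateway \
      --output 'jsonpath={.status.loadBalancer.ingress[0].ip}' 2>/dev/null)"
    if [ -n "$ip" ]; then
      echo "$ip"
      return 0
    fi
    sleep "$delay"
  done
  echo "no external IP after $attempts attempts" >&2
  return 1
}
```

Usage: `INGRESS_IP=$(wait_for_ingress_ip) && echo "$INGRESS_IP"`, then put that value in certificate.yaml.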
Testing the deployment with a Notebook server

You can test that the deployment worked by creating a new Notebook server using the GUI.

  1. Click on “Create a new Notebook server”
  2. Click on “+ New Notebook” in the top right corner of the resulting page
  3. Enter a name for the server
  4. Leave the “jupyterlab” option selected
  5. Pick one of the available images; in this case we choose the default
  6. Set Requested CPU to 0.5 and requested memory in Gi to 1
  7. Under Data Volumes, click on “+ Add new volume”
  8. Expand the resulting section
  9. Set the name to datavol-1. The default name provided would not work because it contains characters that are not allowed
  10. Set the size in Gi to 1
  11. Uncheck “Use default class”
  12. Choose a class from the provided options. In this case we choose “azurefile-premium”
  13. Choose ReadWriteMany as the Access mode. Your data volume config should look like the picture below
  14. Click on “Launch” at the bottom of the page. After 1-2 minutes, a successful deployment shows a green checkmark under Status
  15. Click on “Connect” to access JupyterLab
  16. Under Notebook, click on Python 3 to open a Jupyter notebook and start coding

Destroy the resources

When you are done testing, run the command below to delete the resource group and all the resources you created in it.

az group delete -n $RGNAME

3.2 - Prerequisites

Set up your environment for deploying Kubeflow on AKS

Kubeflow on AKS Prerequisites

For all Kubeflow on AKS deployment options, you will need the following

3.3 - Vanilla Installation

Deploy Kubeflow into an AKS cluster using default settings.

Background

In this lab you will deploy an Azure Kubernetes Service (AKS) cluster and other Azure services (Container Registry, Managed Identity, Key Vault) with Azure CLI and Bicep. You will then install Kubeflow with its default settings using Kustomize, and create a Jupyter notebook server that you can access from your browser.

Instructions for Basic Deployment without TLS and with Default Password

This deployment option is for testing only. To deploy with TLS and change the default password, see Deploy kubeflow with TLS.

Use the Azure CLI and Bicep templates to deploy the infrastructure for your application. We will be using the AKS Construction project to rapidly deploy the required Azure resources. The project gives you the flexibility to tweak your AKS environment however you want. Check out the AKS Construction helper for more details.

Log in to the Azure CLI.

az login

Install kubectl using the Azure CLI, if required.

az aks install-cli

Clone this repo, which includes the Azure/AKS-Construction and kubeflow/manifests repos as Git submodules.

git clone --recurse-submodules https://github.com/Azure/kubeflow-aks.git

Change into the newly cloned directory.

cd kubeflow-aks

Deployment steps

Get the signed-in user’s ID so that you get admin access to the cluster you create, and set the resource group name.

SIGNEDINUSER=$(az ad signed-in-user show --query id -o tsv)
RGNAME=kubeflow

Create deployment

az group create -n $RGNAME -l eastus
DEP=$(az deployment group create -g $RGNAME --parameters signedinuser=$SIGNEDINUSER -f main.bicep -o json)
KVNAME=$(echo $DEP | jq -r '.properties.outputs.kvAppName.value')
AKSCLUSTER=$(echo $DEP | jq -r '.properties.outputs.aksClusterName.value')
TENANTID=$(az account show --query tenantId -o tsv)
ACRNAME=$(az acr list -g $RGNAME --query "[0].name"  -o tsv)

Install kubelogin and log into the cluster

Next, install kubelogin using the installation instructions appropriate for your computer. Then run the following commands to download the kubeconfig file and convert it for use with kubelogin.

az aks get-credentials --resource-group $RGNAME \
  --name $AKSCLUSTER

kubelogin convert-kubeconfig -l azurecli

Log in to the cluster by running the command below. If you are prompted for your Azure credentials, enter them to complete the login. If the login succeeds, kubectl returns a list of the cluster’s nodes.

kubectl get nodes

Install kustomize

Next, install kustomize using the installation instructions appropriate for your computer.

Deploy Kubeflow without TLS using Default Password

From the root of the repo, change into Kubeflow’s manifests directory and make sure you are on the v1.7-branch branch.

cd manifests/
git checkout v1.7-branch
cd ..

Install all of the components via a single command

cp -a deployments/vanilla manifests/vanilla
cd manifests/  
while ! kustomize build vanilla | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
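The loop above retries forever, which hides a real failure (for example a typo in a manifest) behind an endless stream of “Retrying” messages. One alternative is a bounded retry. This is a minimal sketch with a hypothetical helper, `retry`; the 30-attempt, 10-second values in the usage line are arbitrary assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds, giving up after
# MAX_ATTEMPTS. Usage: retry MAX_ATTEMPTS DELAY_SECONDS command [args...]
retry() {
  local max="$1" delay="$2" n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$max" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "attempt $n failed, retrying in ${delay}s" >&2
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}
```

Because the install is a pipeline, wrap it in a shell; `pipefail` makes a kustomize failure count as a failed attempt too: `retry 30 10 bash -c 'set -o pipefail; kustomize build vanilla | kubectl apply -f -'`.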

Once the command has completed, check that the pods are ready.

kubectl get pods -n cert-manager
kubectl get pods -n istio-system
kubectl get pods -n auth
kubectl get pods -n knative-eventing
kubectl get pods -n knative-serving
kubectl get pods -n kubeflow
kubectl get pods -n kubeflow-user-example-com
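Rather than re-running `kubectl get pods` until everything shows Running, you can block on readiness with `kubectl wait`. This is a sketch with a hypothetical helper, `wait_for_kubeflow_pods`, covering the same namespaces listed above; the 600s default timeout is an arbitrary assumption. Pods from completed Jobs never report Ready, so the field selector skips pods in the Succeeded phase. `KUBECTL` is an assumed override used only for testing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until all non-completed pods in the Kubeflow
# namespaces are Ready. Arg: [timeout, default 600s].
# KUBECTL may be overridden (e.g. with a stub); it defaults to kubectl.
wait_for_kubeflow_pods() {
  local timeout="${1:-600s}" ns status=0
  for ns in cert-manager istio-system auth knative-eventing knative-serving \
            kubeflow kubeflow-user-example-com; do
    echo "waiting for pods in $ns" >&2
    if ! ${KUBECTL:-kubectl} wait --for=condition=Ready pods --all \
         --field-selector='status.phase!=Succeeded' -n "$ns" --timeout="$timeout"; then
      echo "pods in $ns not ready within $timeout" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage: `wait_for_kubeflow_pods 600s && echo "Kubeflow pods are ready"`.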

Run kubectl port-forward to access the Kubeflow dashboard

kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80

Finally, open http://localhost:8080 and log in with the default user’s credentials. The default email address is user@example.com and the default password is 12341234.

Testing the deployment with a Notebook server

You can test that the deployment worked by creating a new Notebook server using the GUI.

  1. Click on “Create a new Notebook server”

  2. Click on “+ New Notebook” in the top right corner of the resulting page

  3. Enter a name for the server

  4. Leave the “jupyterlab” option selected

  5. Pick one of the available images; in this case we choose the default

  6. Set Requested CPU to 0.5 and requested memory in Gi to 1

  7. Under Data Volumes, click on “+ Add new volume”

  8. Expand the resulting section

  9. Set the name to datavol-1. The default name provided would not work because it contains characters that are not allowed

  10. Set the size in Gi to 1

  11. Uncheck “Use default class”

  12. Choose a class from the provided options. In this case we choose “azurefile-premium”

  13. Choose ReadWriteMany as the Access mode. Your data volume config should look like the picture below

  14. Click on “Launch” at the bottom of the page. After 1-2 minutes, a successful deployment shows a green checkmark under Status

  15. Click on “Connect” to access JupyterLab

  16. Under Notebook, click on Python 3 to open a Jupyter notebook and start coding

Next steps

Secure your Kubeflow cluster with TLS and a stronger password by using the Deploy Kubeflow with Password, Ingress and TLS deployment option.