Deploy Kubeflow with Password, Ingress and TLS

Deploying Kubeflow on AKS with Custom Password and TLS

Background

In this lab you will deploy an Azure Kubernetes Service (AKS) cluster and other Azure services (Container Registry, Managed Identity, Key Vault) with the Azure CLI and Bicep. You will then install Kubeflow after creating a custom password. This deployment option also uses TLS with a self-signed certificate and an ingress controller. For production workloads, replace the self-signed certificate with certificates issued by your own CA.

Deploy Kubeflow with Password, Ingress and TLS

Use the Azure CLI and Bicep templates to deploy the infrastructure for your application. We will use the AKS Construction project to rapidly deploy the required Azure resources. The project gives users the flexibility to tailor their AKS environment. See the AKS Construction helper for more details about AKS Construction.

You can also try out the automated option using the Mage build tool at the Azure Open Source Labs.

Log in to the Azure CLI.

az login

Install kubectl using the Azure CLI, if required.

az aks install-cli

Clone this repo, which includes the Azure/AKS-Construction and kubeflow/manifests repos as Git submodules.

git clone --recurse-submodules https://github.com/Azure/kubeflow-aks.git

Change directory into the newly cloned directory

cd kubeflow-aks

Deployment steps

Get the signed-in user's ID so that you can get admin access to the cluster you create.

SIGNEDINUSER=$(az ad signed-in-user show --query id --out tsv)
RGNAME=kubeflow

Create deployment

az group create -n $RGNAME -l eastus
DEP=$(az deployment group create -g $RGNAME --parameters signedinuser=$SIGNEDINUSER -f main.bicep -o json)
KVNAME=$(echo $DEP | jq -r '.properties.outputs.kvAppName.value')
AKSCLUSTER=$(echo $DEP | jq -r '.properties.outputs.aksClusterName.value')
TENANTID=$(az account show --query tenantId -o tsv)
ACRNAME=$(az acr list -g $RGNAME --query "[0].name"  -o tsv)
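The later steps depend on the variables captured above, so it can help to check them before continuing. The following is a minimal sketch, not part of the original lab: `require_vars` is a hypothetical helper name, and it treats the string `null` as missing because `jq -r` prints `null` when an output key does not exist.

```shell
# Hypothetical helper: return non-zero if any named variable is empty
# or holds the literal string "null" (what jq -r prints for a missing key).
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name; works in POSIX sh and bash.
    eval "value=\${$name:-}"
    if [ -z "$value" ] || [ "$value" = "null" ]; then
      echo "Missing value for $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage after the deployment commands:
#   require_vars RGNAME KVNAME AKSCLUSTER TENANTID ACRNAME || echo "Check the deployment output"
```

If the check reports a missing value, inspect the JSON stored in `DEP` before moving on to the cluster login.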

Install kubelogin and log into the cluster

Next install kubelogin using the installation instructions appropriate for your computer. From there, you’ll need to run the following commands to download the kubeconfig file and convert it for use with kubelogin.

az aks get-credentials --resource-group $RGNAME \
  --name $AKSCLUSTER

kubelogin convert-kubeconfig -l azurecli

Log in to the cluster by running the command below, and enter your Azure credentials when prompted. If the login succeeds, kubectl returns a list of nodes.

kubectl get nodes
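Before installing Kubeflow you may also want to confirm that the nodes are actually in the Ready state, not just listed. A minimal sketch using `kubectl wait` follows; the function name `wait_for_nodes` and the default timeout of 300 seconds are assumptions chosen for illustration.

```shell
# Hypothetical helper: block until every node reports Ready, or fail after the timeout.
# $1: optional timeout in kubectl duration syntax (defaults to 300s)
wait_for_nodes() {
  kubectl wait --for=condition=Ready nodes --all --timeout="${1:-300s}"
}

# Example usage:
#   wait_for_nodes 120s
```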

Install kustomize

Next install kustomize using the installation instructions appropriate for your computer.
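At this point every command-line tool used in this lab should be installed. The sketch below is one way to verify that; `missing_tools` is a hypothetical helper, and the tool list is taken from the commands in this lab (note that `jq` is used to parse the deployment output even though it has no install step of its own).

```shell
# Hypothetical helper: print the name of each tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Tools used by the commands in this lab.
missing_tools az git jq kubectl kubelogin kustomize
```

An empty result means all of the listed tools were found.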

Install Kubeflow with TLS and Unique Password

Please note that a self-signed certificate is used for demonstration purposes. Do not use self-signed certificates for production workloads. You can swap the self-signed certificate for a certificate from your own CA to suit your use case.

  1. The first step is to generate a new hash/password combination using bcrypt. There are many ways of doing this, e.g. by generating it with Python. For simplicity we will use coderstool’s Bcrypt Hash Generator for testing purposes. Do not do this for production workloads. In the plain text field, enter a password for your first user, then click the “Generate Hash” button. You can generate several hashes if you have multiple users.

  2. Open the deployments/tls/dex-config-map.yaml file and replace the hash value there (around line 22) with the hash you just generated. You can also change the email address, username and user ID, and you can set up multiple users by adding more entries to the array. If you change the email address from the default, also update the default email address in manifests/common/user-namespace/base/params.env.

  3. Update your auth.md file with the new email address and password (the plain-text password, not the hash), or store the secrets in a more secure way.

  4. Copy the newly updated deployments/tls folder into the Kubeflow manifests folder so that the deployment includes your config changes.

    cp -a deployments/tls manifests/tls
    
  5. Change into the manifests folder and install Kubeflow.

    cd manifests
    

    Install all of the components with a single command. The loop retries every 10 seconds until all resources apply successfully.

    while ! kustomize build tls | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
    
  6. Once the command has completed, check the pods are ready

    kubectl get pods -n cert-manager
    kubectl get pods -n istio-system
    kubectl get pods -n auth
    kubectl get pods -n knative-eventing
    kubectl get pods -n knative-serving
    kubectl get pods -n kubeflow
    kubectl get pods -n kubeflow-user-example-com
    
  7. Restart dex to ensure dex is using the updated password

    kubectl rollout restart deployment dex -n auth
    
  8. Configure TLS. Start by getting the IP address of the Istio gateway.

    kubectl -n istio-system get service istio-ingressgateway --output jsonpath='{.status.loadBalancer.ingress[0].ip}'
    

    Replace the IP address in the deployments/tls/certificate.yaml file (line 13) with the IP address of the istio gateway and save the file.

  9. Deploy the certificate manifest file. Note that instead of using the IP address as above, you could give the LoadBalancer an Azure subdomain (via the annotation in manifests/common/istio-1-16/istio-install/base/patches/service.yaml) and use that in the certificate instead.

    kubectl apply -f ../deployments/tls/certificate.yaml
    
  10. You have completed the deployment. Access the dashboard by entering the IP address in a browser. You might get a warning saying the connection is unsafe. This is expected because you are using a self-signed certificate. Click on Advanced and proceed to the URL to view your dashboard. Log in using the email address and password in the auth.md file (assuming you updated it in step 3).
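For step 1, the bcrypt hash can also be generated locally instead of with an online tool. The following is a minimal sketch, assuming the `htpasswd` utility (shipped in packages such as apache2-utils or httpd-tools) is installed; `bcrypt_hash`, the placeholder user name and the cost of 12 are illustrative choices, not part of the original lab.

```shell
# Hypothetical helper: print a bcrypt hash for a plain-text password.
# $1: plain-text password, $2: optional bcrypt cost (defaults to 12)
bcrypt_hash() {
  # -n prints to stdout, -i reads the password from stdin,
  # -B selects bcrypt, -C sets the cost.
  # "user" is a placeholder name; cut strips it from the "user:hash" output.
  printf '%s\n' "$1" | htpasswd -niBC "${2:-12}" user | cut -d: -f2
}

# Example usage (the password shown is a placeholder):
#   HASH=$(bcrypt_hash 'choose-a-strong-password')
#   echo "$HASH"
```

The resulting value is what goes into the hash field of deployments/tls/dex-config-map.yaml in step 2. Reading the password from stdin keeps it out of the `htpasswd` argument list.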

Testing the deployment with a Notebook server

You can test that the deployments worked by creating a new Notebook server using the GUI.

  1. Click on “Create a new Notebook server”
  2. Click on “+ New Notebook” in the top right corner of the resulting page
  3. Enter a name for the server
  4. Leave the “jupyterlab” option selected
  5. Pick any of the available images. In this case we use the default
  6. Set Requested CPU to 0.5 and requested memory in Gi to 1
  7. Under Data Volumes click on “+ Add new volume”
  8. Expand the resulting section
  9. Set the name to datavol-1. The default name provided would not work because it has characters that are not allowed
  10. Set the size in Gi to 1
  11. Uncheck “Use default class”
  12. Choose a class from the provided options. In this case we choose “azurefile-premium”
  13. Choose ReadWriteMany as the access mode. Your data volume config should look like the picture below
  14. Click on “Launch” at the bottom of the page. After 1-2 minutes, a successful deployment shows a green checkmark under Status
  15. Click on “Connect” to access JupyterLab
  16. Under Notebook, click on Python 3 to open a Jupyter notebook and start coding

Destroy the resources

Run the command below to destroy the resources you just created after you are done testing

az group delete -n $RGNAME
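For scripted teardown, the same command can be wrapped so that it never runs with an empty resource group name. This is a sketch under assumptions: `delete_lab_resources` is a hypothetical helper, and the `--yes` and `--no-wait` flags are added here to skip the confirmation prompt and return before the deletion finishes.

```shell
# Hypothetical helper: delete the lab resource group, refusing to run if RGNAME is unset.
delete_lab_resources() {
  if [ -z "${RGNAME:-}" ]; then
    echo "RGNAME is not set; refusing to delete" >&2
    return 1
  fi
  # --yes skips the confirmation prompt; --no-wait returns without waiting for completion.
  az group delete -n "$RGNAME" --yes --no-wait
}

# Example usage:
#   delete_lab_resources
```

Drop `--yes` if you prefer to keep the interactive confirmation as in the command above.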