Deployment Options

Deploy Kubeflow into AKS

Start by checking out the prerequisites page.

If you want to deploy Kubeflow with minimal changes on AKS, then consider the vanilla deployment option. The Kubeflow control plane is installed on Azure Kubernetes Service (AKS), which is a managed container service used to run and scale Kubernetes applications in the cloud.

For a more secure deployment with a minimum security baseline, consider the Deploy with TLS deployment option.

1 - Prerequisites

Set up your environment for deploying Kubeflow for AKS

Kubeflow on AKS Prerequisites

For all Kubeflow on AKS deployment options, you will need the following:

  • An Azure Subscription (e.g. Free or Student account)
  • The Azure CLI
  • Bash shell (e.g. macOS, Linux, Windows Subsystem for Linux (WSL), Multipass, Azure Cloud Shell, GitHub Codespaces, devcontainers, etc.). This repository comes with a .devcontainer folder that lets you configure your Codespaces or devcontainers environment with all the required Bash tools, such as kubelogin and the correct version of kustomize.
  • The following tools, installed in your Bash shell, if you are not using the Codespaces or devcontainers option:
    • Kustomize
      • Install Kustomize
      curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash
      sudo mv ./kustomize /usr/local/bin/kustomize
      
      Verify the installation:
      kustomize version
      
    • Kubelogin
      • To install both kubectl and kubelogin, use the Azure CLI:
        az aks install-cli
        
    • git
    • Bicep
    • Kubectl
    • sed (optional)
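
Before continuing, it can help to confirm that the tools listed above are actually on your PATH. The snippet below is a minimal sketch of such a check; `missing_tools` is a hypothetical helper name, not part of this repository, and it only tests that a command exists, not that its version is correct.

```shell
# Print each required tool that is not found on PATH, one per line.
# Prints nothing when every listed tool is installed.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

For example, `missing_tools az kubectl kubelogin kustomize git` prints only the names that still need to be installed. Bicep is left out of that example because it may be installed either standalone or through the Azure CLI.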

2 - Vanilla Installation

Deploy Kubeflow into an AKS cluster using default settings.

Background

In this lab, you will use the Azure CLI to deploy an Azure Kubernetes Service (AKS) Automatic cluster. AKS Automatic offers a simplified, managed Kubernetes experience with automated node management, scaling, and security configurations. For more details, see the AKS Automatic documentation. Note that AKS Automatic is currently in preview. While it provides faster setup and less manual configuration, it is not recommended for production use. For production workloads, or when advanced features and customization are required, use regular AKS instead. You will then install Kubeflow with the default settings using Kustomize and create a Jupyter notebook server that you can access from your browser.

You can follow these same instructions to deploy Kubeflow on a non-automatic AKS cluster.

Instructions for Basic Deployment without TLS and with Default Password

This deployment option is for testing only. To deploy with TLS and change the default password, see Deploy Kubeflow with TLS.

Deploy AKS Automatic

Use the Azure CLI to deploy an AKS Automatic cluster.

For detailed instructions on installing AKS Automatic, please refer to the AKS Automatic installation documentation.

Log in to the Azure CLI.

az login

Set up your environment variables

RGNAME=kubeflow
CLUSTERNAME=kubeflow-aks-automatic
LOCATION=eastus

Create the resource group

az group create -n $RGNAME -l $LOCATION

Add or Update AKS extension

az extension add --upgrade --name aks-preview

This article requires the aks-preview Azure CLI extension version 9.0.0b4 or later.

Create an AKS Automatic cluster

az aks create \
    --resource-group $RGNAME \
    --name $CLUSTERNAME \
    --location $LOCATION \
    --sku automatic \
    --generate-ssh-keys 

Connect to AKS Automatic Cluster

After the cluster is created, you can connect to it using the Azure CLI. The following command retrieves the credentials for your AKS cluster and configures kubectl to use them.

az aks get-credentials --resource-group $RGNAME --name $CLUSTERNAME

Verify connectivity to the cluster. This should return a list of nodes.

kubectl get nodes
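
If you want to script this check, the node list can be reduced to a count of ready nodes. This is a rough sketch, assuming the default `kubectl get nodes` column layout in which the second column is STATUS; `ready_node_count` is a hypothetical helper name.

```shell
# Count nodes whose STATUS column is exactly "Ready".
# Nodes reported as "NotReady" or "Ready,SchedulingDisabled" are not counted.
ready_node_count() {
  kubectl get nodes --no-headers | awk '$2 == "Ready" { n++ } END { print n + 0 }'
}
```

A result of 0 suggests that the cluster is not yet able to schedule workloads.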

Deploy Kubeflow

Clone this repo which includes the kubeflow/manifests repo as Git Submodules

git clone --recurse-submodules https://github.com/Azure/kubeflow-aks.git

Change directory into the newly cloned directory

cd kubeflow-aks

Run Kubeflow Kustomize deployment

From the root of the repo, cd into kubeflow’s manifests directory and make sure you are in the v1.10-branch.

cd manifests/
git checkout v1.10-branch
cd ..

Install all of the components via a single command

cp -a deployments/vanilla manifests/vanilla
cd manifests/  
while ! kustomize build vanilla | kubectl apply --server-side=true -f -; do echo "Retrying to apply resources"; sleep 10; done
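
The loop above retries indefinitely, which is convenient interactively but can hang an unattended script. As an alternative sketch, the same retry pattern can be bounded. The names `retry` and `apply_vanilla` are hypothetical helpers introduced here for illustration, and `apply_vanilla` assumes you are in the manifests/ directory as in the command above.

```shell
# Run a command up to MAX times, sleeping DELAY seconds between failed attempts.
# Usage: retry MAX DELAY command [args...]
# Returns 0 on the first success, 1 if every attempt fails.
retry() {
  max=$1; delay=$2; shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "Giving up after $attempt attempts" >&2
      return 1
    fi
    echo "Attempt $attempt failed; retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Wrap the build-and-apply pipeline so it can be retried as a single command.
apply_vanilla() {
  kustomize build vanilla | kubectl apply --server-side=true -f -
}
```

With these defined, `retry 30 10 apply_vanilla` would try the apply up to 30 times with 10 seconds between attempts.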

Once the command has completed, check the pods are ready

kubectl get pods -n cert-manager
kubectl get pods -n istio-system
kubectl get pods -n auth
kubectl get pods -n knative-eventing
kubectl get pods -n knative-serving
kubectl get pods -n kubeflow
kubectl get pods -n kubeflow-user-example-com
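
Reading seven pod listings by eye is error-prone, so the check can be condensed to print only pods that still need attention. The following is a sketch under the assumption that `kubectl get pods --no-headers` prints NAME, READY, and STATUS as its first three columns; `unready_pods` and `check_kubeflow_pods` are hypothetical helper names.

```shell
# List pods in a namespace that are not ready yet. A pod counts as ready when
# its STATUS is "Completed", or "Running" with all containers ready (READY x/y, x == y).
unready_pods() {
  kubectl get pods -n "$1" --no-headers | awk '
    { split($2, r, "/") }
    $3 != "Completed" && ($3 != "Running" || r[1] != r[2]) { print $1 }'
}

# Print unready pods, prefixed with their namespace, for every namespace checked above.
check_kubeflow_pods() {
  for ns in cert-manager istio-system auth knative-eventing knative-serving \
            kubeflow kubeflow-user-example-com; do
    unready_pods "$ns" | sed "s|^|$ns/|"
  done
}
```

When `check_kubeflow_pods` prints nothing, every pod in those namespaces is either running with all containers ready or completed.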

Access the Kubeflow dashboard

Run kubectl port-forward to access the Kubeflow dashboard

kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80

Finally, open http://localhost:8080 and log in with the default user’s credentials. The default email address is user@example.com and the default password is 12341234.

Testing the deployment with a Notebook server

You can test that the deployments worked by creating a new Notebook server using the GUI.

  1. Click on “Create a new Notebook”

  2. Click on “+ New Notebook” in the top right corner of the resulting page

  3. Enter a name for the server

  4. Leave the “jupyterlab” option selected

  5. Feel free to pick one of the available images. In this case we choose the default

  6. Set Requested CPU to 0.5 and requested memory in Gi to 1

  7. Under Data Volumes click on “+ Add new volume”

  8. Expand the resulting section

  9. Set the name to datavol-1. The default name provided would not work because it has characters that are not allowed

  10. Set the size in Gi to 1

  11. Uncheck “Use default class”

  12. Choose a class from the provided options. In this case we choose azurefile-premium

  13. Choose ReadWriteMany as the Access mode. Your data volume config should look like the picture below. [Image: data volume config]

  14. Click on “Launch” at the bottom of the page. After 1-2 minutes, a successful deployment shows a green checkmark under Status. [Image: deployment successful]

  15. Click on “Connect” to access your jupyter lab

  16. Under Notebook, click on Python 3 to access your jupyter notebook and start coding

Next steps

To connect to Kubeflow applications from anywhere other than localhost, you need to set up HTTPS. Many of the Kubeflow web applications (e.g., Tensorboard Web Application, Jupyter Web Application, Katib UI) use Secure Cookies, so accessing Kubeflow with HTTP over a non-localhost domain does not work.

To set up HTTPS, see the Deploy with TLS deployment option.

3 - Authenticate Kubeflow users with Custom Password or Entra ID

Authenticating Kubeflow users on AKS with Custom Password or Entra ID

Background

In this lab, you will update the Kubeflow vanilla installation option to configure authentication using either custom users and passwords or Microsoft Entra ID.

Change default password

To change the default password for the Kubeflow dashboard, you need to update the Dex configuration.

  1. First, generate a password hash by following the steps described in the Kubeflow docs, which use Python to generate a bcrypt hash. Alternatively, for simplicity, you can use an online tool like bcrypt-generator to create a new hash.
pip3 install passlib
python3 -c 'from passlib.hash import bcrypt; import getpass; print(bcrypt.using(rounds=12, ident="2y").hash(getpass.getpass()))'

Password: ***
$2y$12$XXXXXXXXXXXXXXXXXXX
  2. Delete the existing password secret
kubectl delete secret dex-passwords -n auth
  3. Create the new password secret
kubectl create secret generic dex-passwords --from-literal=DEX_USER_PASSWORD='REPLACE_WITH_HASH' -n auth
  4. Restart the Dex deployment to pick up the new password secret:
kubectl rollout restart deployment dex -n auth
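
Because a mistyped or truncated hash would leave you unable to log in, it may be worth checking the hash's shape before creating the secret. The sketch below only validates the format (a `$2a$`, `$2b$`, or `$2y$` prefix, a two-digit cost, and 53 characters of salt and digest); it does not verify that the hash matches your password. `is_bcrypt_hash` is a hypothetical helper name.

```shell
# Succeed only if the argument has the shape of a bcrypt hash:
# $2a$/$2b$/$2y$ prefix, two-digit cost, then 53 characters of salt and digest.
is_bcrypt_hash() {
  printf '%s\n' "$1" | grep -Eq '^\$2[aby]\$[0-9]{2}\$[./A-Za-z0-9]{53}$'
}
```

For example, `is_bcrypt_hash "$HASH" || echo "not a bcrypt hash"` flags a leftover REPLACE_WITH_HASH placeholder before it reaches the cluster. Here `HASH` is an assumed variable holding the generated value.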

To add more users

  1. Update the Dex config map deployments/vanilla/dex-config-map.yaml with more entries in the staticPasswords array:
    staticPasswords:
    - email: user@example.com
      hashFromEnv: DEX_USER_PASSWORD
      username: user
      userID: "15841185641784"
      # Add more users here
    - email: user2@example.com
      hashFromEnv: DEX_USER2_PASSWORD
      username: user2
      userID: "15841185641785"
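  2. Add DEX_USER2_PASSWORD with the new password hash to the dex-passwords secret. The add operation creates the key if it does not exist yet and overwrites it if it does. Newlines are stripped from the base64 output because some base64 implementations wrap long lines.
kubectl patch secret dex-passwords -n auth --type='json' -p='[{"op": "add", "path": "/data/DEX_USER2_PASSWORD", "value":"'"$(printf '%s' 'REPLACE_WITH_HASH' | base64 | tr -d '\n')"'"}]'
  3. Apply the config map and restart the deployment
kubectl apply -f deployments/vanilla/dex-config-map.yaml
kubectl rollout restart deployment dex -n auth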
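
To confirm that the secret holds the hash you intended, you can read the key back and decode it. This is a minimal sketch; `secret_value` is a hypothetical helper name, and it assumes your `base64` accepts the `--decode` option.

```shell
# Print the decoded value of one key in a Secret.
# Arguments: namespace, secret name, key.
secret_value() {
  kubectl get secret "$2" -n "$1" -o "jsonpath={.data.$3}" | base64 --decode
}
```

For example, `secret_value auth dex-passwords DEX_USER2_PASSWORD` should print the bcrypt hash exactly as you generated it.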

Note: if you need to update the default email address, change the params file located at manifests/common/user-namespace/base/params.env before installing Kubeflow.

Entra ID Configuration

4 - Deploy Kubeflow with Password, Ingress and TLS

Deploying Kubeflow on AKS with Custom Password and TLS

Background

In this lab, you will use the Azure CLI to deploy an Azure Kubernetes Service (AKS) Automatic cluster. AKS Automatic offers a simplified, managed Kubernetes experience with automated node management, scaling, and security configurations. For more details, see the AKS Automatic documentation. Note that AKS Automatic is currently in preview. While it provides faster setup and less manual configuration, it is not recommended for production use. For production workloads, or when advanced features and customization are required, use regular AKS instead. You will then install Kubeflow with a custom password and TLS configuration. This deployment option uses a self-signed certificate and an ingress controller. For production workloads, replace the self-signed certificate with a certificate issued by your own CA.

You can follow these same instructions to deploy Kubeflow on a non-automatic AKS cluster.

Deploy AKS Automatic

Use the Azure CLI to deploy an AKS Automatic cluster.

For detailed instructions on installing AKS Automatic, please refer to the AKS Automatic installation documentation.

Log in to the Azure CLI.

az login

Set up your environment variables

RGNAME=kubeflow
CLUSTERNAME=kubeflow-aks-automatic
LOCATION=eastus

Create the resource group

az group create -n $RGNAME -l $LOCATION

Add or Update AKS extension

az extension add --upgrade --name aks-preview

This article requires the aks-preview Azure CLI extension version 9.0.0b4 or later.

Create an AKS Automatic cluster

az aks create \
    --resource-group $RGNAME \
    --name $CLUSTERNAME \
    --location $LOCATION \
    --sku automatic \
    --generate-ssh-keys 

Connect to AKS Automatic Cluster

After the cluster is created, you can connect to it using the Azure CLI. The following command retrieves the credentials for your AKS cluster and configures kubectl to use them.

az aks get-credentials --resource-group $RGNAME --name $CLUSTERNAME

Verify connectivity to the cluster. This should return a list of nodes.

kubectl get nodes

Deploy Kubeflow with Password, Ingress and TLS

Clone this repo which includes the kubeflow/manifests repo as Git Submodules

git clone --recurse-submodules https://github.com/Azure/kubeflow-aks.git

Change directory into the newly cloned directory

cd kubeflow-aks

From the root of the repo, ensure you’re using the v1.10-branch:

cd manifests/
git checkout v1.10-branch
cd ..
  • Copy the TLS deployment files:
cp -a deployments/tls manifests/tls

Configure Custom password

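In the next steps, you generate a password hash for your custom password and put it in the dex-passwords.yaml file.

First, generate the password hash by following the steps described in the Kubeflow docs, which use Python to generate a bcrypt hash. Alternatively, for simplicity, you can use an online tool like bcrypt-generator to create a new hash. The commands below use the Python bcrypt package and pass the password through the environment so that quotes or other special characters in it do not break the Python code.

pip3 install bcrypt
PASSWORD="your_custom_password"
PASSWORD_HASH=$(PASSWORD="$PASSWORD" python3 -c 'import os, bcrypt; print(bcrypt.hashpw(os.environ["PASSWORD"].encode(), bcrypt.gensalt()).decode())')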

Update the password hash in the manifests/tls/dex-passwords.yaml secret:

sed -i "s|<YOUR_DEX_USER_PASSWORD>|$PASSWORD_HASH|g" manifests/tls/dex-passwords.yaml
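
If the sed substitution silently matches nothing (for example, because the file path is wrong or the file was already edited), Kubeflow would be deployed with the placeholder instead of a hash. The sketch below checks for that; `placeholder_replaced` is a hypothetical helper name.

```shell
# Succeed if FILE exists and no longer contains the <YOUR_DEX_USER_PASSWORD> placeholder.
placeholder_replaced() {
  [ -f "$1" ] && ! grep -q '<YOUR_DEX_USER_PASSWORD>' "$1"
}
```

Running `placeholder_replaced manifests/tls/dex-passwords.yaml || echo "placeholder still present"` from the root of the repo gives a quick confirmation before you deploy.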

Install Kubeflow

  • Deploy Kubeflow
cd manifests/  
while ! kustomize build tls | kubectl apply --server-side=true -f -; do echo "Retrying to apply resources"; sleep 10; done
  • Once the command has completed, check the pods are ready
kubectl get pods -n cert-manager
kubectl get pods -n istio-system
kubectl get pods -n auth
kubectl get pods -n knative-eventing
kubectl get pods -n knative-serving
kubectl get pods -n kubeflow
kubectl get pods -n kubeflow-user-example-com

Expose the Kubeflow dashboard using Ingress with TLS

There are a couple of options to expose your Kubeflow cluster with proper HTTPS using Ingress. See the note in the Kubeflow docs on NodePort / LoadBalancer / Ingress. In this example we will use the nginx ingress controller, which is included as part of the app-routing-system addon in AKS Automatic.

You can create a self-signed certificate for Kubeflow using either the IP address of the Nginx ingress LoadBalancer or a DNS label assigned to it.

Step 1: Find IP or DNS Label of Nginx ingress

  • Obtain Nginx IP
NGINX_IP=$(kubectl get svc -n app-routing-system -o jsonpath='{.items[?(@.spec.type=="LoadBalancer")].status.loadBalancer.ingress[0].ip}')
echo "Nginx IP: $NGINX_IP"
  • Optional: Use Azure DNS for a friendly URL. You can also configure a DNS label for the Nginx ingress service:
kubectl annotate service nginx -n app-routing-system \
  service.beta.kubernetes.io/azure-dns-label-name=my-kubeflow-cluster

This will make Kubeflow accessible at: my-kubeflow-cluster.$LOCATION.cloudapp.azure.com

Step 2: Create TLS Certificate

If you are using the IP address, create the following certificate:

echo "apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: kubeflow-tls-cert
  namespace: istio-system
spec:
  secretName: kubeflow-tls-secret
  ipAddresses:
    - $NGINX_IP
  isCA: false
  issuerRef:
    name: kubeflow-self-signing-issuer
    kind: ClusterIssuer
    group: cert-manager.io" | kubectl apply -f -

If you are using a DNS label, use the following definition instead (replace my-kubeflow-cluster with your unique DNS label):

echo "apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: kubeflow-tls-cert
  namespace: istio-system
spec:
  secretName: kubeflow-tls-secret
  dnsNames:
    - my-kubeflow-cluster.$LOCATION.cloudapp.azure.com
  isCA: false
  issuerRef:
    name: kubeflow-self-signing-issuer
    kind: ClusterIssuer
    group: cert-manager.io" | kubectl apply -f -
  • Deploy the certificate:
kubectl apply -f tls/certificate.yaml

Wait for the certificate to be ready:

kubectl wait --for=condition=Ready certificate/kubeflow-tls-cert -n istio-system --timeout=300s

Step 3: Configure Ingress

Create and apply an ingress manifest to expose the Kubeflow components:

echo 'apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kubeflow-ingress
  namespace: istio-system
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    nginx.ingress.kubernetes.io/ssl-protocols: "TLSv1.2 TLSv1.3"
spec:
  ingressClassName: webapprouting.kubernetes.azure.com
  tls:
  - secretName: kubeflow-tls-secret
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: istio-ingressgateway
            port:
              number: 80' | kubectl apply -f -

Verify Ingress:

kubectl get ingress kubeflow-ingress -n istio-system

Wait for the ADDRESS field to show an external IP address (this may take a few minutes).

NAME               CLASS                                HOSTS   ADDRESS       PORTS   AGE
kubeflow-ingress   webapprouting.kubernetes.azure.com   *       xxx.149.0.222   443      16m
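
Once the ingress has an address, you can probe it from the command line before opening a browser. This is a sketch only; `http_status` is a hypothetical helper name, and the `-k` flag is used because a self-signed certificate would otherwise fail curl's certificate verification.

```shell
# Print the HTTP status code returned for a URL.
# -k skips certificate verification, which a self-signed certificate would fail.
http_status() {
  curl -k -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}
```

For example, `http_status "https://$NGINX_IP/"` prints the status code returned through the ingress, and a value of 000 means no HTTP response was received.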

Access Kubeflow Dashboard

You can now access the Kubeflow dashboard at https://$NGINX_IP or https://my-kubeflow-cluster.$LOCATION.cloudapp.azure.com if DNS was configured.

Log in using:

  • Email: user@example.com (or the email you configured)
  • Password: The password you used to generate the hash

Testing the deployment with a Notebook server

You can test that the deployments worked by creating a new Notebook server using the GUI.

  1. Click on “Create a new Notebook” on the Kubeflow dashboard
  2. Click on “+ New Notebook” in the top right corner of the resulting page
  3. Enter a name for the server
  4. Leave the “jupyterlab” option selected
  5. Feel free to pick one of the available images. In this case we choose the default
  6. Set Requested CPU to 0.5 and requested memory in Gi to 1
  7. Under Data Volumes click on “+ Add new volume”
  8. Expand the resulting section
  9. Set the name to datavol-1. The default name provided would not work because it has characters that are not allowed
  10. Set the size in Gi to 1
  11. Uncheck “Use default class”
  12. Choose a class from the provided options. In this case we choose “azurefile-premium”
  13. Choose ReadWriteMany as the Access mode. Your data volume config should look like the picture below. [Image: data volume config]
  14. Click on “Launch” at the bottom of the page. After 1-2 minutes, a successful deployment shows a green checkmark under Status. [Image: deployment successful]
  15. Click on “Connect” to access your jupyter lab
  16. Under Notebook, click on Python 3 to access your jupyter notebook and start coding

Destroy the resources

After you are done testing, run the command below to destroy the resources you just created.

az group delete -n $RGNAME
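
The command above prompts for confirmation and blocks until deletion finishes. For scripted cleanup, a non-interactive variant might look like the sketch below; `delete_group_and_wait` is a hypothetical helper name, and the 15-second polling interval is an arbitrary choice.

```shell
# Delete a resource group without prompting, then poll until it no longer exists.
delete_group_and_wait() {
  az group delete --name "$1" --yes --no-wait
  while [ "$(az group exists --name "$1")" = "true" ]; do
    echo "Waiting for resource group $1 to be deleted" >&2
    sleep 15
  done
}
```

Calling `delete_group_and_wait "$RGNAME"` returns once the resource group is gone.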