Deploy Kubeflow with Password, Ingress and TLS

Deploying Kubeflow on AKS with Custom Password and TLS

Background

In this lab, you will use the Azure CLI to deploy an Azure Kubernetes Service (AKS) Automatic cluster. AKS Automatic offers a simplified, managed Kubernetes experience with automated node management, scaling, and security configurations. For more details, see the AKS Automatic documentation. Note that AKS Automatic is currently in preview. While it provides faster setup and less manual configuration, it is not recommended for production use. For production workloads, or when advanced features and customization are required, use regular AKS instead. You will then install Kubeflow with a custom password and TLS configuration. This deployment option uses a self-signed certificate and an ingress controller. For production workloads, replace the self-signed certificate with one issued by your own CA.

You can follow these same instructions to deploy Kubeflow on a non-automatic AKS cluster.

Deploy AKS Automatic

Use the Azure CLI to deploy an AKS Automatic cluster.

For detailed instructions on installing AKS Automatic, please refer to the AKS Automatic installation documentation.

Log in to the Azure CLI.

az login

Set up your environment variables

RGNAME=kubeflow
CLUSTERNAME=kubeflow-aks-automatic
LOCATION=eastus

Create the resource group

az group create -n $RGNAME -l $LOCATION

Add or update the aks-preview extension

az extension add --name aks-preview --upgrade

This article requires the aks-preview Azure CLI extension version 9.0.0b4 or later.
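The version requirement above can be checked from the shell. The following is a minimal sketch, assuming GNU `sort -V` is available; the `version_at_least` helper and the `REQUIRED_VERSION` and `INSTALLED_VERSION` variables are illustrative names, not part of the Azure CLI.

```shell
# Returns success when version $1 is greater than or equal to version $2.
# Relies on GNU `sort -V`, which orders 9.0.0b4 < 9.0.0b10 < 9.1.0 as expected
# but sorts a final release such as 9.0.0 before its own betas.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

REQUIRED_VERSION="9.0.0b4"

if command -v az >/dev/null 2>&1; then
  # Empty when the extension is not installed.
  INSTALLED_VERSION=$(az extension show --name aks-preview --query version -o tsv 2>/dev/null || true)
  if [ -n "$INSTALLED_VERSION" ] && version_at_least "$INSTALLED_VERSION" "$REQUIRED_VERSION"; then
    echo "aks-preview $INSTALLED_VERSION is recent enough"
  else
    echo "aks-preview is missing or older than $REQUIRED_VERSION"
    echo "install or update it with: az extension add --name aks-preview --upgrade"
  fi
fi
```

The comparison is deliberately simple; if you need strict pre-release ordering, compare the versions with a proper version parser instead.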

Create an AKS Automatic cluster

az aks create \
    --resource-group $RGNAME \
    --name $CLUSTERNAME \
    --location $LOCATION \
    --sku automatic \
    --generate-ssh-keys 

Connect to AKS Automatic Cluster

After the cluster is created, you can connect to it using the Azure CLI. The following command retrieves the credentials for your AKS cluster and configures kubectl to use them.

az aks get-credentials --resource-group $RGNAME --name $CLUSTERNAME

Verify connectivity to the cluster. This should return a list of nodes.

kubectl get nodes

Deploy Kubeflow with Password, Ingress and TLS

Clone this repo, which includes the kubeflow/manifests repo as a Git submodule

git clone --recurse-submodules https://github.com/Azure/kubeflow-aks.git

Change directory into the newly cloned directory

cd kubeflow-aks

From the root of the repo, check out the v1.10-branch of the manifests submodule:

cd manifests/
git checkout v1.10-branch
cd ..

Copy the TLS deployment files:

cp -a deployments/tls manifests/tls
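Before continuing, it can help to confirm that the submodule is on the expected branch and that the TLS files were copied. This is a small sketch to run from the root of the cloned repo; `current_branch` and `EXPECTED_BRANCH` are illustrative names.

```shell
# Prints the branch checked out in the Git working tree at $1.
# Prints nothing (and fails) when HEAD is detached.
current_branch() {
  git -C "$1" symbolic-ref --short HEAD 2>/dev/null
}

EXPECTED_BRANCH="v1.10-branch"

if [ -d manifests ]; then
  branch=$(current_branch manifests || true)
  if [ "$branch" = "$EXPECTED_BRANCH" ]; then
    echo "manifests is on $EXPECTED_BRANCH"
  else
    echo "manifests is on '${branch:-a detached HEAD}', expected $EXPECTED_BRANCH"
  fi
  # The copied TLS overlay should now sit inside the manifests directory.
  if [ ! -d manifests/tls ]; then
    echo "manifests/tls is missing; re-run the cp step"
  fi
fi
```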

Configure Custom password

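In the next steps, you generate a bcrypt hash for your custom password and put it in the dex-passwords.yaml file.

First, generate the hash by following the steps described in the Kubeflow docs, which use Python to generate a bcrypt hash. Alternatively, for simplicity, you can use an online tool such as bcrypt-generator to create a new hash.

# Requires the Python bcrypt package (for example: pip install bcrypt).
# The password is passed through the environment so that quotes or backslashes in it do not break the Python snippet.
PASSWORD="your_custom_password"
PASSWORD_HASH=$(PASSWORD="$PASSWORD" python3 -c "import os, bcrypt; print(bcrypt.hashpw(os.environ['PASSWORD'].encode(), bcrypt.gensalt()).decode())")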

Update the password hash in the manifests/tls/dex-passwords.yaml secret:

sed -i "s|<YOUR_DEX_USER_PASSWORD>|$PASSWORD_HASH|g" manifests/tls/dex-passwords.yaml
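A quick sanity check can catch an empty hash (for example, when the bcrypt module is not installed) before Kubeflow is deployed. The sketch below assumes the standard 60-character bcrypt format; `is_bcrypt_hash` and `DEX_FILE` are illustrative names.

```shell
# Succeeds when $1 has the shape of a bcrypt hash: a $2a/$2b/$2y prefix,
# a two-digit cost, then 53 characters of salt and digest.
is_bcrypt_hash() {
  printf '%s\n' "$1" | grep -Eq '^\$2[aby]\$[0-9]{2}\$[./A-Za-z0-9]{53}$'
}

if is_bcrypt_hash "${PASSWORD_HASH:-}"; then
  echo "PASSWORD_HASH looks like a bcrypt hash"
else
  echo "PASSWORD_HASH is empty or malformed; check that the bcrypt module is installed"
fi

# After the sed step, the placeholder should no longer appear in the file.
DEX_FILE=manifests/tls/dex-passwords.yaml
if [ -f "$DEX_FILE" ] && grep -q '<YOUR_DEX_USER_PASSWORD>' "$DEX_FILE"; then
  echo "placeholder is still present in $DEX_FILE"
fi
```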

Install Kubeflow

  • Deploy Kubeflow:

cd manifests/
while ! kustomize build tls | kubectl apply --server-side=true -f -; do echo "Retrying to apply resources"; sleep 10; done

  • Once the command has completed, check that the pods are ready:

kubectl get pods -n cert-manager
kubectl get pods -n istio-system
kubectl get pods -n auth
kubectl get pods -n knative-eventing
kubectl get pods -n knative-serving
kubectl get pods -n kubeflow
kubectl get pods -n kubeflow-user-example-com
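Checking seven namespaces by eye is tedious, so the same check can be scripted. The following sketch assumes the default column layout of `kubectl get pods --no-headers` (NAME, READY, STATUS, ...); `pods_not_ready` and `NAMESPACES` are illustrative names.

```shell
# Reads `kubectl get pods --no-headers` output on stdin and prints the names of
# pods that are neither Completed nor Running with all containers ready.
pods_not_ready() {
  awk '{
    split($2, ready, "/")
    if ($3 == "Completed") next
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}

NAMESPACES="cert-manager istio-system auth knative-eventing knative-serving kubeflow kubeflow-user-example-com"

if command -v kubectl >/dev/null 2>&1; then
  for ns in $NAMESPACES; do
    pods=$(kubectl get pods -n "$ns" --no-headers 2>/dev/null || true)
    if [ -z "$pods" ]; then
      echo "$ns: no pods listed yet"
    else
      waiting=$(printf '%s\n' "$pods" | pods_not_ready | tr '\n' ' ')
      if [ -n "$waiting" ]; then
        echo "$ns: still waiting for: $waiting"
      else
        echo "$ns: all pods ready"
      fi
    fi
  done
fi
```

Re-run the loop until every namespace reports that all pods are ready; some pods can take several minutes to start.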

Expose the Kubeflow dashboard using Ingress with TLS

There are a couple of options to expose your Kubeflow cluster with proper HTTPS using Ingress. See the note on NodePort / LoadBalancer / Ingress in the Kubeflow docs. In this example, we will use the Nginx ingress controller, which is included as part of the application routing add-on (app-routing-system) in AKS Automatic.

You can create a self-signed certificate for Kubeflow using either the IP address of the Nginx ingress LoadBalancer or a DNS label assigned to it.

Step 1: Find the IP or DNS Label of the Nginx Ingress

  • Obtain the Nginx IP:

NGINX_IP=$(kubectl get svc -n app-routing-system -o jsonpath='{.items[?(@.spec.type=="LoadBalancer")].status.loadBalancer.ingress[0].ip}')
echo "Nginx IP: $NGINX_IP"

  • Optional: Use an Azure DNS label for a friendly URL. You can also configure a DNS name for the Nginx ingress service by assigning an Azure DNS label:

kubectl annotate service nginx -n app-routing-system \
  service.beta.kubernetes.io/azure-dns-label-name=my-kubeflow-cluster

This will make Kubeflow accessible at: my-kubeflow-cluster.$LOCATION.cloudapp.azure.com
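The host name is built from the DNS label and the region, so it can be derived once and reused in later steps. The sketch below is illustrative: `is_valid_dns_label`, `kubeflow_fqdn`, `DNS_LABEL` and `KUBEFLOW_HOST` are hypothetical names, and the label pattern is an assumption based on the commonly documented Azure rule for public IP DNS labels.

```shell
# Succeeds when $1 matches the assumed Azure DNS label pattern:
# a lowercase letter, then letters, digits or hyphens, ending in a letter or digit.
is_valid_dns_label() {
  printf '%s\n' "$1" | grep -Eq '^[a-z][a-z0-9-]{1,61}[a-z0-9]$'
}

# Prints the fully qualified domain name for DNS label $1 in Azure region $2.
kubeflow_fqdn() {
  printf '%s.%s.cloudapp.azure.com\n' "$1" "$2"
}

DNS_LABEL="my-kubeflow-cluster"   # replace with your own unique label
if is_valid_dns_label "$DNS_LABEL"; then
  KUBEFLOW_HOST=$(kubeflow_fqdn "$DNS_LABEL" "${LOCATION:-eastus}")
  echo "Kubeflow host: $KUBEFLOW_HOST"
else
  echo "'$DNS_LABEL' is not a valid DNS label"
fi
```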

Step 2: Create TLS Certificate

If you are using the IP address, create the following certificate:

echo "apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: kubeflow-tls-cert
  namespace: istio-system  # must match the namespace of the Ingress that references kubeflow-tls-secret
spec:
  secretName: kubeflow-tls-secret
  ipAddresses:
    - $NGINX_IP
  isCA: false
  issuerRef:
    name: kubeflow-self-signing-issuer
    kind: ClusterIssuer
    group: cert-manager.io" | kubectl apply -f -

If you are using a DNS label, use the following definition instead (replace my-kubeflow-cluster with your unique DNS label):

echo "apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: kubeflow-tls-cert
  namespace: istio-system  # must match the namespace of the Ingress that references kubeflow-tls-secret
spec:
  secretName: kubeflow-tls-secret
  dnsNames:
    - my-kubeflow-cluster.$LOCATION.cloudapp.azure.com
  isCA: false
  issuerRef:
    name: kubeflow-self-signing-issuer
    kind: ClusterIssuer
    group: cert-manager.io" | kubectl apply -f -

  • Deploy the certificate:

kubectl apply -f tls/certificate.yaml

Wait for the certificate to be ready:

kubectl wait --for=condition=Ready certificate/kubeflow-tls-cert -n istio-system --timeout=300s
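Once the certificate is ready, you may want to confirm that the issued certificate actually carries the IP address or DNS name you intended. This is a minimal sketch, assuming `openssl` and `base64 -d` are available locally; `cert_sans` and `CERT_NAMESPACE` are illustrative names.

```shell
# Reads a PEM certificate on stdin and prints its Subject Alternative Name
# section (the header line and the line listing the names).
cert_sans() {
  openssl x509 -noout -text | grep -A1 'Subject Alternative Name'
}

CERT_NAMESPACE="istio-system"   # namespace used by the kubectl wait command above

if command -v kubectl >/dev/null 2>&1; then
  # The dot in the key name tls.crt must be escaped in the jsonpath expression.
  pem=$(kubectl get secret kubeflow-tls-secret -n "$CERT_NAMESPACE" \
    -o jsonpath='{.data.tls\.crt}' 2>/dev/null | base64 -d || true)
  if [ -n "$pem" ]; then
    printf '%s\n' "$pem" | cert_sans || echo "no Subject Alternative Name found"
    printf '%s\n' "$pem" | openssl x509 -noout -enddate
  else
    echo "secret kubeflow-tls-secret not found in $CERT_NAMESPACE"
  fi
fi
```

If the secret is reported as missing, check that the Certificate resource was created in the same namespace that the wait command targets.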

Step 3: Configure Ingress

Create and apply an ingress manifest to expose the Kubeflow components:

echo 'apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kubeflow-ingress
  namespace: istio-system
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    nginx.ingress.kubernetes.io/ssl-protocols: "TLSv1.2 TLSv1.3"
spec:
  ingressClassName: webapprouting.kubernetes.azure.com
  tls:
  - secretName: kubeflow-tls-secret
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: istio-ingressgateway
            port:
              number: 80' | kubectl apply -f -

Verify Ingress:

kubectl get ingress kubeflow-ingress -n istio-system

Wait for the ADDRESS field to show an external IP address (this may take a few minutes).

NAME               CLASS                                HOSTS   ADDRESS         PORTS   AGE
kubeflow-ingress   webapprouting.kubernetes.azure.com   *       xxx.149.0.222   443     16m
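Instead of re-running the command by hand, the wait for the ADDRESS field can be expressed as a polling loop. The sketch below only defines the helpers and leaves the call to you; `retry` and `ingress_has_address` are illustrative names, and the 30 attempts and 10 second delay in the usage comment are arbitrary choices.

```shell
# Runs the command given after the first two arguments up to $1 times,
# sleeping $2 seconds between attempts. Succeeds as soon as the command does.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Succeeds once the Ingress reports an IP address in its status.
ingress_has_address() {
  addr=$(kubectl get ingress kubeflow-ingress -n istio-system \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null)
  [ -n "$addr" ]
}

# Usage (polls for up to about five minutes):
#   retry 30 10 ingress_has_address && echo "Ingress address: $addr"
```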

Access Kubeflow Dashboard

You can now access the Kubeflow dashboard at https://$NGINX_IP or https://my-kubeflow-cluster.$LOCATION.cloudapp.azure.com if DNS was configured.
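Before opening a browser, a quick request from the command line can show whether the endpoint answers at all. This is a sketch assuming `curl` is installed; `is_reachable_status` and `check_dashboard` are illustrative names, and treating any 2xx or 3xx status as reachable is an assumption (the dashboard may redirect to the login page).

```shell
# Succeeds for HTTP status codes in the 2xx or 3xx range.
is_reachable_status() {
  case "$1" in
    2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Requests URL $1 and reports whether it answered with a 2xx or 3xx status.
check_dashboard() {
  # -k skips certificate verification because the certificate is self-signed.
  status=$(curl -k -s -o /dev/null -w '%{http_code}' --max-time 10 "$1" || true)
  if is_reachable_status "$status"; then
    echo "$1 is reachable (HTTP $status)"
  else
    echo "$1 is not reachable yet (HTTP $status)"
  fi
}

if [ -n "${NGINX_IP:-}" ]; then
  check_dashboard "https://$NGINX_IP"
fi
```

Your browser will still warn about the self-signed certificate; that is expected with this deployment option.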

Log in using:

  • Email: user@example.com (or the email you configured)
  • Password: The password you used to generate the hash

Testing the deployment with a Notebook server

You can test that the deployments worked by creating a new Notebook server using the GUI.

  1. Click on “Create a new Notebook” on the Kubeflow dashboard (figure: creating a new Notebook)
  2. Click on “+ New Notebook” in the top right corner of the resulting page
  3. Enter a name for the server
  4. Leave the “jupyterlab” option selected
  5. Pick any of the available images. In this case, we choose the default
  6. Set Requested CPU to 0.5 and requested memory in Gi to 1
  7. Under Data Volumes click on “+ Add new volume”
  8. Expand the resulting section
  9. Set the name to datavol-1. The default name does not work because it contains characters that are not allowed
  10. Set the size in Gi to 1
  11. Uncheck “Use default class”
  12. Choose a class from the provided options. In this case, we choose “azurefile-premium”
  13. Choose ReadWriteMany as the Access mode. Your data volume config should look like the picture below (figure: data volume config)
  14. Click on “Launch” at the bottom of the page. After 1-2 minutes, a successful deployment should show a green checkmark under status (figure: deployment successful)
  15. Click on “Connect” to access JupyterLab
  16. Under Notebook, click on Python 3 to open a Jupyter notebook and start coding

Destroy the resources

After you are done testing, run the command below to destroy the resources you just created:

az group delete -n $RGNAME