Deploy Kubeflow with Password, Ingress and TLS
Background
In this lab, you will use the Azure CLI to deploy an Azure Kubernetes Service (AKS) Automatic cluster. AKS Automatic offers a simplified, managed Kubernetes experience with automated node management, scaling, and security configuration. For more details, see the AKS Automatic documentation. Note that AKS Automatic is currently in preview; while it provides faster setup and less manual configuration, it is not recommended for production use. For production workloads, or when advanced features and customization are required, use regular AKS instead. You will then install Kubeflow with a custom password and TLS configuration. This deployment option uses a self-signed certificate and an ingress controller; replace the self-signed certificate with your own CA certificates for production workloads.
You can follow these same instructions to deploy Kubeflow on a non-automatic AKS cluster.
Deploy AKS Automatic
Use the Azure CLI to deploy an AKS Automatic cluster.
💡Note: To complete this deployment, you will need the following permissions on the Resource Group:
- Microsoft.Authorization/policyAssignments/write
- Microsoft.Authorization/policyAssignments/read
For detailed instructions on installing AKS Automatic, please refer to the AKS Automatic installation documentation.
Log in to the Azure CLI.
az login
az account set --subscription <NAME_OR_ID_OF_SUBSCRIPTION>
Set up your environment variables
RGNAME=kubeflow
CLUSTERNAME=kubeflow-aks-automatic
LOCATION=eastus
Create the resource group
az group create -n $RGNAME -l $LOCATION
Add or Update AKS extension
az extension add --name aks-preview
This lab requires the aks-preview Azure CLI extension version 9.0.0b4 or later.
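If the extension is already installed, you can update it to the latest version instead:
az extension update --name aks-preview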
Create an AKS Automatic cluster
az aks create \
--resource-group $RGNAME \
--name $CLUSTERNAME \
--location $LOCATION \
--sku automatic \
--generate-ssh-keys
💡Note: AKS Automatic is in preview and requires the AutomaticSKUPreview feature to be registered in your subscription.
az feature register --namespace Microsoft.ContainerService --name AutomaticSKUPreview
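To confirm the registration before creating the cluster, check the feature state and then refresh the resource provider once it shows Registered:
az feature show --namespace Microsoft.ContainerService --name AutomaticSKUPreview --query properties.state
az provider register --namespace Microsoft.ContainerService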
Connect to AKS Automatic Cluster
After the cluster is created, you can connect to it using the Azure CLI. The following command retrieves the credentials for your AKS cluster and configures kubectl to use them.
az aks get-credentials --resource-group $RGNAME --name $CLUSTERNAME
Verify connectivity to the cluster. This should return a list of nodes.
kubectl get nodes
Deploy Kubeflow with Password, Ingress and TLS
Clone this repo, which includes the kubeflow/manifests repo as a Git submodule
git clone --recurse-submodules https://github.com/Azure/kubeflow-aks.git
The --recurse-submodules flag pulls in the manifests from the Git submodule linked to this repo.
Change directory into the newly cloned directory
cd kubeflow-aks
From the root of the repo, ensure you’re using the v1.10-branch:
cd manifests/
git checkout v1.10-branch
cd ..
- Copy the TLS deployment files:
cp -a deployments/tls manifests/tls
Configure a custom password
In the next steps, you will generate a password hash for your custom password and insert it into the dex-passwords.yaml file.
First, generate the password hash by following the steps described in the Kubeflow docs, using Python to generate a bcrypt hash. For simplicity, you can instead use an online tool such as bcrypt-generator to create the hash.
PASSWORD="your_custom_password"
PASSWORD_HASH=$(python3 -c "import bcrypt; print(bcrypt.hashpw(b'$PASSWORD', bcrypt.gensalt()).decode())")
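The one-liner above assumes the bcrypt Python package is available (pip install bcrypt). As an optional sanity check, verify that the generated hash matches your password; this should print True:
pip install bcrypt
python3 -c "import bcrypt; print(bcrypt.checkpw(b'$PASSWORD', '$PASSWORD_HASH'.encode()))"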
Update the password hash in the manifests/tls/dex-passwords.yaml secret:
sed -i "s|<YOUR_DEX_USER_PASSWORD>|$PASSWORD_HASH|g" manifests/tls/dex-passwords.yaml
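As a quick check, confirm the placeholder was replaced; the following should print 0:
grep -c '<YOUR_DEX_USER_PASSWORD>' manifests/tls/dex-passwords.yaml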
Install Kubeflow
- Deploy Kubeflow
cd manifests/
while ! kustomize build tls | kubectl apply --server-side=true -f -; do echo "Retrying to apply resources"; sleep 10; done
The --server-side=true flag helps with large CRDs that may exceed annotation size limits. The retry loop handles dependency ordering issues during installation.
- Once the command has completed, check that the pods are ready:
kubectl get pods -n cert-manager
kubectl get pods -n istio-system
kubectl get pods -n auth
kubectl get pods -n knative-eventing
kubectl get pods -n knative-serving
kubectl get pods -n kubeflow
kubectl get pods -n kubeflow-user-example-com
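Alternatively, as a rough convenience check, you can wait for all pods in the kubeflow namespace in a single command. Note that pods belonging to completed Jobs never report Ready, so this may time out even on a healthy install:
kubectl wait --for=condition=Ready pods --all -n kubeflow --timeout=600s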
Expose the Kubeflow dashboard using Ingress with TLS
There are a couple of options for exposing your Kubeflow cluster with proper HTTPS using Ingress; see the NodePort / LoadBalancer / Ingress note in the Kubeflow docs. In this example we will use the NGINX ingress controller, which is included as part of the application routing add-on (deployed in the app-routing-system namespace) in AKS Automatic.
We can create a self-signed certificate for Kubeflow using either the IP address exposed by the NGINX ingress LoadBalancer or an assigned DNS label.
Step 1: Find the IP or DNS label of the NGINX ingress
- Obtain Nginx IP
NGINX_IP=$(kubectl get svc -n app-routing-system -o jsonpath='{.items[?(@.spec.type=="LoadBalancer")].status.loadBalancer.ingress[0].ip}')
echo "Nginx IP: $NGINX_IP"
- Optional: Use Azure DNS for a friendly URL. You can also assign an Azure DNS label to the NGINX ingress service:
kubectl annotate service nginx -n app-routing-system \
service.beta.kubernetes.io/azure-dns-label-name=my-kubeflow-cluster
This will make Kubeflow accessible at: my-kubeflow-cluster.$LOCATION.cloudapp.azure.com
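As a quick check once the label has propagated, the name should resolve to the same address as $NGINX_IP:
nslookup my-kubeflow-cluster.$LOCATION.cloudapp.azure.com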
Step 2: Create TLS Certificate
If using the IP address, create the following certificate:
echo "apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: kubeflow-tls-cert
namespace: app-routing-system
spec:
secretName: kubeflow-tls-secret
ipAddresses:
- $NGINX_IP
isCA: false
issuerRef:
name: kubeflow-self-signing-issuer
kind: ClusterIssuer
group: cert-manager.io" | kubectl apply -f -
If using a DNS label, use the following definition (replace my-kubeflow-cluster with your unique DNS label):
echo "apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: kubeflow-tls-cert
namespace: app-routing-system
spec:
secretName: kubeflow-tls-secret
dnsNames:
- my-kubeflow-cluster.$LOCATION.cloudapp.azure.com
isCA: false
issuerRef:
name: kubeflow-self-signing-issuer
kind: ClusterIssuer
group: cert-manager.io" | kubectl apply -f -
- If you saved the certificate definition to a file (for example tls/certificate.yaml) instead of piping it to kubectl, apply it with:
kubectl apply -f tls/certificate.yaml
Wait for the certificate to be ready:
kubectl wait --for=condition=Ready certificate/kubeflow-tls-cert -n istio-system --timeout=300s
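You can also inspect the issued certificate and the secret it produced:
kubectl describe certificate kubeflow-tls-cert -n istio-system
kubectl get secret kubeflow-tls-secret -n istio-system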
Step 3: Configure Ingress
Create and apply an ingress manifest to expose the Kubeflow components:
echo 'apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kubeflow-ingress
  namespace: istio-system
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    nginx.ingress.kubernetes.io/ssl-protocols: "TLSv1.2 TLSv1.3"
spec:
  ingressClassName: webapprouting.kubernetes.azure.com
  tls:
    - secretName: kubeflow-tls-secret
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: istio-ingressgateway
                port:
                  number: 80' | kubectl apply -f -
Verify Ingress:
kubectl get ingress kubeflow-ingress -n istio-system
Wait for the ADDRESS field to show an external IP address (this may take a few minutes).
NAME CLASS HOSTS ADDRESS PORTS AGE
kubeflow-ingress webapprouting.kubernetes.azure.com * xxx.149.0.222 443 16m
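Because the certificate is self-signed, browsers will show a warning. For a quick smoke test from the command line, you can use curl with -k to skip certificate verification; you should see an HTTP status line (often a redirect toward the Dex login flow):
curl -k -I https://$NGINX_IP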
Access Kubeflow Dashboard
You can now access the Kubeflow dashboard at https://$NGINX_IP or, if DNS was configured, at https://my-kubeflow-cluster.$LOCATION.cloudapp.azure.com.
Log in using:
- Email: user@example.com (or the email you configured)
- Password: The password you used to generate the hash
Testing the deployment with a Notebook server
You can test that the deployments worked by creating a new Notebook server using the GUI.
- Click on “Create a new Notebook” on the Kubeflow dashboard
- Click on “+ New Notebook” in the top right corner of the resulting page
- Enter a name for the server
- Leave the “jupyterlab” option selected
- Feel free to pick one of the available images; in this case we choose the default
- Set Requested CPU to 0.5 and requested memory in Gi to 1
- Under Data Volumes click on “+ Add new volume”
- Expand the resulting section
- Set the name to datavol-1 (the default name will not work because it contains characters that are not allowed)
- Set the size in Gi to 1
- Uncheck “Use default class”
- Choose a class from the provided options. In this case I will choose “azurefile-premium”
- Choose ReadWriteMany as the Access mode
- Click on “Launch” at the bottom of the page. A successful deployment will show a green checkmark under Status after 1-2 minutes
- Click on “Connect” to access your JupyterLab
- Under Notebook, click on Python 3 to open a Jupyter notebook and start coding
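If you prefer to verify from the command line, the notebook server also shows up as a Notebook custom resource, a pod, and a PVC in the user namespace (assuming the default kubeflow-user-example-com profile):
kubectl get notebooks -n kubeflow-user-example-com
kubectl get pods -n kubeflow-user-example-com
kubectl get pvc -n kubeflow-user-example-com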
Destroy the resources
After you are done testing, run the command below to destroy the resources you just created.
az group delete -n $RGNAME
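To skip the confirmation prompt and return immediately, you can add the standard flags:
az group delete -n $RGNAME --yes --no-wait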