Deployment Options
Deploy Kubeflow into AKS
Start by checking out the prerequisites page
If you want to deploy Kubeflow with minimal changes on AKS, then consider the vanilla deployment option. The Kubeflow control plane is installed on Azure Kubernetes Service (AKS), which is a managed container service used to run and scale Kubernetes applications in the cloud.
For a more secure deployment with a minimum security baseline, consider the deploy with custom password and TLS deployment option.
1 - Deploy Kubeflow with Password, Ingress and TLS
Deploying Kubeflow on AKS with Custom Password and TLS
Background
In this lab you will deploy an Azure Kubernetes Service (AKS) cluster and other Azure services (Container Registry, Managed Identity, Key Vault) with Azure CLI and Bicep. You will then install Kubeflow after creating a custom password. This deployment option also uses TLS with a self-signed certificate and an ingress controller. For production workloads, replace the self-signed certificate with one issued by your own CA.
Deploy Kubeflow with Password, Ingress and TLS
⚠️ Warning: In order to complete this deployment, you will need to have either User Access Admin and Contributor, or Owner, access to the subscription you are deploying into.
Use the Azure CLI and Bicep templates to deploy the infrastructure for your application. We will be using the AKS construction project to rapidly deploy the required Azure resources. The project allows users the flexibility to tweak their AKS environment however they want. Please check out the AKS construction helper for more details about AKS construction.
You can also try out the automated option using Mage build tool at the Azure Open Source Labs.
Login to the Azure CLI.
💡Note: If you have access to multiple subscriptions, you may need to run the following command to work with the appropriate subscription: az account set --subscription <NAME_OR_ID_OF_SUBSCRIPTION>.
Install kubectl using the Azure CLI, if required.
Clone this repo which includes the Azure/AKS-Construction and kubeflow/manifests repos as Git Submodules
git clone --recurse-submodules https://github.com/Azure/kubeflow-aks.git
Change directory into the newly cloned directory
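The preparation steps above can be collected into one shell function. This is only a sketch under a few assumptions: the Azure CLI and git are already installed, `az aks install-cli` is used for the kubectl step, and the function name `prepare_workspace` is made up for illustration. On some systems `az aks install-cli` needs elevated permissions to write to its default install location.

```shell
# Sketch of the preparation steps; prepare_workspace is a hypothetical helper name.
prepare_workspace() {
  # Interactive login to the Azure CLI.
  az login || return 1
  # Installs kubectl (and kubelogin) through the Azure CLI.
  az aks install-cli || return 1
  # Clone the repo together with its Git submodules.
  git clone --recurse-submodules https://github.com/Azure/kubeflow-aks.git || return 1
  # git clones into a directory named after the repository.
  cd kubeflow-aks
}
```

Call `prepare_workspace` from the directory where you want the clone to live. The function stops at the first step that fails.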
Deployment steps
Get the signed in user id so that you can get admin access to the cluster you create
SIGNEDINUSER=$(az ad signed-in-user show --query id --out tsv)
RGNAME=kubeflow
Create deployment
az group create -n $RGNAME -l eastus
DEP=$(az deployment group create -g $RGNAME --parameters signedinuser=$SIGNEDINUSER -f main.bicep -o json)
💡Note: The DEP variable is very important and will be used in subsequent steps. You can save it by running echo $DEP > test.json and restore it by running export DEP=$(cat test.json).
KVNAME=$(echo $DEP | jq -r '.properties.outputs.kvAppName.value')
AKSCLUSTER=$(echo $DEP | jq -r '.properties.outputs.aksClusterName.value')
TENANTID=$(az account show --query tenantId -o tsv)
ACRNAME=$(az acr list -g $RGNAME --query "[0].name" -o tsv)
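Because later steps depend on these variables, it can help to confirm that none of them came back empty. The helper below is a minimal sketch; `require_vars` is a hypothetical name, not part of the repo. It also treats the literal string `null` as missing, since `jq -r` prints `null` when the requested key is absent.

```shell
# Return non-zero if any of the named variables is unset, empty, or "null".
require_vars() {
  missing=0
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ] || [ "$value" = "null" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, run `require_vars KVNAME AKSCLUSTER TENANTID ACRNAME` before continuing. Every name it reports points to a command above that should be re-run or investigated.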
Install kubelogin and log into the cluster
Next install kubelogin using the installation instructions appropriate for your computer. From there, you’ll need to run the following commands to download the kubeconfig file and convert it for use with kubelogin.
az aks get-credentials --resource-group $RGNAME \
--name $AKSCLUSTER
kubelogin convert-kubeconfig -l azurecli
Log in to the cluster. Enter your Azure credentials when prompted afterwards to complete the login. If this is successful, kubectl should return a list of nodes.
⚠️ Warning: It is important that you log into the cluster at this point to avoid running into issues at a later point.
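One way to trigger the login and confirm it worked is to list the nodes. The sketch below counts the nodes that report a `Ready` status; `count_ready_nodes` is a hypothetical helper, and it assumes the default column layout of `kubectl get nodes`, where the second column is STATUS.

```shell
# Print the number of nodes whose STATUS column is exactly "Ready".
count_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 == "Ready" { n++ } END { print n + 0 }'
}
```

A result of 0 suggests either that the login did not complete or that the nodes are not up yet, so it is worth running plain `kubectl get nodes` to see the full output.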
Install kustomize
Next install kustomize using the installation instructions appropriate for your computer.
💡Note: In order to use the kustomize command below to deploy Kubeflow, you must use Kustomize v3.2.0. More info here.
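To check which Kustomize version is on your PATH before deploying, something like the following may help. It is a rough sketch: the output format of `kustomize version` differs between releases, so the hypothetical helper `kustomize_is_3_2_0` only looks for the substring `3.2.0` rather than parsing the output.

```shell
# Succeed only if the reported kustomize version mentions 3.2.0.
kustomize_is_3_2_0() {
  case "$(kustomize version 2>/dev/null)" in
    *3.2.0*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example: `kustomize_is_3_2_0 || echo "Kustomize v3.2.0 not found on PATH"`.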
Install Kubeflow with TLS and Unique Password
Please note that a self-signed certificate is used for demonstration purposes. Do not use self-signed certs for production workloads. You can easily swap this self-signed cert with your CA certificate for your use case.
⚠️ Warning: For this deployment, we will be using a simple method for authenticating to Kubeflow. For more advanced use cases, please configure your deployment to use Azure AD.
- The first step is to generate a new hash/password combination using bcrypt. There are many ways of doing this, e.g. by generating it using Python. For simplicity we will be using coderstool’s Bcrypt Hash Generator for testing purposes. Do not do this for production workloads. In the plain text field, enter a password for your first user, then click on the “Generate Hash” button. You can generate multiple hashes if you have multiple users.
- Head to the deployments/tls/dex-config-map.yaml file and update the hash value there (around line 22) with the hash you just generated. You can also change the email address, username and userid. In addition, you can set up multiple users by adding more users to the array. If you change the default email address, also update it in the params file located at manifests/common/user-namespace/base/params.env.
- Update your auth.md file with the new email address and password (the plain text password, not the hash), or store the secrets in a more secure way
- Copy the contents of this newly updated manifests folder to the kubeflow manifests folder. This will update the files so the deployment includes your config changes.
cp -a deployments/tls manifests/tls
- cd to the manifests folder and install Kubeflow
Install all of the components via a single command
while ! kustomize build tls | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
- Once the command has completed, check that the pods are ready
kubectl get pods -n cert-manager
kubectl get pods -n istio-system
kubectl get pods -n auth
kubectl get pods -n knative-eventing
kubectl get pods -n knative-serving
kubectl get pods -n kubeflow
kubectl get pods -n kubeflow-user-example-com
- Restart dex to ensure dex is using the updated password
⚠️ Warning: It is important that you restart the dex pod by running the command below. If you don’t, any previous password (including the default password 12341234, if not changed) remains valid from the time the Service is exposed via LoadBalancer until this command is run or dex is otherwise restarted.
kubectl rollout restart deployment dex -n auth
- Configure TLS. Start by getting the IP address of the istio gateway
kubectl -n istio-system get service istio-ingressgateway --output jsonpath='{.status.loadBalancer.ingress[0].ip}'
Replace the IP address in the deployments/tls/certificate.yaml file (line 13) with the IP address of the istio gateway and save the file.
- Please note that instead of providing the IP address as above, you could give the LoadBalancer an Azure sub-domain (via the annotation in manifests/common/istio-1-16/istio-install/base/patches/service.yaml) and use that instead. Deploy the certificate manifest file.
kubectl apply -f tls-manifest/certificate.yaml
- You have completed the deployment. Access the dashboard by entering the IP address in a browser. You might get a warning saying the connection is unsafe. This is expected since you are using a self-signed certificate. Click on advanced and proceed to the URL to view your dashboard. Log in using the email address and password in the auth.md file (assuming you updated it with your email address and password in the previous step).
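For anything beyond testing, the bcrypt hash from the first step can be generated locally instead of in a web tool. The sketch below assumes the `htpasswd` utility from Apache (the apache2-utils or httpd-tools package) is installed. `bcrypt_hash` is a hypothetical helper name, the user name `admin` is a throwaway value that htpasswd requires, and the cost of 10 is an arbitrary choice, not a value taken from this guide.

```shell
# Read a plain-text password on stdin and print only its bcrypt hash.
# -B selects bcrypt, -i reads the password from stdin, -n prints to stdout,
# -C 10 sets the bcrypt cost. sed strips the "admin:" prefix from the first line.
bcrypt_hash() {
  htpasswd -BinC 10 admin | sed -n '1s/^[^:]*://p'
}
```

Example: `printf '%s\n' 'my-new-password' | bcrypt_hash`. Paste the resulting hash into deployments/tls/dex-config-map.yaml as described above.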
Testing the deployment with a Notebook server
You can test that the deployments worked by creating a new Notebook server using the GUI.
- Click on “Create a new Notebook server”
- Click on “+ New Notebook” in the top right corner of the resulting page
- Enter a name for the server
- Leave the “jupyterlab” option selected
- Pick any of the available images; in this case we use the default
- Set Requested CPU to 0.5 and requested memory in Gi to 1
- Under Data Volumes click on “+ Add new volume”
- Expand the resulting section
- Set the name to datavol-1. The default name does not work because it contains characters that are not allowed
- Set the size in Gi to 1
- Uncheck “Use default class”
- Choose a class from the provided options. In this case we choose “azurefile-premium”
- Choose ReadWriteMany as the Access mode. Your data volume config should look like the picture below
- Click on “Launch” at the bottom of the page. A successful deployment shows a green checkmark under status after 1-2 minutes.
- Click on “Connect” to access JupyterLab
- Under Notebook, click on Python 3 to open a Jupyter notebook and start coding
Destroy the resources
Run the command below to destroy the resources you just created after you are done testing
az group delete -n $RGNAME
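If RGNAME is no longer set in your shell (for example in a new terminal), the delete command above receives an empty name. A small guard like the one below avoids that. It is a sketch: `delete_resource_group` is a hypothetical helper, `--yes` skips the confirmation prompt, and `--no-wait` returns without waiting for the deletion to finish, so only use it once you are sure about the group name.

```shell
# Delete a resource group, refusing to run when no name is given.
delete_resource_group() {
  if [ -z "${1:-}" ]; then
    echo "refusing to delete: resource group name is empty" >&2
    return 1
  fi
  az group delete --name "$1" --yes --no-wait
}
```

Example: `delete_resource_group "$RGNAME"`.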
2 - Prerequisites
Set up your environment for deploying Kubeflow for AKS
Kubeflow on AKS Prerequisites
For all Kubeflow on AKS deployment options, you will need the following
If you have access to GitHub Codespaces or Docker Desktop on your local machine, it is highly recommended that you deploy this using a devcontainer as it includes all the tools you need. The configuration for the devcontainer can be found here.
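If you are not using the devcontainer, a quick check that the command-line tools are installed can save time later. The sketch below uses a hypothetical helper, `missing_tools`. The tool names in the usage example are taken from the commands used in the deployment guides, so treat them as an assumption rather than an official prerequisite list.

```shell
# Print the name of every listed tool that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

Example: `missing_tools az kubectl kubelogin kustomize jq git`. No output means every tool was found.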
3 - Vanilla Installation
Deploy kubeflow into an AKS cluster using default settings.
Background
In this lab you will deploy an Azure Kubernetes Service (AKS) cluster and other Azure services (Container Registry, Managed Identity, Key Vault) with Azure CLI and Bicep. You will then install Kubeflow with the default settings using Kustomize and create a Jupyter notebook server that you can access from your browser.
Instructions for Basic Deployment without TLS and with Default Password
This deployment option is for testing only. To deploy with TLS and change the default password, see Deploy Kubeflow with TLS.
⚠️ Warning: This deployment option requires users to have access to the Kubernetes cluster. For a deployment option that does not have this restriction, uses TLS, and shows how to change the default password, see the Deploy Kubeflow with TLS option.
Use the Azure CLI and Bicep templates to deploy the infrastructure for your application. We will be using the AKS construction project to rapidly deploy the required Azure resources. The project allows users the flexibility to tweak their AKS environment however they want. Please check out the AKS construction helper for more details about AKS construction.
Login to the Azure CLI.
💡Note: If you have access to multiple subscriptions, you may need to run the following command to work with the appropriate subscription: az account set --subscription <NAME_OR_ID_OF_SUBSCRIPTION>.
Install kubectl using the Azure CLI, if required.
Clone this repo which includes the Azure/AKS-Construction and kubeflow/manifests repos as Git Submodules
git clone --recurse-submodules https://github.com/Azure/kubeflow-aks.git
Change directory into the newly cloned directory
Deployment steps
⚠️ Warning: In order to complete this deployment, you will need to have either User Access Admin and Contributor, or Owner, access to the subscription you are deploying into.
Get the signed in user id so that you can get admin access to the cluster you create
SIGNEDINUSER=$(az ad signed-in-user show --query id --out tsv)
RGNAME=kubeflow
Create deployment
az group create -n $RGNAME -l eastus
DEP=$(az deployment group create -g $RGNAME --parameters signedinuser=$SIGNEDINUSER -f main.bicep -o json)
💡Note: The DEP variable is very important and will be used in subsequent steps. You can save it by running echo $DEP > test.json and restore it by running export DEP=$(cat test.json).
KVNAME=$(echo $DEP | jq -r '.properties.outputs.kvAppName.value')
AKSCLUSTER=$(echo $DEP | jq -r '.properties.outputs.aksClusterName.value')
TENANTID=$(az account show --query tenantId -o tsv)
ACRNAME=$(az acr list -g $RGNAME --query "[0].name" -o tsv)
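The save and restore commands from the note above can be wrapped in two small helpers so that the variable is always quoted and a missing file is reported instead of silently producing an empty DEP. This is a sketch: `save_dep` and `load_dep` are hypothetical names, and `test.json` is kept as the default file name from the note.

```shell
# Write the deployment output held in DEP to a file (default: test.json).
save_dep() {
  printf '%s\n' "$DEP" > "${1:-test.json}"
}

# Restore DEP from a file written by save_dep and export it.
load_dep() {
  if [ ! -r "${1:-test.json}" ]; then
    echo "cannot read ${1:-test.json}" >&2
    return 1
  fi
  DEP=$(cat "${1:-test.json}")
  export DEP
}
```

After `load_dep`, re-run the variable assignments above so that KVNAME and AKSCLUSTER are set again in the new shell.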
Install kubelogin and log into the cluster
Next install kubelogin using the installation instructions appropriate for your computer. From there, you’ll need to run the following commands to download the kubeconfig file and convert it for use with kubelogin.
az aks get-credentials --resource-group $RGNAME \
--name $AKSCLUSTER
kubelogin convert-kubeconfig -l azurecli
Log in to the cluster. Enter your Azure credentials when prompted afterwards to complete the login. If this is successful, kubectl should return a list of nodes.
⚠️ Warning: It is important that you log into the cluster at this point to avoid running into issues at a later point.
Install kustomize
Next install kustomize using the installation instructions appropriate for your computer.
💡Note: In order to use the kustomize command below to deploy Kubeflow, you must use Kustomize v3.2.0. More info here.
Deploy Kubeflow without TLS using Default Password
This deployment option is for testing only. To deploy with TLS and change the default password, see Deploy Kubeflow with TLS.
From the root of the repo, cd into kubeflow’s manifests directory and make sure you are in the v1.7-branch.
cd manifests/
git checkout v1.7-branch
cd ..
Install all of the components via a single command
cp -a deployments/vanilla manifests/vanilla
cd manifests/
while ! kustomize build vanilla | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
Once the command has completed, check the pods are ready
kubectl get pods -n cert-manager
kubectl get pods -n istio-system
kubectl get pods -n auth
kubectl get pods -n knative-eventing
kubectl get pods -n knative-serving
kubectl get pods -n kubeflow
kubectl get pods -n kubeflow-user-example-com
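Instead of re-running the commands above until everything is up, you can let kubectl block until the pods report Ready. The loop below is a sketch with a hypothetical helper name, `wait_for_pods`, and an arbitrary 600 second timeout. Note that pods which have already run to completion never become Ready, so a timeout in a namespace does not necessarily mean the deployment failed. Inspect that namespace with `kubectl get pods` if it happens.

```shell
# Wait for all pods in each given namespace to report the Ready condition.
wait_for_pods() {
  for ns in "$@"; do
    echo "Waiting for pods in $ns"
    kubectl wait --for=condition=Ready pods --all -n "$ns" --timeout=600s || return 1
  done
}
```

Example: `wait_for_pods cert-manager istio-system auth knative-eventing knative-serving kubeflow kubeflow-user-example-com`.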
Run kubectl port-forward to access the Kubeflow dashboard
kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80
Finally, open http://localhost:8080 and log in with the default user’s credentials. The default email address is user@example.com and the default password is 12341234.
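If the page does not load, you can check from a second terminal whether anything is answering on the forwarded port. The helper below is a sketch; `dashboard_http_status` is a hypothetical name. It prints the HTTP status code that curl received, and curl reports 000 when it got no HTTP response at all, which usually means the port-forward is not running.

```shell
# Print the HTTP status code returned by the dashboard URL
# (default: the port-forwarded address used above).
dashboard_http_status() {
  curl -s -o /dev/null -w '%{http_code}' "${1:-http://localhost:8080}"
}
```

Example: `dashboard_http_status`, or pass another URL as the first argument.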
Testing the deployment with a Notebook server
You can test that the deployments worked by creating a new Notebook server using the GUI.
- Click on “Create a new Notebook server”
- Click on “+ New Notebook” in the top right corner of the resulting page
- Enter a name for the server
- Leave the “jupyterlab” option selected
- Pick any of the available images; in this case we use the default
- Set Requested CPU to 0.5 and requested memory in Gi to 1
- Under Data Volumes click on “+ Add new volume”
- Expand the resulting section
- Set the name to datavol-1. The default name does not work because it contains characters that are not allowed
- Set the size in Gi to 1
- Uncheck “Use default class”
- Choose a class from the provided options. In this case we choose “azurefile-premium”
- Choose ReadWriteMany as the Access mode. Your data volume config should look like the picture below
- Click on “Launch” at the bottom of the page. A successful deployment shows a green checkmark under status after 1-2 minutes.
- Click on “Connect” to access JupyterLab
- Under Notebook, click on Python 3 to open a Jupyter notebook and start coding
Next steps
See the Secure your Kubeflow cluster using TLS and stronger Password deployment option.