What is Kubeflow and How to Deploy it on Kubernetes



Deploying and customizing Kubeflow on Google Cloud Platform using GKE.

    What is Kubeflow?

    Kubeflow is a Kubernetes-native open-source framework for developing, managing, deploying, and running scalable and portable machine learning workloads.

    Model development and training are only a small part of a production ML workflow.

    Issues to tackle include moving your data to an accessible format and location, cleaning the data, feature engineering, evaluating your trained models, handling model versioning, serving your trained models flexibly, and avoiding training/serving skew.

    This is particularly the case when workflows need to be flexible, regularly repeatable, and involve several moving parts that must be integrated. The following diagram, from the official documentation, shows Kubeflow as a platform for arranging the components of your ML system on top of Kubernetes:

    [Diagram: Kubeflow arranging the components of an ML system on top of Kubernetes]

    In addition, most steps recur across several workflows, or recur with different parameterizations. You also run series of experiments that need to be carried out in an auditable and repeatable manner. Occasionally part or all of an ML workflow needs to run on-site, but in other contexts managed cloud resources can be more efficient, making it easy to distribute and scale up workflow steps and to run multiple experiments in parallel. The cloud tends to be more cost-effective for "bursty" workloads.

    Deploying Kubeflow to Kubernetes on Google Cloud Platform (GCP)

    In this example we use GCP and its managed Kubernetes service, GKE. If you are using a different provider there are some minor differences, but you can still follow most of this tutorial.

    Set up the GCP project

    Follow these steps:

    • In the GCP Console, select an existing project or create a new one.
    • Make sure you have the “Owner” role for the project. The deployment process creates several service accounts with roles sufficient to integrate seamlessly with GCP services.
    • Make sure billing is enabled for your project. See the guide to modifying a project's billing settings.
    • Go to the following pages on the GCP Console and make sure you enable the required APIs:
      • Compute Engine API
      • Kubernetes Engine API
      • Identity and Access Management (IAM) API
      • Deployment Manager API
      • Cloud Resource Manager API
      • Cloud Filestore API
      • AI Platform Training & Prediction API
    • Note that you can't run Kubeflow's default GCP configuration on the GCP Free Tier or the $300-credit 12-month trial, because the free tier doesn't provide enough resources. You have to upgrade to a paid account.
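
    The API-enablement step above can also be done from the command line. A minimal sketch, assuming the gcloud CLI is installed and authenticated against your project (the service names below are the standard GCP identifiers for the APIs listed above):

    ```shell
    # Enable the APIs required by Kubeflow on GCP (run once per project).
    gcloud services enable \
      compute.googleapis.com \
      container.googleapis.com \
      iam.googleapis.com \
      deploymentmanager.googleapis.com \
      cloudresourcemanager.googleapis.com \
      file.googleapis.com \
      ml.googleapis.com
    ```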

    Deploy Kubeflow using the CLI

    Before installing Kubeflow on the command line:

    • Ensure you have installed the command-line tools used in this tutorial: gcloud, kubectl, and the kfctl binary (downloaded below).
    • Make sure that your GCP project meets the minimum requirements from GCP documentation.

    Prepare your environment

    We assume that you already have a running GKE cluster that you can access. Otherwise, start by deploying one:

    gcloud container clusters create <cluster-name> --zone <compute-zone>
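
    If the cluster already exists, you can fetch credentials for kubectl and confirm access. A short sketch (cluster and zone names are placeholders):

    ```shell
    # Configure kubectl to talk to the existing GKE cluster.
    gcloud container clusters get-credentials <cluster-name> --zone <compute-zone>

    # Sanity check: list the cluster's nodes.
    kubectl get nodes
    ```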

    You can get more information about the same command in the official documentation.

    Download the kfctl v1.0.2 release for your platform from the kfctl releases page on GitHub, then unpack it:

    tar -xvf kfctl_v1.0.2_<platform>.tar.gz
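
    After unpacking, a typical next step is to put the binary on your PATH and verify it. A minimal sketch, assuming you unpacked into the current directory:

    ```shell
    # Make the extracted kfctl binary executable and add its directory
    # to the PATH for this shell session.
    chmod +x kfctl
    export PATH=$PATH:$(pwd)

    # Verify the binary is found and runs.
    kfctl version
    ```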

    • Log in. You only need to run this command once:

    gcloud auth login

    • Create user credentials. You only need to run this command once:

    gcloud auth application-default login

    • Configure gcloud default values for the zone and the project

    # Set your GCP project ID and the zone where you want to create
    # the Kubeflow deployment:
    export PROJECT=<your GCP project ID>
    export ZONE=<your GCP zone>

    gcloud config set project ${PROJECT}
    gcloud config set compute/zone ${ZONE}

    • Select the KFDef spec to use for your deployment

    export CONFIG_URI="https://raw.githubusercontent.com/kubeflow/manifests/v1.0-branch/kfdef/kfctl_gcp_iap.v1.0.2.yaml"

    • Create environment variables containing the OAuth client ID and secret that you created earlier

    export CLIENT_ID=<CLIENT_ID from OAuth page>
    export CLIENT_SECRET=<CLIENT_SECRET from OAuth page>

    • The CLIENT_ID and CLIENT_SECRET can be obtained from the Cloud Console by selecting APIs & Services -> Credentials
    • Pick a name KF_NAME for your Kubeflow deployment and a directory for your configuration.

    export KF_NAME=<your choice of name for the Kubeflow deployment>
    export BASE_DIR=<path to a base directory>
    export KF_DIR=${BASE_DIR}/${KF_NAME}

    • To deploy Kubeflow using the default settings, run the kfctl apply command:

    mkdir -p ${KF_DIR}
    cd ${KF_DIR}
    kfctl apply -V -f ${CONFIG_URI}

    kfctl will try to populate the KFDef spec with sensible defaults automatically.
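
    Once kfctl apply finishes, you can check that the Kubeflow components are coming up. The kubeflow namespace is the default used by this KFDef, and the dashboard URL pattern below is what the IAP-enabled GCP configuration provisions:

    ```shell
    # Watch the Kubeflow pods until they are all Running.
    kubectl get pods -n kubeflow

    # With the IAP-enabled config used here, the central dashboard is
    # served at a URL of this form once the endpoint is provisioned:
    #   https://<KF_NAME>.endpoints.<PROJECT>.cloud.goog/
    ```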

    Customizing your Kubeflow deployment

    The process outlined in the previous steps configures Kubeflow with various defaults. You can follow the instructions below to have greater control.

    • Download the KFDef file to your local directory to allow modifications

    mkdir -p ${KF_DIR}
    cd ${KF_DIR}
    curl -L -o ${CONFIG_FILE} https://raw.githubusercontent.com/kubeflow/manifests/v1.0-branch/kfdef/kfctl_gcp_iap.v1.0.2.yaml

    • CONFIG_FILE is the name you would like to use for your local config file, e.g. “kfdef.yaml”; export it before running the curl command above.
    • Run the kfctl build command to generate configuration files for your deployment:

    cd ${KF_DIR}
    kfctl build -V -f ${CONFIG_FILE}
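
    kfctl build writes the generated manifests locally so they can be inspected or edited before applying. A sketch of what to look at (directory names reflect kfctl v1.0.x behavior):

    ```shell
    # After kfctl build, the deployment directory contains the generated
    # kustomize manifests alongside your KFDef file.
    ls ${KF_DIR}
    ls ${KF_DIR}/kustomize

    # Edit any generated kustomization here, then run kfctl apply.
    ```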

    • Run the “kfctl apply” command to deploy Kubeflow

    kfctl apply -V -f ${CONFIG_FILE}

    Customizing the installation

    What we have seen until now is a basic installation, but it’s possible to customize it using kustomize. Kustomize introduces a template-free way to customize application configuration that simplifies the use of off-the-shelf applications. It traverses a Kubernetes manifest to add, remove, or update configuration options without forking.

    Adding GPU nodes to your cluster is, for instance, something you can control. You can also add a GPU node pool to an existing cluster, add Cloud TPUs to your cluster, add VMs with more CPUs or RAM, or add users to Kubeflow. You can read about more possible customization options here.
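
    As a sketch of one such customization, a GPU node pool can be added to an existing GKE cluster with gcloud. The accelerator type, counts, and names below are illustrative; adjust them to what is available in your zone:

    ```shell
    # Add a GPU node pool to an existing cluster.
    gcloud container node-pools create gpu-pool \
      --cluster=<cluster-name> \
      --zone=<compute-zone> \
      --accelerator=type=nvidia-tesla-k80,count=1 \
      --num-nodes=1 \
      --machine-type=n1-standard-4

    # GKE nodes also need the NVIDIA device drivers; Google provides a
    # DaemonSet that installs them:
    kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
    ```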


    We showed how to deploy a basic Kubeflow cluster; if you want more customization, deployment can be tricky. Nevertheless, a containerized, cloud-based machine learning platform managed by Kubernetes, like Kubeflow, solves many of the challenges raised by machine learning's computational requirements. It provides scalable access to CPUs and GPUs, ramping up automatically when compute demand spikes. It gives multiple teams access to data storage that grows to meet their needs and supports the tools they rely on to manipulate datasets.

    A layer of abstraction lets data scientists access resources without thinking about the infrastructure underneath. As more teams try to use machine learning to extract insight from their data, Kubernetes makes that easier for them.


