What is Kubeflow and How to Deploy it on Kubernetes



Deploying and customizing Kubeflow on Google Cloud Platform using GKE.

    What is Kubeflow?

    Kubeflow is a Kubernetes-native open-source framework for developing, managing, deploying, and running scalable and portable machine learning workloads.

    Model development and training are only a small part of a production ML workflow.

    Issues to tackle include moving your data to an accessible format and location, cleaning the data, feature engineering, evaluating your trained models, handling model versioning, serving your trained models flexibly, and avoiding training/serving skew.

    This is particularly the case when workflows need to be flexible, regularly repeatable, and involve several moving parts that must be integrated. The following diagram, from the official documentation, shows Kubeflow as a platform for arranging the components of your ML system on top of Kubernetes:

    [Diagram: Kubeflow arranging the components of an ML system on top of Kubernetes]

    In addition, most steps recur across several workflows, or recur with different parameterizations. You also run series of experiments that need to be carried out in an auditable and repeatable manner. Occasionally part or all of an ML workflow needs to run on-site, but in other contexts managed cloud resources can be more efficient, making it easy to distribute and scale up workflow steps and to run multiple experiments in parallel. The cloud tends to be more cost-effective for "bursty" workloads.

    Deploying Kubeflow to Kubernetes on Google Cloud Platform (GCP)

    In this example we use GCP and its managed Kubernetes service, GKE. If you are using a different provider there are some minor differences, but you can still follow most of this tutorial.

    Set up the GCP project

    Follow these steps:

    • In the GCP Console, select an existing project or create a new one.
    • Make sure you have the “Owner” role for the project. The deployment process creates several service accounts with roles sufficient to integrate seamlessly with GCP services.
    • Make sure billing is enabled for your project. See the guide to modifying a project's billing settings.
    • Go to the following pages on the GCP Console and make sure you enable the required APIs:
      • Compute Engine API
      • Kubernetes Engine API
      • Identity and Access Management (IAM) API
      • Deployment Manager API
      • Cloud Resource Manager API
      • Cloud Filestore API
      • AI Platform Training & Prediction API
    • Note that you can't run Kubeflow's default GCP configuration on the GCP Free Tier or the $300-credit 12-month trial, because the free tier doesn't provide enough resources. You have to upgrade to a paid account.
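
    The API-enablement step above can also be done from the command line. A minimal sketch, assuming the gcloud CLI is installed and authenticated against your project (the service names below are the standard GCP identifiers for the APIs listed above):

    ```shell
    # Enable the APIs required by Kubeflow on GCP (run once per project).
    gcloud services enable \
      compute.googleapis.com \
      container.googleapis.com \
      iam.googleapis.com \
      deploymentmanager.googleapis.com \
      cloudresourcemanager.googleapis.com \
      file.googleapis.com \
      ml.googleapis.com
    ```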

    Deploy Kubeflow using the CLI

    Before installing Kubeflow on the command line:

    • Ensure you have installed the command-line tools used in this tutorial: gcloud, kubectl, and the kfctl binary (downloaded below).
    • Make sure that your GCP project meets the minimum requirements from GCP documentation.

    Prepare your environment

    We assume that you already have a running GKE cluster that you can access. Otherwise, start by deploying one:

    gcloud container clusters create <cluster-name> --zone <compute-zone>
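
    If the cluster already exists, you can fetch credentials for kubectl and confirm access. A short sketch (cluster and zone names are placeholders):

    ```shell
    # Configure kubectl to talk to the existing GKE cluster.
    gcloud container clusters get-credentials <cluster-name> --zone <compute-zone>

    # Sanity check: list the cluster's nodes.
    kubectl get nodes
    ```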

    You can get more information about the same command in the official documentation.

    Download the kfctl v1.0.2 release for your platform from the kfctl releases page on GitHub, then unpack it:

    tar -xvf kfctl_v1.0.2_<platform>.tar.gz
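
    After unpacking, a typical next step is to put the binary on your PATH and verify it. A minimal sketch, assuming you unpacked into the current directory:

    ```shell
    # Make the extracted kfctl binary executable and add its directory
    # to the PATH for this shell session.
    chmod +x kfctl
    export PATH=$PATH:$(pwd)

    # Verify the binary is found and runs.
    kfctl version
    ```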

    • Log in. You only need to run this command once:

    gcloud auth login

    • Create user credentials. You only need to run this command once:

    gcloud auth application-default login

    • Configure gcloud default values for the zone and the project

    # Set your GCP project ID and the zone where you want to create
    # the Kubeflow deployment:
    export PROJECT=<your GCP project ID>
    export ZONE=<your GCP zone>

    gcloud config set project ${PROJECT}
    gcloud config set compute/zone ${ZONE}

    • Select the KFDef spec to use for your deployment

    export CONFIG_URI="https://raw.githubusercontent.com/kubeflow/manifests/v1.0-branch/kfdef/kfctl_gcp_iap.v1.0.2.yaml"

    • Create environment variables containing the OAuth client ID and secret that you created earlier

    export CLIENT_ID=<CLIENT_ID from OAuth page>
    export CLIENT_SECRET=<CLIENT_SECRET from OAuth page>

    • The CLIENT_ID and CLIENT_SECRET can be obtained from the Cloud Console by selecting APIs & Services -> Credentials
    • Pick a name KF_NAME for your Kubeflow deployment and a directory for your configuration.

    export KF_NAME=<your choice of name for the Kubeflow deployment>
    export BASE_DIR=<path to a base directory>
    export KF_DIR=${BASE_DIR}/${KF_NAME}

    • To deploy Kubeflow using the default settings, run the kfctl apply command:

    mkdir -p ${KF_DIR}
    cd ${KF_DIR}
    kfctl apply -V -f ${CONFIG_URI}

    kfctl will try to populate the KFDef spec with sensible defaults automatically.
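
    Once kfctl apply finishes, you can check that the Kubeflow components are coming up. The kubeflow namespace is the default used by this KFDef, and the dashboard URL pattern below is what the IAP-enabled GCP configuration provisions:

    ```shell
    # Watch the Kubeflow pods until they are all Running.
    kubectl get pods -n kubeflow

    # With the IAP-enabled config used here, the central dashboard is
    # served at a URL of this form once the endpoint is provisioned:
    #   https://<KF_NAME>.endpoints.<PROJECT>.cloud.goog/
    ```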

    Customizing your Kubeflow deployment

    The process outlined in the previous steps configures Kubeflow with various defaults. You can follow the instructions below to have greater control.

    • Download the KFDef file to your local directory to allow modifications

    mkdir -p ${KF_DIR}
    cd ${KF_DIR}
    curl -L -o ${CONFIG_FILE} https://raw.githubusercontent.com/kubeflow/manifests/v1.0-branch/kfdef/kfctl_gcp_iap.v1.0.2.yaml

    • CONFIG_FILE is the name you would like to use for your local config file, e.g. “kfdef.yaml”; export it before running the curl command above.
    • Run the kfctl build command to generate configuration files for your deployment:

    cd ${KF_DIR}
    kfctl build -V -f ${CONFIG_FILE}
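
    kfctl build writes the generated manifests locally so they can be inspected or edited before applying. A sketch of what to look at (directory names reflect kfctl v1.0.x behavior):

    ```shell
    # After kfctl build, the deployment directory contains the generated
    # kustomize manifests alongside your KFDef file.
    ls ${KF_DIR}
    ls ${KF_DIR}/kustomize

    # Edit any generated kustomization here, then run kfctl apply.
    ```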

    • Run the “kfctl apply” command to deploy Kubeflow

    kfctl apply -V -f ${CONFIG_FILE}

    Customizing the installation

    What we have seen until now is a basic installation, but it’s possible to customize it using kustomize. Kustomize introduces a template-free way to customize application configuration that simplifies the use of off-the-shelf applications. It traverses a Kubernetes manifest to add, remove, or update configuration options without forking.

    Adding GPU nodes to your cluster is, for instance, something you can control. You can also add a GPU node pool to an existing cluster, add Cloud TPUs to your cluster, add VMs with more CPUs or RAM, or add users to Kubeflow. You can read about more possible customization options here.
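
    As a sketch of one such customization, a GPU node pool can be added to an existing GKE cluster with gcloud. The accelerator type, counts, and names below are illustrative; adjust them to what is available in your zone:

    ```shell
    # Add a GPU node pool to an existing cluster.
    gcloud container node-pools create gpu-pool \
      --cluster=<cluster-name> \
      --zone=<compute-zone> \
      --accelerator=type=nvidia-tesla-k80,count=1 \
      --num-nodes=1 \
      --machine-type=n1-standard-4

    # GKE nodes also need the NVIDIA device drivers; Google provides a
    # DaemonSet that installs them:
    kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
    ```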


    We showed how to deploy a basic Kubeflow cluster; if you want more customization, deployment can be tricky. Nevertheless, a containerized, cloud-based machine learning platform managed by Kubernetes, like Kubeflow, solves many of the challenges raised by machine learning's computational requirements. It provides scalable access to CPUs and GPUs, ramping up automatically when compute demand spikes. It gives multiple teams access to data storage that grows to meet their needs and supports the tools they rely on to manipulate datasets.

    A layer of abstraction lets data scientists access resources without thinking about the infrastructure underneath. As more teams try to use machine learning to extract insight from their data, Kubernetes makes that easier for them.


