Getting Started

Overview

[Figure: DKube hierarchy diagram]

DKube™ is a portable, end-to-end, Kubeflow-based MLOps platform that enables data scientists to develop, tune, and deploy complex models. It is based on Kubernetes, and runs on-premises and on the most popular cloud platforms. It has the same look, feel, and workflow on all of them, and migrating between providers is fast and simple.

This guide describes the process of installing and managing DKube on a cluster. It assumes that a supported version of Kubernetes (see Prerequisites) is installed on the cluster prior to installing DKube.

DKube Configuration

The cluster can include one or more master nodes and optional worker nodes.

  • The Master node coordinates the cluster, and can optionally contain GPUs

  • Each Worker node provides more resources, and is a way to expand the capability of the cluster

At least one Master node must be running for the cluster to be active. Worker nodes can be added and removed, and the cluster will continue to operate. The process to stop and restart the cluster is described in the section Restarting DKube After Cluster Restart.

Installation Configuration

The installation can be run:

  • From the master node in the cluster, or

  • From a remote node that is not part of the cluster

The overall flow of installation is as follows:

  • Copy the required files to the installation node through Docker

  • Ensure that the installation node has passwordless access to all of the DKube cluster nodes

  • Execute the platform-specific setup steps as described in this document

  • Install DKube using Helm

  • Access DKube through a browser

Important

Even if the installation is executed from the master node on the cluster, passwordless access is still required to all of the nodes on the cluster, including the master node itself.
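One way to confirm passwordless access before starting is to attempt a non-interactive SSH login to each node. The following is a minimal sketch and is not part of the DKube tooling: the function name check_passwordless_ssh, the user name, and the node addresses are placeholders chosen for illustration.

```shell
# Hypothetical helper: report which nodes accept key-based SSH login.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_passwordless_ssh() {
    local user="$1"
    shift
    local failed=0
    local node
    for node in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$user@$node" true 2>/dev/null; then
            echo "OK   $node"
        else
            echo "FAIL $node"
            failed=1
        fi
    done
    return "$failed"
}

# Example (placeholder user and addresses; include the master node itself):
# check_passwordless_ssh ubuntu 192.168.100.10 192.168.100.11
```

The function returns a non-zero status if any node refuses the login, so it can gate a later installation step in a script.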

DKube and Kubernetes

DKube requires Kubernetes to operate. This guide assumes that a supported version of Kubernetes has been installed on the cluster, as listed in the prerequisites section.

Prerequisites

Supported Platforms

  • The following OSs are supported:

      • Ubuntu 18.04

      • CentOS 7.9

      • Rook Ceph 1.4

  • Cluster nodes can include one of the following:

      • On-prem (bare metal or VM)

      • Google GCP

      • Amazon AWS

      • Microsoft Azure

  • The following Kubernetes platforms and versions are supported:

      • Amazon EKS

      • Rancher 2.4

      • Kubernetes 1.18

      • VMWare vSphere with Tanzu 1.2.1

Node Requirements

Installation Node Requirements

The installation node has the following requirements:

  • A supported operating system

  • Docker CE

  • Kubectl

Help with installing some of the required packages is provided in the section Software Package Help.

DKube Cluster Node Requirements

The DKube Cluster nodes have the following requirements:

  • A supported operating system

  • Docker CE

  • Nodes should all have static IP addresses, even if the VM exists on a cloud

  • All nodes must be on the same subnet

  • All nodes must have the same user name and ssh key

Each node on the cluster should have the following minimum resources:

  • 16 CPU cores

  • 64GB RAM

  • Storage size depends on the programs and datasets. It should be large enough to handle the required data, and at least 400GB
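The minimums above can be checked on each node with a short script. This is only a sketch: the function name meets_dkube_minimums is hypothetical, and the commented commands assume a Linux node where nproc, /proc/meminfo, and df are available and where the data is stored on the filesystem mounted at "/".

```shell
# Hypothetical helper: compare a node's resources with the listed minimums.
# Arguments: CPU cores, RAM in GB, storage in GB (whole numbers).
meets_dkube_minimums() {
    local cores="$1"
    local ram_gb="$2"
    local disk_gb="$3"
    local short=0
    [ "$cores" -ge 16 ] || { echo "CPU cores: $cores (need at least 16)"; short=1; }
    [ "$ram_gb" -ge 64 ] || { echo "RAM: ${ram_gb}GB (need at least 64GB)"; short=1; }
    [ "$disk_gb" -ge 400 ] || { echo "Storage: ${disk_gb}GB (need at least 400GB)"; short=1; }
    return "$short"
}

# Gathering the values on a node (assumptions noted in the text above):
# cores=$(nproc)
# ram_gb=$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024)}' /proc/meminfo)
# disk_gb=$(df -P -k / | awk 'NR == 2 {print int($2 / 1024 / 1024)}')
# meets_dkube_minimums "$cores" "$ram_gb" "$disk_gb"
```

The kernel typically reports slightly less memory than is physically installed, so a node with a nominal 64GB of RAM may show a lower figure here. Treat a near miss on the RAM check as a prompt to look more closely rather than as a definite failure.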

Important

Only GPUs of the exact same type can be installed on a node. For example, you cannot mix an NVIDIA V100 and a P100 on the same node. Even GPUs of the same class must have the same configuration (e.g., memory).

Important

The Nouveau driver should not be installed on any of the nodes in the cluster. If the driver is installed, follow the instructions in the section Removing Nouveau Driver.
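A quick way to see whether the Nouveau kernel module is currently loaded on a node is to look for it in the lsmod output. The sketch below uses a hypothetical function name, nouveau_loaded, and only detects a loaded module; it does not tell you whether the driver package is installed but inactive.

```shell
# Hypothetical helper: succeed if "nouveau" appears as a module name
# in lsmod-style output read from standard input.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Example on a cluster node (assumes lsmod is available):
# if lsmod | nouveau_loaded; then
#     echo "nouveau is loaded; see Removing Nouveau Driver"
# fi
```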

Access to the Cluster

To run DKube both during and after installation, a minimum level of security access must be provided from any system that needs to use the cluster. This includes access to the URL used to open DKube from a browser.

Protocol    Port Range    Source
TCP         30002         Access IP
TCP         32222         Access IP
TCP         32223         Access IP
TCP         32323         Access IP
TCP         32224         Access IP
TCP         32225         Access IP
TCP         6443          Access IP
TCP         443           Access IP
TCP         22            Access IP
All         0-65535       Private Subnet
ICMP        0-65535       Access IP
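Once the access rules are in place, the TCP entries can be spot-checked from an access system. The following is a rough sketch, assuming the nc (netcat) utility with its -z and -w options is installed; the function name check_dkube_ports and the host address are placeholders. A port is reported as unreachable both when a rule blocks it and when nothing is listening on it yet, so the result is only meaningful for services that are already running.

```shell
# TCP ports taken from the access table.
DKUBE_TCP_PORTS="30002 32222 32223 32323 32224 32225 6443 443 22"

# Hypothetical helper: try a TCP connection to each listed port on a host.
check_dkube_ports() {
    local host="$1"
    local port
    local missing=0
    for port in $DKUBE_TCP_PORTS; do
        if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
            echo "open $port"
        else
            echo "unreachable $port"
            missing=1
        fi
    done
    return "$missing"
}

# Example (placeholder address):
# check_dkube_ports 192.168.100.10
```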

The source IP access range is in CIDR format. It consists of an IP address and mask combination. For example:

  • 192.168.100.14/24 would allow IP addresses in the range 192.168.100.x

  • 192.168.100.14/16 would allow IP addresses in the range 192.168.x.x
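The two examples above can be reproduced with a little shell arithmetic. This sketch is for illustration only; the function names int_to_ip and cidr_range are not part of any DKube tool. It applies the mask to the address to find the first and last address in the allowed range.

```shell
# Convert a 32-bit integer to dotted-quad notation.
int_to_ip() {
    printf '%d.%d.%d.%d' \
        $(( ($1 >> 24) & 255 )) $(( ($1 >> 16) & 255 )) \
        $(( ($1 >> 8) & 255 )) $(( $1 & 255 ))
}

# Print the first and last address covered by an IPv4 CIDR block.
cidr_range() {
    local ip="${1%/*}"      # part before the slash
    local bits="${1#*/}"    # mask length after the slash
    local a b c d rest addr mask first last
    # Split the dotted quad into its four octets.
    a="${ip%%.*}"; rest="${ip#*.}"
    b="${rest%%.*}"; rest="${rest#*.}"
    c="${rest%%.*}"; d="${rest#*.}"
    addr=$(( (a << 24) | (b << 16) | (c << 8) | d ))
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    first=$(( addr & mask ))
    last=$(( first | (~mask & 0xFFFFFFFF) ))
    echo "$(int_to_ip "$first") - $(int_to_ip "$last")"
}

cidr_range 192.168.100.14/24   # prints 192.168.100.0 - 192.168.100.255
cidr_range 192.168.100.14/16   # prints 192.168.0.0 - 192.168.255.255
```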

Getting the DKube Files

The files necessary for installation are pulled from Docker, using the following commands:

    sudo docker login -u <Docker username>
    Password: <Docker password>
    sudo docker run --rm -it -v $HOME/.dkube:/root/.dkube ocdr/dkubeadm:<DKube version> init

Note

The Docker credentials and DKube version number (x.y.z) are provided separately.

This will create the folder $HOME/.dkube and copy the necessary files to the folder.
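Before moving on, it can be worth confirming that the folder was created and is not empty. The check below is a minimal sketch; the function name dkube_files_present is hypothetical, and it makes no assumption about which specific files the folder should contain.

```shell
# Hypothetical helper: succeed if the directory exists and has at least one entry.
# Defaults to $HOME/.dkube when no directory is given.
dkube_files_present() {
    local dir="${1:-$HOME/.dkube}"
    [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

if dkube_files_present; then
    echo "DKube files found in $HOME/.dkube"
else
    echo "No DKube files found in $HOME/.dkube"
fi
```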

Note

Which tools and files are used depends on the platform-specific instructions described in this document.

Platform-Specific Installation Instructions

Based on the platform and Kubernetes type, specific setup is required prior to installing DKube.

Kubernetes    Instructions
EKS           Setting up an Amazon EKS Cluster
Rancher       Setting Up a Rancher Cluster
Tanzu         Setting up a VMWare Tanzu Cluster