DKube SDK Developer Guide

This document is a guide for developers building ML applications on the DKube platform. The DKube SDK is a repository of abstract Python classes and libraries that client-side applications can use to interface with the DKube platform.

How to install

Python >= 3.5 is required

Install from the git repository using pip, with one of the commands below:

sudo pip install git+https://github.com/oneconvergence/dkube.git@2.2
sudo pip3 install git+https://github.com/oneconvergence/dkube.git@2.2

This installs all the prerequisites listed in requirements.txt

SDK API

class dkube.sdk.api.DkubeApi(URL=None, token=None, common_tags=[], req_timeout=None, req_retries=None)[source]

This class encapsulates all the high-level DKube workflow functions:

from dkube.sdk import *
dapi = DkubeApi()

Inputs

URL

FQDN endpoint at which DKube platform is deployed:

http://dkube-controller-master.dkube.cluster.local:5000

https://dkube.ai:32222

Note

If not provided, the value is picked from the DKUBE_ACCESS_URL env variable. If that is also not found, http://dkube-controller-master.dkube.cluster.local:5000 is used, assuming access is from within the DKube cluster

token

Access token for the APIs, without which DKube will return 40x codes

Note

If not provided, the value is picked from the DKUBE_ACCESS_TOKEN env variable. ASSERTs if the env variable is not defined.

common_tags

Tags to be applied to all the resources created using this API object

req_timeout

Timeout for all requests issued using this API object

req_retries

Number of retries per request
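
A minimal construction sketch; the URL and token values below are placeholders for your deployment:

from dkube.sdk import *
# Placeholder values; substitute your deployment URL and access token.
dapi = DkubeApi(URL="https://dkube.ai:32222",
                token="<access-token>",
                common_tags=["sdk-demo"])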

commit_featureset(**kwargs)[source]

Method to commit sticky featuresets.

The featureset should be in the ready state. It will be in the created state if no featurespec has been uploaded. If the featureset is in the created state, the following happens:

  1. If metadata is passed, it is uploaded as the featurespec.

  2. If no metadata is passed, the featurespec is derived from df and uploaded.

If the featureset is in the ready state, the following happens:

  1. metadata, if passed, is ignored.

  2. The featurespec is downloaded for the specified featureset and df is validated for conformance.

If name is specified, the path for committing the features is derived from it.

If path is also specified, the path is not derived; the specified path is used instead. However, path should be a mount path into the DKube store.

If df is not specified, the features are assumed to be already written to the featureset path. Features can be written to the featureset mount path using DkubeFeatureSet.write_features.

Available in DKube Release: 2.2

Inputs

name

Featureset name, or None. Example: name='fset'

df

DataFrame with the features to be written. None or an empty DataFrame is invalid. Type: pandas.DataFrame

metadata

Optional YAML object with name, description and schema fields, or None. Example: metadata=[{'name': 'age', 'description': '', 'schema': 'int64'}]

path

Mount path where the featureset is mounted, or None. Example: path='/opt/dkube/fset'

Outputs

Dictionary with response status
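
For example, a minimal sketch committing features from a DataFrame; the featureset name and columns are hypothetical:

import pandas as pd

# Hypothetical feature data; 'fset' must be an existing featureset.
df = pd.DataFrame({"age": [21, 34, 45], "fare": [7.25, 71.28, 8.05]})
resp = dapi.commit_featureset(name="fset", df=df)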

create_code(code: dkube.sdk.rsrcs.code.DkubeCode, wait_for_completion=True)[source]

Method to create a code repo on DKube. Raises Exception in case of errors.

Inputs

code

Instance of dkube.sdk.rsrcs.code class. Please see the Resources section for details on this class.

wait_for_completion

When set to True, this method waits for the code resource to reach a completion state. The code repo is declared complete when it is in one of the complete/failed/error states
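
A minimal sketch using the DkubeCode resource described in the Resources section; the user, repo name and git URL are illustrative:

from dkube.sdk import *
code = DkubeCode("oneconv", name="mnist")
# Hypothetical repository holding the training code.
code.update_git_details("https://github.com/oneconvergence/dkube-examples.git")
dapi.create_code(code)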

create_dataset(dataset: dkube.sdk.rsrcs.dataset.DkubeDataset, wait_for_completion=True)[source]

Method to create a dataset on DKube. Raises Exception in case of errors.

Inputs

dataset

Instance of dkube.sdk.rsrcs.dataset class. Please see the Resources section for details on this class.

wait_for_completion

When set to True, this method waits for the dataset resource to reach a completion state. The dataset is declared complete when it is in one of the complete/failed/error states
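
A minimal sketch using the DkubeDataset resource; the git URL is illustrative:

from dkube.sdk import *
dataset = DkubeDataset("oneconv", name="mnist")
dataset.update_dataset_source(source="git")
# Hypothetical repository holding the data.
dataset.update_git_details("https://github.com/oneconvergence/dkube-examples.git")
dapi.create_dataset(dataset)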

create_featureset(featureset: dkube.sdk.rsrcs.featureset.DkubeFeatureSet, wait_for_completion=True)[source]

Method to create a featureset on DKube.

Available in DKube Release: 2.2

Inputs

featureset

Instance of dkube.sdk.rsrcs.featureSet class. Please see the Resources section for details on this class.

wait_for_completion

When set to True, this method waits for the featureset resource to be ready, or created with the v1 version in sync state

Outputs

A dictionary object with response status
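
A minimal sketch:

from dkube.sdk import *
fs = DkubeFeatureSet(name="mnist-fs")
resp = dapi.create_featureset(fs)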

create_model(model: dkube.sdk.rsrcs.model.DkubeModel, wait_for_completion=True)[source]

Method to create a model on DKube. Raises Exception in case of errors.

Inputs

model

Instance of dkube.sdk.rsrcs.model class. Please see the Resources section for details on this class.

wait_for_completion

When set to True, this method waits for the model resource to reach a completion state. The model is declared complete when it is in one of the complete/failed/error states
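
A minimal sketch creating an empty (dvs) model repo:

from dkube.sdk import *
model = DkubeModel("oneconv", name="mnist")
model.update_model_source(source="dvs")
dapi.create_model(model)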

create_model_deployment(user, name, model, version, description=None, stage_or_deploy='stage', min_replicas=0, max_concurrent_requests=0, wait_for_completion=True)[source]

Method to create a serving deployment for a model in the model catalog. Raises Exception in case of errors.

Inputs

user

Name of the user creating the deployment

name

Name of the deployment. Must be unique

description

User readable description of the deployment

model

Name of the model to be deployed

version

Version of the model to be deployed

stage_or_deploy

Defaults to stage, which stages the model deployment for testing before it is deployed to production. Change to deploy to deploy the model in production

min_replicas

Minimum number of replicas that each Revision should have. If not provided, the value set in the platform config map is used.

max_concurrent_requests

Soft limit on the maximum number of requests an inference pod can process at a time. If not provided, the value set in the platform config map is used.

wait_for_completion

When set to True, this method waits for the job to complete after submission. The job is declared complete when it is in one of the complete/failed/error states
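
A minimal sketch; the model name and version are placeholders for an entry in the model catalog:

# '<version-id>' is a placeholder for a model version from the catalog.
dapi.create_model_deployment("oneconv", "mnist-deploy",
                             "mnist", "<version-id>",
                             stage_or_deploy="deploy",
                             min_replicas=1)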

create_preprocessing_run(run: dkube.sdk.rsrcs.preprocessing.DkubePreprocessing, wait_for_completion=True)[source]

Method to create a preprocessing run on DKube. Raises Exception in case of errors.

Inputs

run

Instance of dkube.sdk.rsrcs.Preprocessing class. Please see the Resources section for details on this class.

wait_for_completion

When set to True, this method waits for the job to complete after submission. The job is declared complete when it is in one of the complete/failed/error states

create_project(project: dkube.sdk.rsrcs.project.DkubeProject)[source]

Creates DKube Project.

Available in DKube Release: 2.2

Inputs

project

instance of dkube.sdk.rsrcs.DkubeProject class.
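
A minimal sketch; the project name is illustrative, and this assumes the properties listed under DkubeProject in the Resources section can be passed as keyword arguments:

from dkube.sdk import *
proj = DkubeProject("titanic", description="Titanic survival challenge")
dapi.create_project(proj)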

create_test_inference(run: dkube.sdk.rsrcs.serving.DkubeServing, wait_for_completion=True)[source]

Method to create a test inference on DKube. Raises Exception in case of errors.

Inputs

run

Instance of dkube.sdk.rsrcs.serving class. Please see the Resources section for details on this class.

If the serving image is not set in the run:DkubeServing argument:

  - If training used a supported standard framework, DKube picks the appropriate serving image.

  - If training used a custom image, DKube tries to use the same image for serving.

If the transformer image is not set in run:DkubeServing, DKube uses the same image as the training image.

If the transformer code is not set in run:DkubeServing, DKube uses the code that was used for training.

wait_for_completion

When set to True, this method waits for the job to complete after submission. The job is declared complete when it is in one of the complete/failed/error states
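
A minimal sketch relying on the defaults described above; the model repo name is illustrative:

from dkube.sdk import *
serving = DkubeServing("oneconv", name="mnist-serving")
# Hypothetical model repo owned by the same user.
serving.update_serving_model("mnist")
dapi.create_test_inference(serving)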

create_training_run(run: dkube.sdk.rsrcs.training.DkubeTraining, wait_for_completion=True)[source]

Method to create a training run on DKube. Raises Exception in case of errors.

Inputs

run

Instance of dkube.sdk.rsrcs.Training class. Please see the Resources section for details on this class.

wait_for_completion

When set to True, this method waits for the job to complete after submission. The job is declared complete when it is in one of the complete/failed/error states
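
A minimal sketch; the code repo, output model and startup command are illustrative:

from dkube.sdk import *
training = DkubeTraining("oneconv", name="mnist-run")
training.update_container(framework="custom",
                          image_url="docker.io/ocdr/dkube-datascience-tf-cpu:v2.0.0")
training.add_code("mnist")                        # hypothetical code repo
training.update_startupscript("python model.py")  # hypothetical entry point
training.add_output_model("mnist", mountpath="/opt/dkube/output")
dapi.create_training_run(training)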

delete_code(user, name, force=False)[source]

Method to delete a code repo. Raises an exception if the token belongs to a different user, if no code repo with the given name exists, or on any connection errors.

Inputs

user

The token must belong to this user, as the code repo of a different user cannot be deleted.

name

Name of the code which needs to be deleted.

delete_dataset(user, name, force=False)[source]

Method to delete a dataset. Raises an exception if the token belongs to a different user, if no dataset with the given name exists, or on any connection errors.

Inputs

user

The token must belong to this user, as the dataset of a different user cannot be deleted.

name

Name of the dataset which needs to be deleted.

delete_featureset(name)[source]

Method to delete a featureset.

Available in DKube Release: 2.2

Inputs

name

Name of the featureset to be deleted. Example: "mnist-fs"

Outputs

A dictionary object with response status and the deleted featureset name

delete_featuresets(featureset_list)[source]

Method to delete a list of featuresets on DKube. Raises Exception in case of errors.

Available in DKube Release: 2.2

Inputs

featureset_list

List of featureset names. Example: ["mnist-fs", "titanic-fs"]

Outputs

A dictionary object with response status and the list of deleted featureset names
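
For example:

resp = dapi.delete_featuresets(["mnist-fs", "titanic-fs"])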

delete_ide(user, name, wait_for_completion=True)[source]

Method to delete an IDE. Raises an exception if the token belongs to a different user, if no IDE with the given name exists, or on any connection errors.

Inputs

user

The token must belong to this user, as the IDE instance of a different user cannot be deleted.

name

Name of the IDE which needs to be deleted.

wait_for_completion

When set to True, this method waits for the IDE to be deleted.

delete_model(user, name, force=False)[source]

Method to delete a model. Raises an exception if the token belongs to a different user, if no model with the given name exists, or on any connection errors.

Inputs

user

The token must belong to this user, as the model of a different user cannot be deleted.

name

Name of the model which needs to be deleted.

delete_model_deployment(user, name, wait_for_completion=True)[source]

Method to delete a model deployment. Raises an exception if the token belongs to a different user, if no serving run with the given name exists, or on any connection errors.

Inputs

user

The token must belong to this user, as the run of a different user cannot be deleted.

name

Name of the run which needs to be deleted.

wait_for_completion

When set to True, this method waits for the deployment to be deleted.

delete_modelcatalog_item(user, modelcatalog=None, model=None, version=None)[source]

Method to delete an item from the model catalog. Raises an exception on any connection errors.

Available in DKube Release: 2.2

Inputs

user

Name of the user.

modelcatalog

Model catalog name

model

Name of the model

version

Version of the model

delete_preprocessing_run(user, name, wait_for_completion=True)[source]

Method to delete a preprocessing run. Raises an exception if the token belongs to a different user, if no preprocessing run with the given name exists, or on any connection errors.

Inputs

user

The token must belong to this user, as the run of a different user cannot be deleted.

name

Name of the run which needs to be deleted.

wait_for_completion

When set to True, this method waits for the preprocessing run to be deleted.

delete_project(project_id)[source]

Delete a project. This deletes only the project, not the associated resources.

Available in DKube Release: 2.2

Inputs

project_id

id of the project

delete_test_inference(user, name, wait_for_completion=True)[source]

Method to delete a test inference. Raises an exception if the token belongs to a different user, if no serving run with the given name exists, or on any connection errors.

Inputs

user

The token must belong to this user, as the run of a different user cannot be deleted.

name

Name of the run which needs to be deleted.

wait_for_completion

When set to True, this method waits for the inference to be deleted.

delete_training_run(user, name, wait_for_completion=True)[source]

Method to delete a training run. Raises an exception if the token belongs to a different user, if no training run with the given name exists, or on any connection errors.

Inputs

user

The token must belong to this user, as the run of a different user cannot be deleted.

name

Name of the run which needs to be deleted.

wait_for_completion

When set to True, this method waits for the training run to be deleted.

download_dataset(path, user, name, version=None)[source]

This method downloads a version of a dataset. The downloaded content is copied to the specified path.

Inputs

path

Target path where the dataset must be downloaded.

user

name of user who owns the dataset.

name

name of dataset.

version

version of the dataset.

download_model(path, user, name, version=None)[source]

This method downloads a version of a model. The downloaded content is copied to the specified path.

Inputs

path

Target path where the model must be downloaded.

user

name of user who owns the model.

name

name of model.

version

version of the model.

get_code(user, name)[source]

Method to fetch the code repo with the given name for the given user. Raises an exception if the code repo is not found or on any other connection errors.

Inputs

user

User whose code repo has to be fetched. If the token belongs to a different user, the token should have permission to fetch the code of the user given in the input; both users should be in the same DKube group.

name

Name of the code repo to be fetched

get_datascience_capabilities()[source]

Method to get the datascience capabilities of the platform. Returns the supported frameworks, versions and the corresponding container image details.

get_dataset(user, name)[source]

Method to fetch the dataset with the given name for the given user. Raises an exception if the dataset is not found or on any other connection errors.

Inputs

user

User whose dataset has to be fetched. If the token belongs to a different user, the token should have permission to fetch the dataset of the user given in the input; both users should be in the same DKube group.

name

Name of the dataset to be fetched

get_dataset_latest_version(user, name)[source]

Method to get the latest version of the given dataset.

Inputs

name

Name of the dataset

user

owner of the dataset

get_dataset_lineage(user, name, version)[source]

Method to get lineage of a dataset version.

Inputs

name

Name of the dataset

version

Version of the dataset

user

Owner of the dataset.

get_dataset_version(user, name, version)[source]

Method to get details of a version of the given dataset. Raises NotFoundException if the version is not found

Inputs

name

Name of the dataset

version

Version of the dataset

user

owner of the dataset

get_dataset_versions(user, name)[source]

Method to get the versions of dataset. Versions are returned in ascending order.

Inputs

name

Name of the dataset

user

owner of the dataset

get_featureset(featureset=None)[source]

Method to retrieve details of a featureset

Available in DKube Release: 2.2

Inputs

featureset

The name of featureset

Outputs

A dictionary object with response status, featureset metadata and feature versions

get_featurespec(featureset=None)[source]

Method to retrieve the feature specification of a featureset.

Available in DKube Release: 2.2

Inputs

featureset

The name of featureset

Outputs

A dictionary object with response status and feature specification metadata

get_leaderboard(project_id)[source]

Get project’s leaderboard details.

Available in DKube Release: 2.2

Inputs

project_id

id of the project

get_model(user, name, publish_details=False)[source]

Method to fetch the model with the given name for the given user. Raises an exception if the model is not found or on any other connection errors.

Inputs

user

User whose model has to be fetched. If the token belongs to a different user, the token should have permission to fetch the model of the user given in the input; both users should be in the same DKube group.

name

Name of the model to be fetched

get_model_latest_version(user, name)[source]

Method to get the latest version of the given model.

Inputs

name

Name of the model

user

owner of the model

get_model_lineage(user, name, version)[source]

Method to get lineage of a model version.

Inputs

name

Name of the model

version

Version of the model

user

Owner of the model.

get_model_version(user, name, version)[source]

Method to get details of a version of the given model. Raises NotFoundException if the version is not found

Inputs

name

Name of the model

version

Version of the model

user

owner of the model

get_model_versions(user, name)[source]

Method to get the versions of model. Versions are returned in ascending order.

Inputs

name

Name of the model

user

owner of the model

get_modelcatalog_item(user, modelcatalog=None, model=None, version=None)[source]

Method to get an item from the model catalog. Raises an exception on any connection errors.

Available in DKube Release: 2.2

Inputs

user

Name of the user.

modelcatalog

Model catalog name

model

Name of the model

version

Version of the model

get_notebook_capabilities()[source]

Method to get the notebook capabilities of the platform. Returns the supported frameworks, versions and the image details.

get_preprocessing_run(user, name)[source]

Method to fetch the preprocessing run with the given name for the given user. Raises an exception if the run is not found or on any other connection errors.

Inputs

user

User whose preprocessing run has to be fetched. If the token belongs to a different user, the token should have permission to fetch the preprocessing run of the user given in the input; both users should be in the same DKube group.

name

Name of the preprocessing run to be fetched

get_preprocessing_run_lineage(user, name)[source]

Method to get lineage of a preprocessing run.

Inputs

name

Name of the run

user

owner of the run

get_project(project_id)[source]

Get project details.

Available in DKube Release: 2.2

Inputs

project_id

id of the project

get_project_id(name)[source]

Get project id from project name.

Available in DKube Release: 2.2

Inputs

name

name of the project

get_r_capabilities()[source]

Method to get the R language capabilities of the platform. Returns the supported frameworks, versions and the image details.

get_serving_capabilities()[source]

Method to get the serving capabilities of the platform. Returns the supported frameworks, versions and the image details.

get_test_inference(user, name)[source]

Method to fetch the test inference with the given name for the given user. Raises an exception if the run is not found or on any other connection errors.

Inputs

user

User whose test inference has to be fetched. If the token belongs to a different user, the token should have permission to fetch the serving run of the user given in the input; both users should be in the same DKube group.

name

Name of the serving run to be fetched

get_training_capabilities()[source]

Method to get the training capabilities of the platform. Returns the supported frameworks, versions and the image details.

get_training_run(user, name)[source]

Method to fetch the training run with the given name for the given user. Raises an exception if the run is not found or on any other connection errors.

Inputs

user

User whose training run has to be fetched. If the token belongs to a different user, the token should have permission to fetch the training run of the user given in the input; both users should be in the same DKube group.

name

Name of the training run to be fetched

get_training_run_lineage(user, name)[source]

Method to get lineage of a training run.

Inputs

name

Name of the run

user

owner of the run

launch_jupyter_ide(ide: dkube.sdk.rsrcs.ide.DkubeIDE, wait_for_completion=True)[source]

Method to launch a Jupyter IDE on the DKube platform. Two kinds of IDEs are supported: Jupyter Notebook and RStudio. Raises an Exception in case of errors.

Inputs

ide

Instance of dkube.sdk.rsrcs.DkubeIDE class. Please see the Resources section for details on this class.

wait_for_completion

When set to True, this method waits for the job to complete after submission. The IDE is declared complete when it is in one of the running/failed/error states
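
A minimal sketch using the DkubeIDE resource; the image URL is the datascience image used elsewhere in this guide:

from dkube.sdk import *
ide = DkubeIDE("oneconv", name="ide")
ide.update_container(framework="custom",
                     image_url="docker.io/ocdr/dkube-datascience-tf-cpu:v2.0.0")
dapi.launch_jupyter_ide(ide)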

launch_rstudio_ide(ide: dkube.sdk.rsrcs.ide.DkubeIDE, wait_for_completion=True)[source]

Method to launch an RStudio IDE on the DKube platform. Two kinds of IDEs are supported: Jupyter Notebook and RStudio. Raises an Exception in case of errors.

Inputs

ide

Instance of dkube.sdk.rsrcs.DkubeIDE class. Please see the Resources section for details on this class.

wait_for_completion

When set to True, this method waits for the job to complete after submission. The IDE is declared complete when it is in one of the running/failed/error states

list_cicd_images(repo=None)[source]

Method to list all the CI/CD images, plus any images manually added in DKube.

Inputs

repo

Git repo URL. If provided, only returns images generated for that repo

list_code(user, shared=False, filters='*')[source]

Method to list all the code repos of a user. Raises exception on any connection errors.

Inputs

user

User whose code repos must be fetched. If the token belongs to a different user, the token should have permission to fetch the code repos of the user given in the input; both users should be in the same DKube group.

filters

Only * is supported now.

In the future, the user will be able to filter code repos based on state or source

list_datasets(user, shared=False, filters='*')[source]

Method to list all the datasets of a user. Raises exception on any connection errors.

Inputs

user

User whose datasets must be fetched. If the token belongs to a different user, the token should have permission to fetch the datasets of the user given in the input; both users should be in the same DKube group.

filters

Only * is supported now.

In the future, the user will be able to filter datasets based on state or source

list_featuresets(query=None)[source]

Method to list featuresets based on query string. Raises Exception in case of errors.

Available in DKube Release: 2.2

Inputs

query

A query string that is compatible with Bleve search format

Outputs

A dictionary object with response status and the list of featuresets

list_ides(user, shared=False, filters='*')[source]

Method to list all the IDEs of a user. Raises exception on any connection errors.

Inputs

user

User whose IDE instances must be fetched. If the token belongs to a different user, the token should have permission to fetch the IDE instances of the user given in the input; both users should be in the same DKube group.

filters

Only * is supported now.

In the future, the user will be able to filter runs based on state or duration

list_inference_endpoints()[source]

Method to list all the inferences in the DKube cluster. Raises an exception on any connection errors.

list_model_deployments(user, shared=False, filters='*')[source]

Method to list all the model deployments. Raises exception on any connection errors.

Inputs

user

Name of the user.

filters

Only * is supported now.

In the future, the user will be able to filter runs based on state or duration

list_models(user, shared=False, published=False, filters='*')[source]

Method to list all the models of a user. Raises exception on any connection errors.

Inputs

user

User whose models must be fetched. If the token belongs to a different user, the token should have permission to fetch the models of the user given in the input; both users should be in the same DKube group.

filters

Only * is supported now.

In the future, the user will be able to filter models based on state or source

published

If published is True, all published models are returned

list_preprocessing_runs(user, shared=False, filters='*')[source]

Method to list all the preprocessing runs of a user. Raises exception on any connection errors.

Inputs

user

User whose preprocessing runs must be fetched. If the token belongs to a different user, the token should have permission to fetch the preprocessing runs of the user given in the input; both users should be in the same DKube group.

filters

Only * is supported now.

In the future, the user will be able to filter runs based on state or duration

list_projects()[source]

Return list of DKube projects.

Available in DKube Release: 2.2

list_test_inferences(user, shared=False, filters='*')[source]

Method to list all the test inferences of a user. Raises exception on any connection errors.

Inputs

user

User whose test inferences must be fetched. If the token belongs to a different user, the token should have permission to fetch the serving runs of the user given in the input; both users should be in the same DKube group.

filters

Only * is supported now.

In the future, the user will be able to filter runs based on state or duration

list_training_runs(user, shared=False, filters='*')[source]

Method to list all the training runs of a user. Raises exception on any connection errors.

Inputs

user

User whose training runs must be fetched. If the token belongs to a different user, the token should have permission to fetch the training runs of the user given in the input; both users should be in the same DKube group.

filters

Only * is supported now.

In the future, the user will be able to filter runs based on state or duration

modelcatalog(user)[source]

Method to fetch the model catalog from DKube. The model catalog is a list of models published by data scientists that are ready for staging or deployment on a production cluster. The user must have permission to fetch the model catalog.

Available in DKube Release: 2.2

Inputs

user

Name of the user.

publish_model(name, description, details: dkube.sdk.rsrcs.serving.DkubeServing, wait_for_completion=True)[source]

Method to publish a model to model catalog. Raises Exception in case of errors.

Available in DKube Release: 2.2

Inputs

name

Name with which the model must be published in the model catalog.

description

Human readable text for the model being published

details

Instance of dkube.sdk.rsrcs.serving class. Please see the Resources section for details on this class.

If the serving image is not set in the run:DkubeServing argument:

  - If training used a supported standard framework, DKube picks the appropriate serving image.

  - If training used a custom image, DKube tries to use the same image for serving.

If the transformer image is not set in run:DkubeServing, DKube uses the same image as the training image.

If the transformer code is not set in run:DkubeServing, DKube uses the code that was used for training.

wait_for_completion

When set to True, this method waits for the publish to finish. Publishing is complete when the stage of the model changes to published/failed/error
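
A minimal sketch; the catalog entry name, model repo and version are placeholders:

from dkube.sdk import *
details = DkubeServing("oneconv", name="mnist-serving")
details.update_serving_model("mnist", version="<version-id>")
dapi.publish_model("mnist-catalog-entry", "MNIST classifier", details)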

read_featureset(**kwargs)[source]

Method to read a featureset version. If name is specified, the path is derived from it. If the featureset is not mounted, a copy is made to the user's home directory. If path is specified, it should be a mounted path.

Available in DKube Release: 2.2

Inputs

name

Featureset to be read, or None. Example: name='fset'

version

Version to be read, or None. If no version is specified, the latest version is assumed. Example: version='v2'

path

Path where the featureset is mounted, or None. Example: path='/opt/dkube/fset'

Outputs

A pandas DataFrame object
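
For example, a sketch reading a specific version by name:

# Reads version 'v2' of featureset 'fset' into a pandas DataFrame.
df = dapi.read_featureset(name="fset", version="v2")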

release_model(user, model, version=None, wait_for_completion=True)[source]

Method to release a model to model catalog. Raises Exception in case of errors.

Available in DKube Release: 2.2

Inputs

model

Name of the model.

version

Version of the model to be released. If not passed then latest version is released automatically.

user

Owner of the model.

wait_for_completion

When set to True, this method waits for the release to finish. Releasing is complete when the stage of the model changes to published/failed/error

set_active_project(project_id)[source]

Set active project. Any resources created using this API instance will belong to the given project.

Available in DKube Release: 2.2

Inputs

project_id

ID of the project. Pass None to unset.

trigger_runs_bycode(code, user)[source]

Method to trigger all the runs in DKube which use the mentioned code.

Inputs

code

Name of the code.

user

Owner of the code. All runs of this user will be retriggered.

trigger_runs_bydataset(dataset, user)[source]

Method to trigger all the runs in DKube which use the mentioned dataset as input.

Inputs

dataset

Name of the dataset.

user

Owner of the dataset. All runs of this user will be retriggered.

trigger_runs_bymodel(model, user)[source]

Method to trigger all the runs in DKube which use the mentioned model as input.

Inputs

model

Name of the model.

user

Owner of the model. All runs of this user will be retriggered.

update_inference(run: dkube.sdk.rsrcs.serving.DkubeServing, wait_for_completion=True)[source]

Method to update a test inference/deployment in DKube. Raises Exception in case of errors.

Inputs

run

Instance of dkube.sdk.rsrcs.serving class. Please see the Resources section for details on this class.

Defaults for the predictor and transformer configs are picked from the existing inference deployment. If version is not specified, the deployment is updated to the latest version.

wait_for_completion

When set to True, this method waits for the job to complete after submission. The job is declared complete when it is in one of the complete/failed/error states

update_project(project_id, project: dkube.sdk.rsrcs.project.DkubeProject)[source]

Update project details.

Available in DKube Release: 2.2

Note: the details and eval_details fields are base64 encoded.

Inputs

project_id

id of the project

project

instance of dkube.sdk.rsrcs.DkubeProject class.

upload_featurespec(featureset=None, filepath=None, metadata=None)[source]

Method to upload feature specification file.

Available in DKube Release: 2.2

Inputs

featureset

The name of featureset

filepath

Filepath for the feature specification metadata yaml file

metadata

Feature specification as a YAML object.

One of filepath or metadata should be specified.

Outputs

A dictionary object with response status
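
A minimal sketch; the filepath is a placeholder for a YAML file carrying name, description and schema fields per feature:

resp = dapi.upload_featurespec(featureset="mnist-fs",
                               filepath="/tmp/featurespec.yaml")  # hypothetical path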

upload_model(user, name, filepath, extract=False, wait_for_completion=True)[source]

Upload a model. This creates a model and uploads the file residing on your local workstation. Supported formats are tar, gz, tar.gz, tgz, zip, csv and txt.

Available in DKube Release: 2.2

Inputs

user

name of user under which model is to be created in dkube.

name

name of model to be created in dkube.

filepath

path of the file to be uploaded

extract

if extract is set to True, the file will be extracted after upload.

wait_for_completion

When set to True, this method waits for the model resource to reach a completion state. The model is declared complete when it is in one of the complete/failed/error states
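
A minimal sketch; the local archive path is a placeholder:

dapi.upload_model("oneconv", "mnist",
                  "/tmp/model.tar.gz",  # hypothetical local archive
                  extract=True)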

validate_token()[source]

Method which can be used to validate the token. Returns the JWT claims, which contain the role assigned to the user.

DKube Resources

class dkube.sdk.rsrcs.project.DkubeProject(name, **kwargs)[source]

This class defines the properties which can be set on the instance of DkubeProject.

Available in DKube Release: 2.2

Properties

name

name of the project

description

description of the project (Optional)

image

URL for the image thumbnail for this project (Optional)

leaderboard

set True to enable the leaderboard (default False)

details

Project details. This should be base64 encoded (Optional)

eval_repo

Dkube code repo name of eval repository

eval_commit_id

commit id of eval repository (Optional)

eval_image

Docker image to be used for evaluation (Optional)

eval_script

command to run for evaluating the submission

eval_details

Evaluation details. This should be base64 encoded (Optional)

class dkube.sdk.rsrcs.dataset.DkubeDataset(user, name='dataset-0422', remote=False, tags=None)[source]

This class defines the DKube dataset with helper functions to set properties of the dataset:

from dkube.sdk import *
mnist = DkubeDataset("oneconv", name="mnist")

where the first argument is the user of this dataset. The user should be a valid onboarded user in DKube.
DATASET_SOURCES = ['dvs', 'git', 'aws_s3', 's3', 'gcs', 'nfs', 'redshift', 'k8svolume']

List of valid datasources in DKube. Some datasources are downloaded while some are remotely referenced.

dvs :- To create an empty repository which can be used in future runs.

git :- If data is in the git repo. All git compatible repos are supported - github, bitbucket, gitlab. Downloaded

aws_s3 :- If the data is in AWS s3 bucket. Downloaded | Remote

s3 :- Non aws s3 data source. Like MinIO deployed on internal cluster. Downloaded | Remote

gcs :- Google cloud storage as data source. Downloaded

nfs :- External NFS server as data source. Remote

redshift :- Redshift as data source. Remote

k8svolume :- Kubernetes volume as data source. Remote

hostpath :- If data is in a path in host machine. Remote

GIT_ACCESS_OPTS = ['apikey', 'sshkey', 'password']

List of authentication options supported for git data source.

apikey :- Github APIKey based authentication. This must have permission on the repo to clone and checkout.

sshkey :- Git SSH key based authentication.

password :- Standard username/password based.

update_awss3_details(bucket, prefix, key, secret)[source]

Method to update details of aws s3 data source.

Inputs

bucket

Valid bucket in aws s3

prefix

Path to an object in the bucket. Dkube will fetch recursively all objects under this prefix.

key

AWS s3 access key id

secret

AWS s3 access key secret

update_dataset_source(source='dvs')[source]

Method to update the source for this dataset. It should be one of the choices in DATASET_SOURCES. The default value is dvs.

update_gcs_details(bucket, prefix, key, secret)[source]

Method to update details of google cloud storage.

Inputs

bucket

Valid bucket in GCS

prefix

Path to an object in bucket. Dkube will fetch recursively all objects under this prefix.

key

Name of the GCS secret

secret

Content of the secret

update_git_details(url, branch=None, authopt='apikey', authval=None)[source]

Method to update the details of git datasource.

Inputs

url

A valid Git URL.

branch

Valid branch of git repo. If not provided then master branch is used by default.

authopt

One of the valid option from GIT_ACCESS_OPTS

authval

Value corresponding to the authopt
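
For example, a sketch using apikey authentication on the mnist dataset above; the URL and token are placeholders:

mnist.update_git_details("https://github.com/oneconvergence/dkube-examples.git",
                         branch="master",
                         authopt="apikey",
                         authval="<github-api-token>")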

update_hostpath_details(path)[source]

Method to update details of hostpath.

Inputs

path

Location in the host machine where the data is stored.

update_k8svolume_details(name)[source]

Method to update details of k8s volume data source.

Inputs

name

Name of the kubernetes volume. Volume should not be already Bound.

update_nfs_details(server, path='/')[source]

Method to update details of nfs data source.

Inputs

server

IP address of the nfs server.

path

Path in the nfs export. This path is directly mounted for the user program.

update_puburl_details(url, extract)[source]

Method to update details of pub_url data source.

Inputs

url

pub_url of the data

extract

if set to True, data will be extracted

update_redshift_details(endpoint, database, user=None, password=None)[source]

Method to update details of redshift data source.

Inputs

endpoint

Redshift endpoint

password

Login password. Username is picked up from the login name in DKube.

database

Database in redshift to connect to.

region

AWS region in which the redshift is setup.

update_s3_details(endpoint, bucket, prefix, key, secret)[source]

Method to update details of s3 data source like minio.

Inputs

bucket

Valid bucket name in s3 store

prefix

Path to an object in the bucket. Dkube will fetch recursively all objects under this prefix.

key

S3 access key id

secret

s3 access key secret

class dkube.sdk.rsrcs.code.DkubeCode(user, name='code-1119', tags=None)[source]

This class defines the DKube code repo with helper functions to set properties of the code repo:

from dkube.sdk import *
mnist = DkubeCode("oneconv", name="mnist")

where the first argument is the user of this code repo. The user should be a valid onboarded user in DKube.
GIT_ACCESS_OPTS = ['apikey', 'sshkey', 'password']

List of authentication options supported for git data source.

apikey :- Github APIKey based authentication. This must have permission on the repo to clone and checkout.

sshkey :- Git SSH key based authentication.

password :- Standard username/password based.

update_git_details(url, branch=None, authopt='apikey', authval=None)[source]

Method to update the details of git datasource.

Inputs

url

A valid Git URL.

branch

Valid branch of git repo. If not provided then master branch is used by default.

authopt

One of the valid option from GIT_ACCESS_OPTS

authval

Value corresponding to the authopt

class dkube.sdk.rsrcs.featureset.DkubeFeatureSet(name='featureset-1009', tags=None, description=None, path=None, config_file='/opt/dkube/conf/conf.json')[source]

This class defines the DKube featureset with helper functions to set properties of the featureset:

from dkube.sdk import *
mnist = DkubeFeatureSet(name="mnist-fs")

Available in DKube Release: 2.2

classmethod read_features(path)[source]

Method to read features from the specified path

Inputs

path

A valid filepath.

Outputs

df

features DataFrame object

update_featurespec_file(path=None)[source]

Method to update the filepath for feature specification metadata

Inputs

path

A valid filepath. The file should be a YAML file describing a 'Name', 'Description', and 'Schema' for each feature.

classmethod write_features(df, path)[source]

Method to write features at the specified path

Inputs

df

features DataFrame object

path

A valid filepath.
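
A sketch combining the two classmethods; the mount path is illustrative and would typically be the featureset mount path described under commit_featureset:

import pandas as pd
from dkube.sdk import *

df = pd.DataFrame({"age": [21, 34], "fare": [7.25, 71.28]})  # hypothetical features
DkubeFeatureSet.write_features(df, "/opt/dkube/fset")   # write to the mount path
df2 = DkubeFeatureSet.read_features("/opt/dkube/fset")  # read them back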

class dkube.sdk.rsrcs.model.DkubeModel(user, name='dataset-2943', tags=None)[source]

This class defines the DKube model with helper functions to set properties of the model:

from dkube.sdk import *
mnist = DkubeModel("oneconv", name="mnist")

where the first argument is the owner of this model. The user should be a valid onboarded user in DKube.
GIT_ACCESS_OPTS = ['apikey', 'sshkey', 'password']

List of authentication options supported for git data source.

apikey :- Github APIKey based authentication. This must have permission on the repo to clone and checkout.

sshkey :- Git SSH key based authentication.

password :- Standard username/password based.

MODEL_SOURCES = ['dvs', 'git', 'aws_s3', 's3', 'gcs', 'nfs', 'k8svolume', 'workstation']

List of valid model sources in DKube. Some sources are downloaded while some are remotely referenced.

dvs :- To create an empty repository which can be used in future runs.

git :- If data is in the git repo. All git compatible repos are supported - github, bitbucket, gitlab. Downloaded

aws_s3 :- If the data is in AWS s3 bucket. Downloaded | Remote

s3 :- Non aws s3 data source. Like MinIO deployed on internal cluster. Downloaded | Remote

gcs :- Google cloud storage as data source. Downloaded

nfs :- External NFS server as data source. Remote

k8svolume :- Kubernetes volume as data source. Remote

workstation :- To upload data that is present on the local workstation. Uploaded

update_awss3_details(bucket, prefix, key, secret)[source]

Method to update details of aws s3 data source.

Inputs

bucket

Valid bucket in aws s3

prefix

Path to an object in the bucket. Dkube will fetch recursively all objects under this prefix.

key

AWS s3 access key id

secret

AWS s3 access key secret

update_gcs_details(bucket, prefix, key, secret)[source]

Method to update details of google cloud storage.

Inputs

bucket

Valid bucket in GCS

prefix

Path to an object in bucket. Dkube will fetch recursively all objects under this prefix.

key

Name of the GCS secret

secret

Content of the secret

update_git_details(url, branch=None, authopt='apikey', authval=None)[source]

Method to update the details of the git source.

Inputs

url

A valid Git URL.

branch

Valid branch of git repo. If not provided then master branch is used by default.

authopt

One of the valid option from GIT_ACCESS_OPTS

authval

Value corresponding to the authopt

update_k8svolume_details(name)[source]

Method to update details of k8s volume data source.

Inputs

name

Name of the kubernetes volume. Volume should not be already Bound.

update_model_source(source='dvs')[source]

Method to update the source for this model. It should be one of the choices in MODEL_SOURCES. The default value is dvs.

update_nfs_details(server, path='/')[source]

Method to update details of nfs data source.

Inputs

server

IP address of the nfs server.

path

Path in the nfs export. This path is directly mounted for the user program.

update_puburl_details(url, extract)[source]

Method to update details of pub_url model source.

Inputs

url

pub_url of the model

extract

if set to True, model will be extracted

update_s3_details(endpoint, bucket, prefix, key, secret)[source]

Method to update details of s3 data source like minio.

Inputs

bucket

Valid bucket name in s3 store

prefix

Path to an object in the bucket. Dkube will fetch recursively all objects under this prefix.

key

S3 access key id

secret

s3 access key secret

class dkube.sdk.rsrcs.training.DkubeTraining(user, name='train-5284', description='', tags=[])[source]

This class defines a DKube Training Run with helper functions to set properties of the Training Run:

from dkube.sdk import *
training = DkubeTraining("oneconv", name="mnist-run")

where the first argument is the user of the Training Run. The user should be a valid onboarded user in DKube.
DISTRIBUTION_OPTS = ['manual', 'auto']

Options for GPU jobs configured to run on multiple nodes. The default option is 'auto', where distribution is configured by the framework.

auto :- Framework configures the distribution mechanism

manual :- User configures the distribution mechanism

add_code(name, commitid=None)[source]

Method to update Code Repo for training run

Inputs

name

Name of Code Repo

commitid

commit id to retrieve from the code repository

add_envvar(key, value)[source]

Method to add env variable for the training run

Inputs

key

Name of env variable

value

Value of env variable

add_envvars(vars={})[source]

Method to add env variables for the training run

Inputs

vars

Dictionary of env variable name and value

add_input_dataset(name, version=None, mountpath=None)[source]

Method to update Dataset Repo input for training run

Inputs

name

Name of Dataset Repo

version

Version (unique id) to use from Dataset

mountpath

Path at which the Dataset contents are made available in the training run pod. For a local Dataset, mountpath points to the contents of the Dataset. For a remote Dataset, mountpath contains the metadata for the Dataset.

add_input_featureset(name, version=None, mountpath=None)[source]

Method to update Featureset input for training run

Inputs

name

Name of Featureset

version

Version (unique id) to use from Featureset

mountpath

Path at which the Featureset contents are made available in the training run pod

add_input_model(name, version=None, mountpath=None)[source]

Method to update Model Repo input for training run

Inputs

name

Name of Model Repo

version

Version (unique id) to use from Model

mountpath

Path at which the Model contents are made available in the training run pod

add_output_model(name, version=None, mountpath=None)[source]

Method to update Model Repo output for training run

Inputs

name

Name of Model Repo

version

Version (unique id) to use from Model (TODO)

mountpath

Path to write model files in the training run. A new version is created in the Model Repo with files written to this path.

disable_execution()[source]

Method to create a Run with no execution, to track an externally executed run

update_basic(user, name, description, tags)[source]

Method to update the attributes specified at creation. Description and tags can be updated. tags is a list of string values.

update_config_file(name, body=None)[source]

Method to update config file for training run

Inputs

name

Name of config file

body

Config data made available as a file with the specified name to the training pod under /mnt/dkube/config

update_container(framework='custom', image_url='', login_uname='', login_pswd='')[source]

Method to update the framework and image to use for the training run.

Inputs

framework

One of the frameworks from FRAMEWORK_OPTS

image_url

url for the image repository
e.g., docker.io/ocdr/dkube-datascience-tf-cpu:v2.0.0

login_uname

username to access the image repository

login_pswd

password to access the image repository

update_distribution(opt='manual', nworkers=0)[source]

Method to update gpu distribution method for training run

Inputs

opt

GPU distribution method specified as one of DISTRIBUTION_OPTS

nworkers

Number of required workers

update_group(group='default')[source]

Method to update the group to place the Training Run.

update_hptuning(name, body=None)[source]

Method to update hyperparameter tuning file for training run

Inputs

name

Name of hyperparameter tuning file

body

Hyperparameter tuning data in YAML format, made available as a file with the specified name to the training pod under /mnt/dkube/config

update_resources(cpus=None, mem=None, ngpus=0)[source]

Method to update resource requirements for training run

Inputs

cpus

Number of required cpus

mem

Memory required in MB (TODO)

ngpus

Number of required gpus
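
For example, a sketch requesting a multi-GPU configuration with manual distribution on the training object above; the values are illustrative:

training.update_resources(cpus=4, mem=8192, ngpus=2)  # mem in MB
training.update_distribution(opt="manual", nworkers=2)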

update_startupscript(startup_script=None)[source]

Method to update startup command for the training run

Inputs

startup_script

Startup command for the training run pod. Relative path from the root of the code repository should be specified.

class dkube.sdk.rsrcs.preprocessing.DkubePreprocessing(user, name='data-4904', description='', tags=[])[source]

This class defines a DKube Preprocessing Run with helper functions to set properties of the Preprocessing Run:

from dkube.sdk import *
preprocessing = DkubePreprocessing("oneconv", name="mnist-run")

where the first argument is the user of the Preprocessing Run. The user should be a valid onboarded user in DKube.
add_code(name, commitid='')[source]

Method to update Code Repo for Preprocessing run

Inputs

name

Name of Code Repo

commitid

commit id to retrieve from the code repository

add_envvar(key, value)[source]

Method to add an env variable for the preprocessing run

Inputs

key

Name of env variable

value

Value of env variable

add_envvars(vars={})[source]

Method to add env variables for the preprocessing run

Inputs

vars

Dictionary of env variable name and value

add_input_dataset(name, version=None, mountpath=None)[source]

Method to update Dataset Repo input for Preprocessing run

Inputs

name

Name of Dataset Repo

version

Version (unique id) to use from Dataset

mountpath

Path at which the Dataset contents are made available in the Preprocessing run pod. For a local Dataset, mountpath points to the contents of the Dataset. For a remote Dataset, mountpath contains the metadata for the Dataset.

add_input_featureset(name, version=None, mountpath=None)[source]

Method to update Featureset input for Preprocessing run

Inputs

name

Name of Featureset

version

Version (unique id) to use from Featureset

mountpath

Path at which the Featureset contents are made available in the Preprocessing run pod

add_input_model(name, version=None, mountpath=None)[source]

Method to update Model Repo input for Preprocessing run

Inputs

name

Name of Model Repo

version

Version (unique id) to use from Model

mountpath

Path at which the Model contents are made available in the Preprocessing run pod

add_output_dataset(name, version=None, mountpath=None)[source]

Method to update Dataset Repo output for Preprocessing run

Inputs

name

Name of Dataset Repo

version

Version (unique id) to use from Dataset (TODO)

mountpath

Path to write dataset files in the Preprocessing run. A new version is created in the Dataset Repo with files written to this path.

add_output_featureset(name, version=None, mountpath=None)[source]

Method to update Featureset output for Preprocessing run

Inputs

name

Name of Featureset

version

Version (unique id) to use from Featureset (TODO)

mountpath

Path to write Featureset files in the Preprocessing run. A new version is created in the Featureset with files written to this path.
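
A sketch wiring a dataset input to a featureset output on the preprocessing object above; the names and mount paths are illustrative:

preprocessing.add_input_dataset("mnist", mountpath="/opt/dkube/input")
preprocessing.add_output_featureset("mnist-fs", mountpath="/opt/dkube/output")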

update_basic(user, name, description, tags)[source]

Method to update the attributes specified at creation. Description and tags can be updated. tags is a list of string values.

update_config_file(name, body=None)[source]

Method to update the config file for the preprocessing run

Inputs

name

Name of config file

body

Config data made available as a file with the specified name to the preprocessing pod under /mnt/dkube/config

update_container(image_url=None, login_uname=None, login_pswd=None)[source]

Method to update the image to use for the Preprocessing run.

Inputs

image_url

url for the image repository
e.g., docker.io/ocdr/dkube-datascience-tf-cpu:v2.0.0

login_uname

username to access the image repository

login_pswd

password to access the image repository

update_envvars(envs={})[source]

Method to update env variables for the Preprocessing run

Inputs

envs

Dictionary of env variable name and value

update_group(group='default')[source]

Method to update the group to place the Preprocessing Run.

update_startupscript(startup_script=None)[source]

Method to update startup command for the Preprocessing run

Inputs

startup_script

Startup command for the Preprocessing run pod. Relative path from the root of the code repository should be specified.

class dkube.sdk.rsrcs.ide.DkubeIDE(user, name='notebook-2188', description='', tags=[])[source]

This class defines a DKube IDE with helper functions to set properties of the IDE:

from dkube.sdk import *
ide = DkubeIDE("oneconv", name="ide")

where the first argument is the user of the IDE. The user should be a valid onboarded user in DKube.
add_code(name, commitid=None)[source]

Method to update Code Repo for IDE

Inputs

name

Name of Code Repo

commitid

commit id to retrieve from the code repository

add_envvar(key, value)[source]

Method to add env variable for the IDE

Inputs

key

Name of env variable

value

Value of env variable

add_input_dataset(name, version=None, mountpath=None)[source]

Method to update Dataset Repo input for IDE

Inputs

name

Name of Dataset Repo

version

Version (unique id) to use from Dataset

mountpath

Path at which the Dataset contents are made available in the IDE pod. For a local Dataset, mountpath points to the contents of the Dataset. For a remote Dataset, mountpath contains the metadata for the Dataset.

add_input_model(name, version=None, mountpath=None)[source]

Method to update Model Repo input for IDE

Inputs

name

Name of Model Repo

version

Version (unique id) to use from Model

mountpath

Path at which the Model contents are made available in the IDE pod

update_basic(user, name, description, tags)[source]

Method to update the attributes specified at creation. Description and tags can be updated. tags is a list of string values.

update_config_file(name, body=None)[source]

Method to update config file for IDE

Inputs

name

Name of config file

body

Config data made available as a file with the specified name to the IDE under /mnt/dkube/config

update_container(framework='custom', image_url='', login_uname='', login_pswd='')[source]

Method to update the framework and image to use for the IDE.

Inputs

framework

One of the frameworks from FRAMEWORK_OPTS

image_url

url for the image repository
e.g., docker.io/ocdr/dkube-datascience-tf-cpu:v2.0.0

login_uname

username to access the image repository

login_pswd

password to access the image repository

update_group(group='default')[source]

Method to update the group to place the IDE.

update_hptuning(name, body=None)[source]

Method to update hyperparameter tuning file for IDE

Inputs

name

Name of hyperparameter tuning file

body

Hyperparameter tuning data in YAML format, made available as a file with the specified name to the IDE pod under /mnt/dkube/config

update_resources(cpus=None, mem=None, ngpus=0)[source]

Method to update resource requirements for IDE

Inputs

cpus

Number of required cpus

mem

Memory required in MB (TODO)

ngpus

Number of required gpus

class dkube.sdk.rsrcs.serving.DkubeServing(user, name='serving-4501', description='', tags=[])[source]

This class defines a Model Deployment with helper functions to set properties of the Model Deployment:

from dkube.sdk import *
serving = DkubeServing("oneconv", name="mnist-serving")

where the first argument is the user of the Model Deployment. The user should be a valid onboarded user in DKube.
set_production_deploy()[source]

Method to update the mode to use for Model Serving

Inputs

deploy

Flag to specify Serving for Test or Production (TODO)

set_transformer(transformer: bool = False, script=None)[source]

Method to specify if a transformer is required for pre/post processing of Inference requests and the script to run from the Transformer Code Repo.

Inputs

transformer

True or False

script

Script command to run in the transformer pod from Transformer Code Repo

update_autoscaling_config(min_replicas, max_concurrent_requests)[source]

Method to update the autoscale config to use for Model Serving

Inputs

min_replicas

Min number of pods to be running for Serving

max_concurrent_requests

Soft target threshold for the number of concurrent requests that triggers scale-up of Serving pods
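
For example, a sketch on the serving object above; the values are illustrative:

serving.update_autoscaling_config(min_replicas=1, max_concurrent_requests=10)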

update_basic(user, name, description, tags)[source]

Method to update the attributes specified at creation. Description and tags can be updated. tags is a list of string values.

update_serving_image(deploy=None, image_url='', login_uname=None, login_pswd=None)[source]

Method to update the image to use for Model Serving

Inputs

deploy

Flag to specify Serving for Test or Production (TODO)

image_url

url for the image repository
e.g., docker.io/ocdr/tensorflowserver:2.0.0

login_uname

username to access the image repository

login_pswd

password to access the image repository

update_serving_model(model, owner=None, version=None)[source]

Method to update Model Repo input for Model Serving

Inputs

model

Name of Model Repo containing the model files

owner

Owner of Model Repo containing the model files

version

Version (unique id) to use from Model Repo

update_transformer_code(code=None, commitid=None)[source]

Method to update Code Repo to use for the Transformer.

Inputs

code

Code Repo containing the script for Transformer

commitid

commit id used to retrieve the transformer Code Repo

update_transformer_image(image_url='', login_uname=None, login_pswd=None)[source]

Method to update the image to use for the transformer

Inputs

image_url

url for the image repository
e.g., docker.io/ocdr/dkube-datascience-tf-cpu:v2.0.0

login_uname

username to access the image repository

login_pswd

password to access the image repository

DKube API Swagger Spec

  • Full spec of DKube APIs

  • All the code is under package dkube.sdk.internal.dkube_api

Click the link to view the spec: DKUBEAPI.
