DKube SDK Developer Guide

This document is a guide for developers building ML applications on the DKube platform. The DKube SDK is a repository of abstract Python classes and libraries that client-side applications can use to interface with the DKube platform.

How to install

Python 3.5 or later is required.

Install from git using pip with one of the commands below:

sudo pip install git+https://github.com/oneconvergence/dkube.git@2.2
or
sudo pip3 install git+https://github.com/oneconvergence/dkube.git@2.2

This installs all the prerequisites listed in requirements.txt.

SDK API

class dkube.sdk.api.DkubeApi(URL=None, token=None, common_tags=[], req_timeout=None, req_retries=None)[source]

This class encapsulates all the high-level DKube workflow functions:

from dkube.sdk import *
dapi = DkubeApi()

Inputs

URL

FQDN endpoint at which the DKube platform is deployed:

http://dkube-controller-master.dkube.cluster.local:5000

https://dkube.ai:32222

Note

If not provided, the value is picked from the DKUBE_ACCESS_URL environment variable. If that is also not found, http://dkube-controller-master.dkube.cluster.local:5000 is used, assuming the access is internal to the DKube cluster

token

Access token for the APIs, without which DKube will return 40x codes

Note

If not provided, the value is picked from the DKUBE_ACCESS_TOKEN environment variable. Asserts if the variable is not defined.

common_tags
Tags to be applied to all the resources created using this API object
req_timeout
Timeout for all the requests which are issued using this API object
req_retries
Number of retries per request
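For example, constructing an API object against an external endpoint (a minimal sketch; the URL and token below are placeholders for your deployment's values):

from dkube.sdk import *

# Placeholder token: copy the real JWT from the DKube UI,
# or export DKUBE_ACCESS_TOKEN and omit the argument.
token = "eyJhbGciOi..."
dapi = DkubeApi(URL="https://dkube.ai:32222", token=token)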
commit_featureset(**kwargs)[source]

Method to commit sticky featuresets.

The featureset should be in the ready state. It will be in the created state if no featurespec has been uploaded. If the featureset is in the created state, the following will happen:

  1. If metadata is passed, it will be uploaded as featurespec
  2. If no metadata is passed, the featurespec is derived from df and uploaded.

If the featureset is in the ready state, the following will happen:

  1. metadata, if passed, will be ignored
  2. featurespec will be downloaded for the specified featureset and df will be validated for conformance.

If name is specified, it derives the path for committing the features.

If path is also specified, the path is not derived; the specified path is used instead. However, path should be a mount path into the DKube store.

If df is not specified, it is assumed that the df has already been written to the featureset path. Features can be written to the featureset mount path using DkubeFeatureSet.write_features.

Available in DKube Release: 2.2

Inputs

name
Featureset name, or None. Example: name='fset'
df
Dataframe with features to be written. None or an empty df is invalid. Type: pandas.DataFrame
metadata
Optional YAML object with name, description and schema fields, or None. Example: metadata=[{'name': 'age', 'description': '', 'schema': 'int64'}]
path
Mount path where the featureset is mounted, or None. Example: path='/opt/dkube/fset'

Outputs

Dictionary with response status
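For example, committing features by passing a dataframe directly, reusing the dapi object from above (a sketch; the featureset name and columns are illustrative, and the featureset is assumed to already exist):

import pandas as pd

df = pd.DataFrame({"age": [21, 34], "fare": [7.25, 71.28]})
resp = dapi.commit_featureset(name="titanic-fs", df=df)
print(resp)  # dictionary with response status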
create_code(code: dkube.sdk.rsrcs.code.DkubeCode, wait_for_completion=True)[source]

Method to create a code repo on DKube. Raises Exception in case of errors.

Inputs

code
Instance of dkube.sdk.rsrcs.code class. Please see the Resources section for details on this class.
wait_for_completion
When set to True this method will wait for the code resource to reach a terminal state. The resource is declared complete when it is in one of the complete/failed/error states
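A sketch of creating a code repo from a public git URL (the user name and repository URL are illustrative):

code = DkubeCode("oneconv", name="mnist")
code.update_git_details("https://github.com/oneconvergence/dkube-examples.git", branch="master")
dapi.create_code(code)  # blocks until the repo reaches a terminal state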
create_dataset(dataset: dkube.sdk.rsrcs.dataset.DkubeDataset, wait_for_completion=True)[source]

Method to create a dataset on DKube. Raises Exception in case of errors.

Inputs

dataset
Instance of dkube.sdk.rsrcs.dataset class. Please see the Resources section for details on this class.
wait_for_completion
When set to True this method will wait for the dataset resource to reach a terminal state. The resource is declared complete when it is in one of the complete/failed/error states
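A sketch of creating a git-sourced dataset (the user name and URL are illustrative):

dataset = DkubeDataset("oneconv", name="mnist")
dataset.update_dataset_source(source="git")
dataset.update_git_details("https://github.com/oneconvergence/dkube-examples.git")
dapi.create_dataset(dataset)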
create_featureset(featureset: dkube.sdk.rsrcs.featureset.DkubeFeatureSet, wait_for_completion=True)[source]

Method to create a featureset on DKube.

Available in DKube Release: 2.2

Inputs

featureset
Instance of the dkube.sdk.rsrcs.featureset.DkubeFeatureSet class. Please see the Resources section for details on this class.
wait_for_completion
When set to True this method will wait for the featureset resource to be ready, i.e., created with a v1 version in the sync state

Outputs

A dictionary object with response status
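A sketch of creating an empty featureset (the name is illustrative):

fset = DkubeFeatureSet(name="mnist-fs")
resp = dapi.create_featureset(fset)
print(resp)  # dictionary with response status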
create_model(model: dkube.sdk.rsrcs.model.DkubeModel, wait_for_completion=True)[source]

Method to create a model on DKube. Raises Exception in case of errors.

Inputs

model
Instance of dkube.sdk.rsrcs.model class. Please see the Resources section for details on this class.
wait_for_completion
When set to True this method will wait for the model resource to reach a terminal state. The resource is declared complete when it is in one of the complete/failed/error states
create_model_deployment(user, name, model, version, description=None, stage_or_deploy='stage', min_replicas=0, max_concurrent_requests=0, wait_for_completion=True)[source]

Method to create a serving deployment for a model in the model catalog. Raises Exception in case of errors.

Inputs

user
Name of the user creating the deployment
name
Name of the deployment. Must be unique
description
User readable description of the deployment
model
Name of the model to be deployed
version

Version of the model to be deployed

stage_or_deploy
Defaults to stage, which stages the model deployment for testing before deploying it to production. Change to deploy to deploy the model in production
min_replicas
Minimum number of replicas that each Revision should have. If not provided, the value set in the platform config map is used.
max_concurrent_requests
Soft limit on the maximum number of requests an inference pod can process at a time. If not provided, the value set in the platform config map is used.
wait_for_completion
When set to True this method will wait for the job to complete after submission. The job is declared complete when it is in one of the complete/failed/error states
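A sketch of deploying a model from the model catalog to production (the user, names and version id are placeholders):

dapi.create_model_deployment(
    user="oneconv", name="mnist-deploy",
    model="mnist", version="<model-version-id>",
    stage_or_deploy="deploy",
    min_replicas=1, max_concurrent_requests=10)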
create_preprocessing_run(run: dkube.sdk.rsrcs.preprocessing.DkubePreprocessing, wait_for_completion=True)[source]

Method to create a preprocessing run on DKube. Raises Exception in case of errors.

Inputs

run
Instance of dkube.sdk.rsrcs.Preprocessing class. Please see the Resources section for details on this class.
wait_for_completion
When set to True this method will wait for the job to complete after submission. The job is declared complete when it is in one of the complete/failed/error states
create_project(project: dkube.sdk.rsrcs.project.DkubeProject)[source]

Creates DKube Project.

Available in DKube Release: 2.2

Inputs

project
instance of dkube.sdk.rsrcs.DkubeProject class.
create_test_inference(run: dkube.sdk.rsrcs.serving.DkubeServing, wait_for_completion=True)[source]

Method to create a test inference on DKube. Raises Exception in case of errors.

Inputs

run

Instance of dkube.sdk.rsrcs.serving class. Please see the Resources section for details on this class.

If the serving image is not set in the run:DkubeServing argument:

  1. If training used a supported standard framework, DKube will pick the appropriate serving image
  2. If training used a custom image, DKube will try to use the same image for serving

If the transformer image is not set in run:DkubeServing, DKube will use the same image as the training image.

If the transformer code is not set in run:DkubeServing, DKube will use the code used for training.

wait_for_completion
When set to True this method will wait for the job to complete after submission. The job is declared complete when it is in one of the complete/failed/error states
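A sketch of a test inference for a trained model, letting DKube pick the serving image (the names, version id and script path are placeholders):

serving = DkubeServing("oneconv", name="mnist-serving")
serving.update_serving_model("mnist", version="<model-version-id>")
serving.set_transformer(True, script="transformer.py")  # optional pre/post processing
serving.update_transformer_code(code="mnist")
dapi.create_test_inference(serving)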
create_training_run(run: dkube.sdk.rsrcs.training.DkubeTraining, wait_for_completion=True)[source]

Method to create a training run on DKube. Raises Exception in case of errors.

Inputs

run
Instance of dkube.sdk.rsrcs.Training class. Please see the Resources section for details on this class.
wait_for_completion
When set to True this method will wait for the job to complete after submission. The job is declared complete when it is in one of the complete/failed/error states
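A sketch of a complete training run submission (the repo, dataset and model names, the image and the startup command are illustrative):

training = DkubeTraining("oneconv", name="mnist-run")
training.update_container(framework="tensorflow_2.0.0",
                          image_url="docker.io/ocdr/dkube-datascience-tf-cpu:v2.0.0")
training.update_startupscript("python model.py")
training.add_code("mnist")
training.add_input_dataset("mnist", mountpath="/opt/dkube/input")
training.add_output_model("mnist", mountpath="/opt/dkube/output")
dapi.create_training_run(training)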
delete_code(user, name)[source]

Method to delete a code repo. Raises exception if the token belongs to a different user, if no code repo with the given name exists, or on any connection errors.

Inputs

user
The token must belong to this user, as the code repo of a different user cannot be deleted.
name
Name of the code which needs to be deleted.
delete_dataset(user, name)[source]

Method to delete a dataset. Raises exception if the token belongs to a different user, if no dataset with the given name exists, or on any connection errors.

Inputs

user
The token must belong to this user, as the dataset of a different user cannot be deleted.
name
Name of the dataset which needs to be deleted.
delete_featureset(name)[source]

Method to delete a featureset.

Available in DKube Release: 2.2

Inputs

name
featureset name to be deleted. example: “mnist-fs”

Outputs

A dictionary object with response status and the deleted featureset name
delete_featuresets(featureset_list)[source]

Method to delete a list of featuresets on DKube. Raises Exception in case of errors.

Available in DKube Release: 2.2

Inputs

featureset_list
list of featureset names example: [“mnist-fs”, “titanic-fs”]

Outputs

A dictionary object with response status with the list of deleted featureset names
delete_ide(user, name)[source]

Method to delete an IDE. Raises exception if the token belongs to a different user, if no IDE with the given name exists, or on any connection errors.

Inputs

user
The token must belong to this user, as the IDE instance of a different user cannot be deleted.
name
Name of the IDE which needs to be deleted.
delete_model(user, name)[source]

Method to delete a model. Raises exception if the token belongs to a different user, if no model with the given name exists, or on any connection errors.

Inputs

user
The token must belong to this user, as the model of a different user cannot be deleted.
name
Name of the model which needs to be deleted.
delete_model_deployment(user, name)[source]

Method to delete a model deployment. Raises exception if the token belongs to a different user, if no serving run with the given name exists, or on any connection errors.

Inputs

user
The token must belong to this user, as the run of a different user cannot be deleted.
name
Name of the run which needs to be deleted.
delete_preprocessing_run(user, name)[source]

Method to delete a preprocessing run. Raises exception if the token belongs to a different user, if no preprocessing run with the given name exists, or on any connection errors.

Inputs

user
The token must belong to this user, as the run of a different user cannot be deleted.
name
Name of the run which needs to be deleted.
delete_project(project_id)[source]

Delete project. This only deletes the project and not the associated resources.

Available in DKube Release: 2.2

Inputs

project_id
id of the project
delete_test_inference(user, name)[source]

Method to delete a test inference. Raises exception if the token belongs to a different user, if no serving run with the given name exists, or on any connection errors.

Inputs

user
The token must belong to this user, as the run of a different user cannot be deleted.
name
Name of the run which needs to be deleted.
delete_training_run(user, name)[source]

Method to delete a training run. Raises exception if the token belongs to a different user, if no training run with the given name exists, or on any connection errors.

Inputs

user
The token must belong to this user, as the run of a different user cannot be deleted.
name
Name of the run which needs to be deleted.
get_code(user, name)[source]

Method to fetch the code repo with the given name for the given user. Raises exception if the code repo is not found or on any other connection errors.

Inputs

user
User whose code repo has to be fetched. If the token belongs to a different user, that token should have permission to fetch the code repo of the user in the input; both users should be in the same DKube group.
name
Name of the code repo to be fetched
get_datascience_capabilities()[source]

Method to get the datascience capabilities of the platform. Returns the supported frameworks, versions and the corresponding container image details.

get_dataset(user, name)[source]

Method to fetch the dataset with the given name for the given user. Raises exception if the dataset is not found or on any other connection errors.

Inputs

user
User whose dataset has to be fetched. If the token belongs to a different user, that token should have permission to fetch the dataset of the user in the input; both users should be in the same DKube group.
name
Name of the dataset to be fetched
get_dataset_latest_version(user, name)[source]

Method to get the latest version of the given dataset.

Inputs

name
Name of the dataset
user
owner of the dataset
get_dataset_lineage(user, name, version)[source]

Method to get lineage of a dataset version.

Inputs

name
Name of the dataset
version
Version of the dataset
user
Owner of the dataset.
get_dataset_version(user, name, version)[source]

Method to get details of a version of the given dataset. Raises NotFoundException if the version is not found

Inputs

name
Name of the dataset
version
Version of the dataset
user
owner of the dataset
get_dataset_versions(user, name)[source]

Method to get the versions of the dataset. Versions are returned in ascending order.

Inputs

name
Name of the dataset
user
owner of the dataset
get_featureset(featureset=None)[source]

Method to retrieve details of a featureset

Available in DKube Release: 2.2

Inputs

featureset

The name of the featureset

Outputs

A dictionary object with response status, featureset metadata and feature versions
get_featurespec(featureset=None)[source]

Method to retrieve the feature specification of a featureset.

Available in DKube Release: 2.2

Inputs

featureset

The name of the featureset

Outputs

A dictionary object with response status and feature specification metadata
get_leaderboard(project_id)[source]

Get project’s leaderboard details.

Available in DKube Release: 2.2

Inputs

project_id
id of the project
get_model(user, name)[source]

Method to fetch the model with the given name for the given user. Raises exception if the model is not found or on any other connection errors.

Inputs

user
User whose model has to be fetched. If the token belongs to a different user, that token should have permission to fetch the model of the user in the input; both users should be in the same DKube group.
name
Name of the model to be fetched
get_model_latest_version(user, name)[source]

Method to get the latest version of the given model.

Inputs

name
Name of the model
user
owner of the model
get_model_lineage(user, name, version)[source]

Method to get lineage of a model version.

Inputs

name
Name of the model
version
Version of the model
user
Owner of the model.
get_model_version(user, name, version)[source]

Method to get details of a version of the given model. Raises NotFoundException if the version is not found

Inputs

name
Name of the model
version
Version of the model
user
owner of the model
get_model_versions(user, name)[source]

Method to get the versions of the model. Versions are returned in ascending order.

Inputs

name
Name of the model
user
owner of the model
get_modelcatalog_item(user, model, version)[source]

Method to get an item from the model catalog. Raises exception on any connection errors.

Available in DKube Release: 2.2

Inputs

user
Name of the user.
model
Name of the model in the model catalog
version
Version of the model
get_notebook_capabilities()[source]

Method to get the notebook capabilities of the platform. Returns the supported frameworks, versions and the image details.

get_preprocessing_run(user, name)[source]

Method to fetch the preprocessing run with the given name for the given user. Raises exception if the run is not found or on any other connection errors.

Inputs

user
User whose preprocessing run has to be fetched. If the token belongs to a different user, that token should have permission to fetch the preprocessing run of the user in the input; both users should be in the same DKube group.
name
Name of the preprocessing run to be fetched
get_preprocessing_run_lineage(user, name)[source]

Method to get lineage of a preprocessing run.

Inputs

name
Name of the run
user
owner of the run
get_project(project_id)[source]

Get project details.

Available in DKube Release: 2.2

Inputs

project_id
id of the project
get_project_id(name)[source]

Get project id from project name.

Available in DKube Release: 2.2

Inputs

name
name of the project
get_r_capabilities()[source]

Method to get the R language capabilities of the platform. Returns the supported frameworks, versions and the image details.

get_serving_capabilities()[source]

Method to get the serving capabilities of the platform. Returns the supported frameworks, versions and the image details.

get_test_inference(user, name)[source]

Method to fetch the test inference with the given name for the given user. Raises exception if the run is not found or on any other connection errors.

Inputs

user
User whose test inference has to be fetched. If the token belongs to a different user, that token should have permission to fetch the serving run of the user in the input; both users should be in the same DKube group.
name
Name of the serving run to be fetched
get_training_capabilities()[source]

Method to get the training capabilities of the platform. Returns the supported frameworks, versions and the image details.

get_training_run(user, name)[source]

Method to fetch the training run with the given name for the given user. Raises exception if the run is not found or on any other connection errors.

Inputs

user
User whose training run has to be fetched. If the token belongs to a different user, that token should have permission to fetch the training run of the user in the input; both users should be in the same DKube group.
name
Name of the training run to be fetched
get_training_run_lineage(user, name)[source]

Method to get lineage of a training run.

Inputs

name
Name of the run
user
owner of the run
launch_jupyter_ide(ide: dkube.sdk.rsrcs.ide.DkubeIDE, wait_for_completion=True)[source]

Method to launch a Jupyter IDE on the DKube platform. Two kinds of IDEs are supported: Jupyter Notebook and RStudio. Raises Exception in case of errors.

Inputs

ide
Instance of dkube.sdk.rsrcs.DkubeIDE class. Please see the Resources section for details on this class.
wait_for_completion
When set to True this method will wait after submission until the IDE reaches one of the running/failed/error states
launch_rstudio_ide(ide: dkube.sdk.rsrcs.ide.DkubeIDE, wait_for_completion=True)[source]

Method to launch an RStudio IDE on the DKube platform. Two kinds of IDEs are supported: Jupyter Notebook and RStudio. Raises Exception in case of errors.

Inputs

ide
Instance of dkube.sdk.rsrcs.DkubeIDE class. Please see the Resources section for details on this class.
wait_for_completion
When set to True this method will wait after submission until the IDE reaches one of the running/failed/error states
list_code(user, shared=False, filters='*')[source]

Method to list all the code repos of a user. Raises exception on any connection errors.

Inputs

user
User whose code repos must be fetched. If the token belongs to a different user, that token should have permission to fetch the code repos of the user in the input; both users should be in the same DKube group.
filters

Only * is supported now.

The user will be able to filter code repos based on state or source

list_datasets(user, shared=False, filters='*')[source]

Method to list all the datasets of a user. Raises exception on any connection errors.

Inputs

user
User whose datasets must be fetched. If the token belongs to a different user, that token should have permission to fetch the datasets of the user in the input; both users should be in the same DKube group.
filters

Only * is supported now.

The user will be able to filter datasets based on state or source

list_featuresets(query=None)[source]

Method to list featuresets based on query string. Raises Exception in case of errors.

Available in DKube Release: 2.2

Inputs

query
A query string that is compatible with the Bleve search format

Outputs

A dictionary object with response status and the list of featuresets
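For example, assuming a Bleve-style query on the name field (the field name and pattern are illustrative):

resp = dapi.list_featuresets(query="name:mnist*")
print(resp)  # response status and the matching featuresets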
list_ides(user, shared=False, filters='*')[source]

Method to list all the IDEs of a user. Raises exception on any connection errors.

Inputs

user
User whose IDE instances must be fetched. If the token belongs to a different user, that token should have permission to fetch the IDE instances of the user in the input; both users should be in the same DKube group.
filters

Only * is supported now.

The user will be able to filter IDE instances based on state or duration

list_model_deployments(user, shared=False, filters='*')[source]

Method to list all the model deployments. Raises exception on any connection errors.

Inputs

user
Name of the user.
filters

Only * is supported now.

The user will be able to filter runs based on state or duration

list_models(user, shared=False, filters='*')[source]

Method to list all the models of a user. Raises exception on any connection errors.

Inputs

user
User whose models must be fetched. If the token belongs to a different user, that token should have permission to fetch the models of the user in the input; both users should be in the same DKube group.
filters

Only * is supported now.

The user will be able to filter models based on state or source

list_preprocessing_runs(user, shared=False, filters='*')[source]

Method to list all the preprocessing runs of a user. Raises exception on any connection errors.

Inputs

user
User whose preprocessing runs must be fetched. If the token belongs to a different user, that token should have permission to fetch the preprocessing runs of the user in the input; both users should be in the same DKube group.
filters

Only * is supported now.

The user will be able to filter runs based on state or duration

list_projects()[source]

Return list of DKube projects.

Available in DKube Release: 2.2

list_test_inferences(user, shared=False, filters='*')[source]

Method to list all the test inferences of a user. Raises exception on any connection errors.

Inputs

user
User whose test inferences must be fetched. If the token belongs to a different user, that token should have permission to fetch the serving runs of the user in the input; both users should be in the same DKube group.
filters

Only * is supported now.

The user will be able to filter runs based on state or duration

list_training_runs(user, shared=False, filters='*')[source]

Method to list all the training runs of a user. Raises exception on any connection errors.

Inputs

user
User whose training runs must be fetched. If the token belongs to a different user, that token should have permission to fetch the training runs of the user in the input; both users should be in the same DKube group.
filters

Only * is supported now.

The user will be able to filter runs based on state or duration

modelcatalog(user)[source]

Method to fetch the model catalog from DKube. The model catalog is a list of models published by data scientists that are ready for staging or deployment on a production cluster. The user must have permission to fetch the model catalog.

Available in DKube Release: 2.2

Inputs

user
Name of the user.
publish_model(name, description, details: dkube.sdk.rsrcs.serving.DkubeServing, wait_for_completion=True)[source]

Method to publish a model to model catalog. Raises Exception in case of errors.

Available in DKube Release: 2.2

Inputs

name
Name with which the model must be published in the model catalog.
description
Human readable text for the model being published
details

Instance of dkube.sdk.rsrcs.serving class. Please see the Resources section for details on this class.

If the serving image is not set in the run:DkubeServing argument:

  1. If training used a supported standard framework, DKube will pick the appropriate serving image
  2. If training used a custom image, DKube will try to use the same image for serving

If the transformer image is not set in run:DkubeServing, DKube will use the same image as the training image.

If the transformer code is not set in run:DkubeServing, DKube will use the code used for training.

wait_for_completion
When set to True this method will wait for the publish to finish. Publishing is complete when the stage of the model changes to published/failed/error
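A sketch of publishing a trained model to the catalog (the names and version id are placeholders):

details = DkubeServing("oneconv", name="mnist-pub")
details.update_serving_model("mnist", version="<model-version-id>")
dapi.publish_model("mnist-catalog", "MNIST digit classifier", details)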
read_featureset(**kwargs)[source]

Method to read a featureset version. If name is specified, the path is derived. If the featureset is not mounted, a copy is made to the user's home directory. If path is specified, it should be a mounted path.

Available in DKube Release: 2.2

Inputs

name
Featureset to be read, or None. Example: name='fset'
version
Version to be read. If no version is specified, the latest version is assumed. Example: version='v2' or None
path
Path where the featureset is mounted, or None. Example: path='/opt/dkube/fset'

Outputs

Dataframe object
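For example (a sketch; the featureset name and version are illustrative):

df = dapi.read_featureset(name="mnist-fs")                   # latest version
df_v2 = dapi.read_featureset(name="mnist-fs", version="v2")  # specific version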
release_model(user, model, version=None, wait_for_completion=True)[source]

Method to release a model to the model catalog. Raises Exception in case of errors.

Available in DKube Release: 2.2

Inputs

model
Name of the model.
version
Version of the model to be released. If not passed then latest version is released automatically.
user
Owner of the model.
wait_for_completion
When set to True this method will wait for the release to finish. The release is complete when the stage of the model changes to published/failed/error
set_active_project(project_id)[source]

Set active project. Any resources created using this API instance will belong to the given project.

Available in DKube Release: 2.2

Inputs

project_id
ID of the project. Pass None to unset.
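A sketch of scoping subsequently created resources to a project (the project name is illustrative):

pid = dapi.get_project_id("cifar10")
dapi.set_active_project(pid)
# ... runs and repos created here belong to the project ...
dapi.set_active_project(None)  # unset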
trigger_runs_bycode(code, user)[source]

Method to trigger all the runs in DKube which use the mentioned code.

Inputs

code
Name of the code.
user
Owner of the code. All runs of this user will be retriggered.
trigger_runs_bydataset(dataset, user)[source]

Method to trigger all the runs in DKube which use the mentioned dataset as input.

Inputs

dataset
Name of the dataset.
user
Owner of the dataset. All runs of this user will be retriggered.
trigger_runs_bymodel(model, user)[source]

Method to trigger all the runs in DKube which use the mentioned model as input.

Inputs

model
Name of the model.
user
Owner of the model. All runs of this user will be retriggered.
update_project(project_id, project: dkube.sdk.rsrcs.project.DkubeProject)[source]

Update project details.

Available in DKube Release: 2.2. Note: the details and eval_details fields are base64 encoded.

Inputs

project_id
id of the project
project
instance of dkube.sdk.rsrcs.DkubeProject class.
upload_featurespec(featureset=None, filepath=None, metadata=None)[source]

Method to upload feature specification file.

Available in DKube Release: 2.2

Inputs

featureset
The name of the featureset
filepath
Filepath of the feature specification metadata YAML file
metadata
Feature specification as a YAML object.

One of filepath or metadata should be specified.

Outputs

A dictionary object with response status
upload_model(user, name, filepath, extract=False, wait_for_completion=True)[source]

Upload model. This creates a model and uploads the file residing on your local workstation. Supported formats are tar, gz, tar.gz, tgz, zip, csv and txt.

Available in DKube Release: 2.2

Inputs

user
Name of the user under which the model is to be created in DKube.
name
Name of the model to be created in DKube.
filepath
Path of the file to be uploaded
extract
If set to True, the file will be extracted after upload.
wait_for_completion
When set to True this method will wait for the model resource to reach a terminal state. The resource is declared complete when it is in one of the complete/failed/error states
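For example (a sketch; the user, model name and file path are placeholders):

dapi.upload_model("oneconv", "mnist",
                  filepath="/home/user/model.tar.gz", extract=True)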
validate_token()[source]

Method which can be used to validate the token. Returns the JWT claims, which contain the role assigned to the user.

DKube Resources

class dkube.sdk.rsrcs.project.DkubeProject(name, **kwargs)[source]

This class defines the properties which can be set on the instance of DkubeProject.

Available in DKube Release: 2.2

Properties

name
name of the project
description
description of the project (Optional)
image
URL for the image thumbnail for this project (Optional)
leaderboard
set True to enable the leaderboard (default False)
details
Project details. This should be base64 encoded (Optional)
eval_repo
Dkube code repo name of eval repository
eval_commit_id
commit id of eval repository (Optional)
eval_image
Docker image to be used for evaluation (Optional)
eval_script
command to run for evaluating the submission
eval_details
Evaluation details. This should be base64 encoded (Optional)
class dkube.sdk.rsrcs.dataset.DkubeDataset(user, name='dataset-5946', tags=None)[source]

This class defines the DKube dataset with helper functions to set properties of the dataset:

from dkube.sdk import *
mnist = DkubeDataset("oneconv", name="mnist")

Here the first argument is the user of this dataset. The user should be a valid onboarded user in DKube.
DATASET_SOURCES = ['dvs', 'git', 'aws_s3', 's3', 'gcs', 'nfs', 'redshift', 'k8svolume']

List of valid datasources in DKube. Some datasources are downloaded while some are remotely referenced.

dvs :- To create an empty repository which can be used in future runs.

git :- If data is in the git repo. All git compatible repos are supported - github, bitbucket, gitlab. Downloaded

aws_s3 :- If the data is in AWS s3 bucket. Downloaded | Remote

s3 :- Non aws s3 data source. Like MinIO deployed on internal cluster. Downloaded | Remote

gcs :- Google cloud storage as data source. Downloaded

nfs :- External NFS server as data source. Remote

redshift :- Redshift as data source. Remote

k8svolume :- Kubernetes volume as data source. Remote

hostpath :- If data is in a path in host machine. Remote

GIT_ACCESS_OPTS = ['apikey', 'sshkey', 'password']

List of authentication options supported for git data source.

apikey :- Github APIKey based authentication. This must have permission on the repo to clone and checkout.

sshkey :- Git SSH key based authentication.

password :- Standard username/password based.

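For instance, configuring a private repository accessed with an API key (a sketch; the URL and key are placeholders):

ds = DkubeDataset("oneconv", name="private-data")
ds.update_dataset_source(source="git")
ds.update_git_details("https://github.com/user/private-repo.git",
                      branch="main", authopt="apikey", authval="<api-key>")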
update_awss3_details(bucket, prefix, key, secret)[source]

Method to update details of aws s3 data source.

Inputs

bucket
Valid bucket in aws s3
prefix
Path to an object in the bucket. Dkube will fetch recursively all objects under this prefix.
key
AWS s3 access key id
secret
AWS s3 access key secret
update_dataset_source(source='dvs')[source]

Method to update the source for this dataset. It should be one of the choices mentioned in DATASET_SOURCES. The default value is dvs.

update_gcs_details(bucket, prefix, key, secret)[source]

Method to update details of the Google Cloud Storage data source.

Inputs

bucket
Valid bucket in GCS
prefix
Path to an object in bucket. Dkube will fetch recursively all objects under this prefix.
key
Name of the GCS secret
secret
Content of the secret
update_git_details(url, branch=None, authopt='apikey', authval=None)[source]

Method to update the details of git datasource.

Inputs

url

A valid Git URL.

branch
Valid branch of git repo. If not provided then master branch is used by default.
authopt
One of the valid options from GIT_ACCESS_OPTS
authval
Value corresponding to the authopt
update_hostpath_details(path)[source]

Method to update details of hostpath.

Inputs

path
Location in the host machine where the data is stored.
update_k8svolume_details(name)[source]

Method to update details of k8s volume data source.

Inputs

name
Name of the kubernetes volume. The volume should not already be Bound.
update_nfs_details(server, path='/')[source]

Method to update details of nfs data source.

Inputs

server
IP address of the nfs server.
path
Path in the nfs export. This path is directly mounted for the user program.
update_redshift_details(endpoint, password, database, region)[source]

Method to update details of redshift data source.

Inputs

endpoint
Redshift endpoint
password
Login password. Username is picked up from the login name in DKube.
database
Database in redshift to connect to.
region
AWS region in which the redshift is setup.
update_s3_details(endpoint, bucket, prefix, key, secret)[source]

Method to update details of an S3-compatible data source such as MinIO.

Inputs

bucket
Valid bucket name in s3 store
prefix
Path to an object in the bucket. Dkube will fetch recursively all objects under this prefix.
key
S3 access key id
secret
s3 access key secret
class dkube.sdk.rsrcs.code.DkubeCode(user, name='code-0634', tags=None)[source]

This class defines the DKube code repo with helper functions to set properties of the code repo:

from dkube.sdk import *
mnist = DkubeCode("oneconv", name="mnist")

Here the first argument is the user of this code repo. The user should be a valid onboarded user in DKube.
GIT_ACCESS_OPTS = ['apikey', 'sshkey', 'password']

List of authentication options supported for git data source.

apikey :- Github APIKey based authentication. This must have permission on the repo to clone and checkout.

sshkey :- Git SSH key based authentication.

password :- Standard username/password based.

update_git_details(url, branch=None, authopt='apikey', authval=None)[source]

Method to update the details of git datasource.

Inputs

url

A valid Git URL.

branch
Valid branch of git repo. If not provided then master branch is used by default.
authopt
One of the valid options from GIT_ACCESS_OPTS
authval
Value corresponding to the authopt
class dkube.sdk.rsrcs.featureset.DkubeFeatureSet(name='featureset-4939', tags=None, description=None, path=None, config_file='/opt/dkube/conf/conf.json')[source]

This class defines the DKube featureset with helper functions to set properties of the featureset:

from dkube.sdk import *
mnist = DkubeFeatureSet(name="mnist-fs")

Available in DKube Release: 2.2

classmethod read_features(path)[source]

Method to read features from the specified path

Inputs

path
A valid filepath.

Outputs

df
features DataFrame object
update_featurespec_file(path=None)[source]

Method to update the filepath for feature specification metadata

Inputs

path
A valid filepath. The file should be a YAML file describing 'Name', 'Description', and 'Schema' for each feature.
classmethod write_features(df, path)[source]

Method to write features at the specified path

Inputs

df
features DataFrame object
path
A valid filepath.
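A sketch of the round trip inside a run pod, assuming the featureset is mounted at /opt/dkube/fset:

import pandas as pd

df = pd.DataFrame({"age": [21, 34]})
DkubeFeatureSet.write_features(df, "/opt/dkube/fset")
df_back = DkubeFeatureSet.read_features("/opt/dkube/fset")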
class dkube.sdk.rsrcs.model.DkubeModel(user, name='dataset-6766', tags=None)[source]

This class defines the DKube model with helper functions to set properties of the model:

from dkube.sdk import *
mnist = DkubeModel("oneconv", name="mnist")

Here the first argument is the owner of this model. The user should be a valid onboarded user in DKube.
GIT_ACCESS_OPTS = ['apikey', 'sshkey', 'password']

List of authentication options supported for git data source.

apikey :- Github APIKey based authentication. This must have permission on the repo to clone and checkout.

sshkey :- Git SSH key based authentication.

password :- Standard username/password based.

MODEL_SOURCES = ['dvs', 'git', 'aws_s3', 's3', 'gcs', 'nfs', 'k8svolume', 'workstation']

List of valid model sources in DKube. Some sources are downloaded while some are remotely referenced.

dvs :- To create an empty repository which can be used in future runs.

git :- If data is in the git repo. All git compatible repos are supported - github, bitbucket, gitlab. Downloaded

aws_s3 :- If the data is in AWS s3 bucket. Downloaded | Remote

s3 :- Non aws s3 data source. Like MinIO deployed on internal cluster. Downloaded | Remote

gcs :- Google cloud storage as data source. Downloaded

nfs :- External NFS server as data source. Remote

k8svolume :- Kubernetes volume as data source. Remote

workstation :- To upload data that is present on the local workstation. Uploaded

update_awss3_details(bucket, prefix, key, secret)[source]

Method to update details of aws s3 data source.

Inputs

bucket
Valid bucket in aws s3
prefix
Path to an object in the bucket. Dkube will fetch recursively all objects under this prefix.
key
AWS s3 access key id
secret
AWS s3 access key secret
update_gcs_details(bucket, prefix, key, secret)[source]

Method to update details of the Google Cloud Storage data source.

Inputs

bucket
Valid bucket in GCS
prefix
Path to an object in bucket. Dkube will fetch recursively all objects under this prefix.
key
Name of the GCS secret
secret
Content of the secret
update_git_details(url, branch=None, authopt='apikey', authval=None)[source]

Method to update the details of the git source.

Inputs

url

A valid Git URL.

branch
Valid branch of git repo. If not provided then master branch is used by default.
authopt
One of the valid options from GIT_ACCESS_OPTS
authval
Value corresponding to the authopt
update_k8svolume_details(name)[source]

Method to update details of k8s volume data source.

Inputs

name
Name of the kubernetes volume. The volume should not already be Bound.
update_model_source(source='dvs')[source]

Method to update the source for this model. It should be one of the choices mentioned in MODEL_SOURCES. The default value is dvs.

update_nfs_details(server, path='/')[source]

Method to update details of nfs data source.

Inputs

server
IP address of the nfs server.
path
Path in the nfs export. This path is directly mounted for the user program.
update_s3_details(endpoint, bucket, prefix, key, secret)[source]

Method to update details of an S3-compatible data source such as MinIO.

Inputs

bucket
Valid bucket name in s3 store
prefix
Path to an object in the bucket. Dkube will fetch recursively all objects under this prefix.
key
S3 access key id
secret
s3 access key secret
class dkube.sdk.rsrcs.training.DkubeTraining(user, name='train-3081', description='', tags=[])[source]

This class defines a DKube Training Run with helper functions to set properties of the Training Run:

from dkube.sdk import *
training = DkubeTraining("oneconv", name="mnist-run")

Here the first argument is the user of the Training Run. The user should be a valid onboarded user in DKube.
DISTRIBUTION_OPTS = ['manual', 'auto']

Options for GPU jobs configured to run on multiple nodes. The default option is 'auto', where distribution is configured by the framework

auto :- Framework configures the distribution mechanism

manual :- User configures the distribution mechanism

FRAMEWORK_OPTS = ['custom', 'tensorflow_1.14', 'tensorflow_2.0.0', 'tensorflow_2.3.0', 'tensorflow_r-1.14', 'tensorflow_r-2.0.0', 'pytorch_1.6', 'sklearn_0.23.2']

List of valid frameworks for the training images. The framework is used to derive the image used for Model Serving

custom :- Custom framework

tensorflow_1.14 :- TF v1.14

tensorflow_2.0.0 :- TF v2.0.0

tensorflow_2.3.0 :- TF v2.3.0

tensorflow_r-1.14 :- TF v1.14 with R

tensorflow_r-2.0.0 :- TF v2.0.0 with R

pytorch_1.6 :- PyTorch v1.6

sklearn_0.23.2 :- Scikit-learn v0.23.2

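For instance, configuring a distributed GPU run (a sketch; the names and resource sizes are illustrative):

t = DkubeTraining("oneconv", name="dist-run")
t.update_container(framework="tensorflow_2.0.0",
                   image_url="docker.io/ocdr/dkube-datascience-tf-cpu:v2.0.0")
t.update_resources(cpus=4, mem=8192, ngpus=2)  # mem in MB
t.update_distribution(opt="auto", nworkers=2)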
add_code(name, commitid=None)[source]

Method to update Code Repo for training run

Inputs

name
Name of Code Repo
commitid
Commit id to retrieve from the code repository
add_envvar(key, value)[source]

Method to add env variable for the training run

Inputs

key
Name of env variable
value
Value of env variable
add_envvars(vars={})[source]

Method to add env variables for the training run

Inputs

vars
Dictionary of env variable name and value
add_input_dataset(name, version=None, mountpath=None)[source]

Method to update Dataset Repo input for training run

Inputs

name
Name of Dataset Repo
version
Version (unique id) to use from Dataset
mountpath
Path at which the Dataset contents are made available in the training run pod. For a local Dataset, mountpath points to the contents of the Dataset. For a remote Dataset, mountpath contains the metadata for the Dataset.
add_input_featureset(name, version=None, mountpath=None)[source]

Method to update Featureset input for training run

Inputs

name
Name of Featureset
version
Version (unique id) to use from Featureset
mountpath
Path at which the Featureset contents are made available in the training run pod
add_input_model(name, version=None, mountpath=None)[source]

Method to update Model Repo input for training run

Inputs

name
Name of Model Repo
version
Version (unique id) to use from Model
mountpath
Path at which the Model contents are made available in the training run pod
add_output_model(name, version=None, mountpath=None)[source]

Method to update Model Repo output for training run

Inputs

name
Name of Model Repo
version
Version (unique id) to use from Model (TODO)
mountpath
Path to write model files in the training run. A new version is created in the Model Repo with files written to this path.
disable_execution()[source]

Method to create a Run with no execution, used to track external execution

list_frameworks()[source]

Method to list frameworks available for training run

update_basic(user, name, description, tags)[source]

Method to update the attributes specified at creation. Description and tags can be updated. tags is a list of string values.

update_config_file(name, body=None)[source]

Method to update config file for training run

Inputs

name
Name of config file
body
Config data which is made available as a file with the specified name to the training pod under /mnt/dkube/config
update_container(framework='custom', image_url='', login_uname='', login_pswd='')[source]

Method to update the framework and image to use for the training run.

Inputs

framework
One of the frameworks from FRAMEWORK_OPTS
image_url
URL of the image repository, e.g., docker.io/ocdr/dkube-datascience-tf-cpu:v2.0.0
login_uname
username to access the image repository
login_pswd
password to access the image repository
update_distribution(opt='manual', nworkers=0)[source]

Method to update gpu distribution method for training run

Inputs

opt
GPU distribution method specified as one of DISTRIBUTION_OPTS
nworkers
Number of required workers
update_group(group='default')[source]

Method to update the group to place the Training Run.

update_hptuning(name, body=None)[source]

Method to update hyperparameter tuning file for training run

Inputs

name
Name of hyperparameter tuning file
body
Hyperparameter tuning data in YAML format which is made available as a file with the specified name to the training pod under /mnt/dkube/config
update_resources(cpus=None, mem=None, ngpus=0)[source]

Method to update resource requirements for training run

Inputs

cpus
Number of required cpus
mem
Memory required in MB (TODO)
ngpus
Number of required gpus
update_startupscript(startup_script=None)[source]

Method to update startup command for the training run

Inputs

startup_script
Startup command for the training run pod. Relative path from the root of the code repository should be specified.
class dkube.sdk.rsrcs.preprocessing.DkubePreprocessing(user, name='data-2127', description='', tags=[])[source]

This class defines a DKube Preprocessing Run with helper functions to set properties of the Preprocessing Run:

from dkube.sdk import *
preprocessing = DkubePreprocessing("oneconv", name="mnist-run")

Here the first argument is the user of the Preprocessing Run. The user should be a valid onboarded user in DKube.
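A sketch of a preprocessing run that reads a dataset and writes a featureset (the names, image and script are illustrative):

pre = DkubePreprocessing("oneconv", name="prep-run")
pre.update_container(image_url="docker.io/ocdr/dkube-datascience-tf-cpu:v2.0.0")
pre.update_startupscript("python featureset.py")
pre.add_code("mnist")
pre.add_input_dataset("mnist", mountpath="/opt/dkube/input")
pre.add_output_featureset("mnist-fs", mountpath="/opt/dkube/output")
dapi.create_preprocessing_run(pre)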
add_code(name, commitid='')[source]

Method to update Code Repo for Preprocessing run

Inputs

name
Name of Code Repo
commitid
Commit id to retrieve from the code repository
add_input_dataset(name, version=None, mountpath=None)[source]

Method to update Dataset Repo input for Preprocessing run

Inputs

name
Name of Dataset Repo
version
Version (unique id) to use from Dataset
mountpath
Path at which the Dataset contents are made available in the Preprocessing run pod. For a local Dataset, mountpath points to the contents of the Dataset. For a remote Dataset, mountpath contains the metadata for the Dataset.
add_input_featureset(name, version=None, mountpath=None)[source]

Method to update Featureset input for Preprocessing run

Inputs

name
Name of Featureset
version
Version (unique id) to use from Featureset
mountpath
Path at which the Featureset contents are made available in the Preprocessing run pod
add_input_model(name, version=None, mountpath=None)[source]

Method to update Model Repo input for Preprocessing run

Inputs

name
Name of Model Repo
version
Version (unique id) to use from Model
mountpath
Path at which the Model contents are made available in the Preprocessing run pod
add_output_dataset(name, version=None, mountpath=None)[source]

Method to update Dataset Repo output for Preprocessing run

Inputs

name
Name of Dataset Repo
version
Version (unique id) to use from Dataset (TODO)
mountpath
Path to write dataset files in the Preprocessing run. A new version is created in the Dataset Repo with the files written to this path.
add_output_featureset(name, version=None, mountpath=None)[source]

Method to update Featureset output for Preprocessing run

Inputs

name
Name of Featureset
version
Version (unique id) to use from Featureset (TODO)
mountpath
Path to write Featureset files in the Preprocessing run. A new version is created in the Featureset with files written to this path.
update_basic(user, name, description, tags)[source]

Method to update the attributes specified at creation. Description and tags can be updated. tags is a list of string values.

update_container(image_url=None, login_uname=None, login_pswd=None)[source]

Method to update the image to use for the Preprocessing run.

Inputs

image_url
URL of the image repository, e.g., docker.io/ocdr/dkube-datascience-tf-cpu:v2.0.0
login_uname
username to access the image repository
login_pswd
password to access the image repository
update_envvars(envs={})[source]

Method to update env variables for the Preprocessing run

Inputs

envs
Dictionary of env variable names and values
update_group(group='default')[source]

Method to update the group to place the Preprocessing Run.

update_startupscript(startup_script=None)[source]

Method to update startup command for the Preprocessing run

Inputs

startup_script
Startup command for the Preprocessing run pod. Relative path from the root of the code repository should be specified.
class dkube.sdk.rsrcs.ide.DkubeIDE(user, name='notebook-1266', description='', tags=[])[source]

This class defines a DKube IDE with helper functions to set properties of the IDE:

from dkube.sdk import *
ide = DkubeIDE("oneconv", name="ide")

Here the first argument is the user of the IDE. The user should be a valid onboarded user in DKube.
FRAMEWORK_OPTS = ['custom', 'tensorflow_1.14', 'tensorflow_2.0.0', 'tensorflow_2.3.0', 'tensorflow_r-1.14', 'tensorflow_r-2.0.0', 'pytorch_1.6', 'sklearn_0.23.2']

List of valid frameworks for the IDE images. The framework is used to derive the image used for Model Serving

custom :- Custom framework

tensorflow_1.14 :- TF v1.14

tensorflow_2.0.0 :- TF v2.0.0

tensorflow_2.3.0 :- TF v2.3.0

tensorflow_r-1.14 :- TF v1.14 with R

tensorflow_r-2.0.0 :- TF v2.0.0 with R

pytorch_1.6 :- PyTorch v1.6

sklearn_0.23.2 :- Scikit-learn v0.23.2

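For instance, launching a Jupyter notebook with a code repo attached (a sketch; the names and image are illustrative):

ide = DkubeIDE("oneconv", name="dev-notebook")
ide.update_container(framework="tensorflow_2.0.0",
                     image_url="docker.io/ocdr/dkube-datascience-tf-cpu:v2.0.0")
ide.add_code("mnist")
dapi.launch_jupyter_ide(ide)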
add_code(name, commitid=None)[source]

Method to update Code Repo for IDE

Inputs

name
Name of Code Repo
commitid
Commit id to retrieve from the code repository
add_envvar(key, value)[source]

Method to add env variable for the IDE

Inputs

key
Name of env variable
value
Value of env variable
add_input_dataset(name, version=None, mountpath=None)[source]

Method to update Dataset Repo input for IDE

Inputs

name
Name of Dataset Repo
version
Version (unique id) to use from Dataset
mountpath
Path at which the Dataset contents are made available in the IDE pod. For a local Dataset, mountpath points to the contents of the Dataset. For a remote Dataset, mountpath contains the metadata for the Dataset.
add_input_model(name, version=None, mountpath=None)[source]

Method to update Model Repo input for IDE

Inputs

name
Name of Model Repo
version
Version (unique id) to use from Model
mountpath
Path at which the Model contents are made available in the IDE pod
list_frameworks()[source]

Method to list frameworks available for IDE

update_basic(user, name, description, tags)[source]

Method to update the attributes specified at creation. Description and tags can be updated. tags is a list of string values.

update_config_file(name, body=None)[source]

Method to update config file for IDE

Inputs

name
Name of config file
body
Config data which is made available as a file with the specified name to the IDE under /mnt/dkube/config
update_container(framework='custom', image_url='', login_uname='', login_pswd='')[source]

Method to update the framework and image to use for the IDE.

Inputs

framework
One of the frameworks from FRAMEWORK_OPTS
image_url
URL of the image repository, e.g., docker.io/ocdr/dkube-datascience-tf-cpu:v2.0.0
login_uname
username to access the image repository
login_pswd
password to access the image repository
update_group(group='default')[source]

Method to update the group to place the IDE.

update_hptuning(name, body=None)[source]

Method to update hyperparameter tuning file for IDE

Inputs

name
Name of hyperparameter tuning file
body
Hyperparameter tuning data in YAML format which is made available as a file with the specified name to the IDE pod under /mnt/dkube/config
update_resources(cpus=None, mem=None, ngpus=0)[source]

Method to update resource requirements for IDE

Inputs

cpus
Number of required cpus
mem
Memory required in MB (TODO)
ngpus
Number of required gpus
class dkube.sdk.rsrcs.serving.DkubeServing(user, name='serving-8802', description='', tags=[])[source]

This class defines a Model Deployment with helper functions to set properties of the Model Deployment:

from dkube.sdk import *
serving = DkubeServing("oneconv", name="mnist-serving")

Here the first argument is the user of the Model Deployment. The user should be a valid onboarded user in DKube.
set_production_deploy()[source]

Method to update the mode to use for Model Serving

Inputs

deploy
Flag to specify Serving for Test or Production (TODO)
set_transformer(transformer: bool = False, script=None)[source]

Method to specify if a transformer is required for pre/post processing of Inference requests and the script to run from the Transformer Code Repo.

Inputs

transformer
True or False
script
Script command to run in the transformer pod from Transformer Code Repo
update_autoscaling_config(min_replicas, max_concurrent_requests)[source]

Method to update the autoscale config to use for Model Serving

Inputs

min_replicas
Min number of pods to be running for Serving
max_concurrent_requests
Soft target threshold value for number of concurrent requests to trigger scale up of Serving pods
update_basic(user, name, description, tags)[source]

Method to update the attributes specified at creation. Description and tags can be updated. tags is a list of string values.

update_serving_image(deploy=None, image_url='', login_uname=None, login_pswd=None)[source]

Method to update the image to use for Model Serving

Inputs

deploy
Flag to specify Serving for Test or Production (TODO)
image_url
URL of the image repository, e.g., docker.io/ocdr/tensorflowserver:2.0.0
login_uname
username to access the image repository
login_pswd
password to access the image repository
update_serving_model(model, version=None)[source]

Method to update Model Repo input for Model Serving

Inputs

model
Name of the Model Repo containing the model files
version
Version (unique id) to use from Model Repo
update_transformer_code(code=None, commitid=None)[source]

Method to update Code Repo to use for the Transformer.

Inputs

code
Code Repo containing the script for Transformer
commitid
commit id used to retrieve the transformer Code Repo
update_transformer_image(image_url='', login_uname=None, login_pswd=None)[source]

Method to update the image to use for the transformer

Inputs

image_url
URL of the image repository, e.g., docker.io/ocdr/dkube-datascience-tf-cpu:v2.0.0
login_uname
username to access the image repository
login_pswd
password to access the image repository
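For instance, configuring a production deployment with autoscaling (a sketch; the names and limits are illustrative):

s = DkubeServing("oneconv", name="mnist-prod")
s.update_serving_model("mnist")
s.set_production_deploy()
s.update_autoscaling_config(min_replicas=1, max_concurrent_requests=10)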

DKube API Swagger Spec

  • Full spec of DKube APIs
  • All the code is under package dkube.sdk.internal.dkube_api

Click the DKUBEAPI link to view the full spec.
