The quality of your overall AI/ML solution depends on the quality of your training data.
Clean, transform and organize your data so it can be used effectively in your AI/ML projects. DKube helps you fill in missing data, remove outliers, and format the data in a manner that is most appropriate for your specific use case.
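For a rough sense of what this preparation can look like in practice, here is a short, illustrative Python sketch using pandas; the file name, column names, and percentile thresholds are assumptions for the example, not part of DKube.

```python
# Illustrative data-cleaning sketch; "claims.csv", the column names, and the
# 1st/99th percentile cutoffs are assumptions for this example, not DKube APIs.
import pandas as pd

df = pd.read_csv("claims.csv")

# Fill in missing data
df["age"] = df["age"].fillna(df["age"].median())
df["region"] = df["region"].fillna("unknown")

# Remove outliers outside the 1st-99th percentile range
low, high = df["amount"].quantile([0.01, 0.99])
df = df[df["amount"].between(low, high)]

# Format columns for the downstream training code
df["claim_date"] = pd.to_datetime(df["claim_date"])
df["region"] = df["region"].astype("category")

df.to_csv("claims_clean.csv", index=False)
```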
Apart from industry-standard MLOps features, DKube also comes packed with collaboration features that finally help your Data Science and IT Teams communicate on common ground.
Feature Engineering
DKube supports a flexible end-to-end workflow that follows the data from its inception to production serving. Work with raw data and process it for optimized training outcomes. The processed datasets are saved in a Feature Store.
The processing can be done manually in an IDE such as JupyterLab or RStudio, through individual preprocessing Runs, or through an automated Kubeflow Pipeline.
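For the automated path, a preprocessing step can be packaged as a Kubeflow Pipelines component. The sketch below uses the open-source kfp SDK (v2 syntax) with a placeholder dataset URL and cleaning step; it illustrates the general pattern rather than a DKube-specific API.

```python
# Minimal Kubeflow Pipelines (kfp v2) sketch of a preprocessing step; the
# dataset URL and the cleaning logic are placeholders, not DKube-specific APIs.
from kfp import dsl, compiler
from kfp.dsl import Dataset, Output

@dsl.component(base_image="python:3.10", packages_to_install=["pandas"])
def preprocess(raw_csv_url: str, features: Output[Dataset]):
    import pandas as pd
    df = pd.read_csv(raw_csv_url)
    df = df.dropna()                       # replace with your real cleaning steps
    df.to_csv(features.path, index=False)  # processed output, ready for a Feature Store

@dsl.pipeline(name="feature-engineering")
def feature_pipeline(raw_csv_url: str):
    preprocess(raw_csv_url=raw_csv_url)

if __name__ == "__main__":
    compiler.Compiler().compile(feature_pipeline, "feature_pipeline.yaml")
```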
Flexible code and data integration is built into the workflow, and DKube supports:
The most popular code repositories, including GitHub, GitLab, and Bitbucket
The most common storage options for data and models, including AWS S3, MinIO, Google Cloud Storage, and Redshift, as well as Git-based repositories such as GitHub, GitLab, and Bitbucket.
Feature Sets
The Feature Sets on DKube are automatically versioned, and the versions can be conveniently viewed within the DKube UI. Each version saves the complete lineage of how the raw data became the Feature Set, allowing for better tracking.
The Data Scientist uses the processed data from the Feature Sets for code development and experimentation, and, once the components have been completed, in a training pipeline.
The Feature Sets on DKube are a global resource. Once the optimized processing has been identified for a particular dataset, the same Feature Set can be used by other stakeholders in the organization for their training needs. This ensures that a clean, optimized input is available for efficient training.
The data workflow continues through this phase as well, by enabling the same integrated access to the original data management steps.
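How the stored features are read back depends on the underlying store. Purely as a conceptual illustration, the sketch below assumes a hypothetical path layout, feature set name, and version label; it is not DKube's actual storage scheme.

```python
# Conceptual sketch: the path layout, feature-set name, and version label are
# hypothetical, not DKube's actual storage scheme.
import pandas as pd

FEATURE_SET = "insurance-claims"   # hypothetical feature set
VERSION = "v3"                     # each version carries its own lineage

df = pd.read_parquet(f"/featurestore/{FEATURE_SET}/{VERSION}/features.parquet")
X = df.drop(columns=["label"])     # model inputs
y = df["label"]                    # training target
```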
MLOps on HPC/Slurm
For industries that need to go beyond standard server capabilities, training models on a High-Performance Computing (HPC) platform is highly recommended. Until now, the obstacle has been that Kubernetes-based MLOps workflows use different applications, frameworks, tools, workflows, and administration than HPC systems, which are often based on Slurm.
DKube removes this obstacle by allowing you to submit your ML jobs to a Slurm-based HPC system directly, without compromising on its Kubeflow foundation or broad MLOps capabilities.
This unleashes the advantages of both types of platforms and enables use cases that would not otherwise be feasible.
The program code and datasets do not need to be modified. All required translation is handled automatically by DKube, and remote execution supports all of the powerful features of the DKube MLOps platform. This includes:
Integration into the end-to-end DKube MLOps workflow
Local Kubernetes storage for MLOps metadata
KFServing for production serving
Access to MLflow metric collection, display, and comparison (see the sketch after this list)
Lineage of every run and model for reproducibility, enhancement, governance, and audit
Separate management of Kubernetes and HPC domains
On-demand use of HPC training only when required
Automatic versioning of data and models
Hyperparameter tuning at job granularity
Support for Kubeflow Pipelines with HPC jobs spawned from a pipeline step
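To make the MLflow item above concrete, training code logs parameters and metrics with the standard open-source MLflow API, and the same code works whether the job runs on Kubernetes or is dispatched to the HPC/Slurm cluster. The run name, parameters, and metric values below are placeholders.

```python
# Minimal MLflow logging sketch; the run name, parameters, and metric values
# are placeholders. The same code works on Kubernetes or on the HPC cluster.
import mlflow

with mlflow.start_run(run_name="resnet-baseline"):
    mlflow.log_param("learning_rate", 1e-3)
    mlflow.log_param("batch_size", 64)
    for epoch in range(3):
        train_loss = 1.0 / (epoch + 1)          # stand-in for a real metric
        mlflow.log_metric("train_loss", train_loss, step=epoch)
```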
Read more about MLOps on HPC/Slurm with Kubeflow
MLOps on HPC/LSF
Until now, the obstacle has been that Kubernetes-based MLOps workflows use different applications, frameworks, tools, workflows, and administration than HPC systems, which are often based on LSF.
DKube removes this obstacle by allowing you to submit your ML jobs to an LSF-based HPC system directly, and without any compromise on its Kubeflow foundation or its broad MLOps capabilities. This unleashes the advantages of both types of platforms and enables use cases that would not otherwise be feasible.
The program code and datasets do not need to be modified. All required translation is handled automatically, and the remote execution supports all of the powerful features of the DKube MLOps platform. This includes:
Integration into the end-to-end DKube MLOps workflow
Local Kubernetes storage for MLOps metadata
KFServing for production serving
Access to MLflow metric collection, display, and comparison
Lineage of every run and model for reproducibility, enhancement, governance, and audit
Separate management of Kubernetes and HPC domains
On-demand use of HPC training only when required
Automatic versioning of data and models
Hyperparameter tuning at job granularity
Support for Kubeflow Pipelines with HPC jobs spawned from a pipeline step
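DKube performs this submission and tracking for you. Purely to illustrate what a direct LSF submission involves, the sketch below wraps the standard bsub command from Python; the queue name, job name, and script path are assumptions.

```python
# Illustration only: DKube submits and tracks HPC jobs automatically, so you do
# not write this yourself. Queue name, job name, and script path are assumptions.
import subprocess

cmd = [
    "bsub",
    "-J", "dkube-train",          # job name
    "-q", "gpu",                  # hypothetical LSF queue
    "-o", "train.%J.log",         # LSF writes stdout here (%J = job ID)
    "python", "train.py",
]
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
print(result.stdout)
```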
Hub and Spoke Execution Architecture
DKube uses an innovative hub and spoke architecture to integrate the remote Slurm cluster into the MLOps workflow, and communication happens through simple plug-ins. This has the following advantages:
Loose integration allows the two domains (MLOps and Slurm) to use their own tools, disciplines, administration, and workflows
It is non-intrusive to the HPC system
ML workloads can be run on the compute-intensive HPC system on demand
The primary activity happens on the hub, a Kubeflow-based framework that runs Kubernetes containers. This handles:
The management of the system
The data sources
Metadata storage
Job management
Automation
Model management
The HPC/Slurm cluster is the spoke in the architecture, and there can be multiple Slurm clusters in the system. The Slurm cluster:
Executes the job using Singularity (a minimal sketch follows this list)
Communicates with the DKube hub
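To make the Singularity step concrete, the sketch below shows the kind of Slurm batch job a spoke might execute; DKube generates and submits this automatically, and the container image, partition, and paths shown here are assumptions.

```python
# Conceptual sketch of what the spoke executes; DKube generates and submits
# this automatically. Image name, partition, and paths are assumptions.
import subprocess
import textwrap

batch_script = textwrap.dedent("""\
    #!/bin/bash
    #SBATCH --job-name=dkube-train
    #SBATCH --partition=gpu
    #SBATCH --gres=gpu:1
    #SBATCH --output=train.%j.log

    # Run the unmodified training code inside a Singularity container
    singularity exec --nv /containers/train.sif python /workspace/train.py
    """)

with open("train.sbatch", "w") as f:
    f.write(batch_script)

subprocess.run(["sbatch", "train.sbatch"], check=True)
```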
Adding a remote HPC/Slurm cluster to the DKube Kubernetes hub is quick and straightforward. The information required to access the cluster, including the credentials, is entered from the DKube UI. This creates a link between the clusters so that they can be viewed as a single MLOps entity.