How to launch hyperparameter tuning

How to launch hyperparameter tuning runs taking advantage of the Katib based hyperparameter tuning available in Kubeflow. Pick the best model from the multiple training runs as the winner based on pre-set criteria.

Written by

Team DKube

more resources

Similar Blogs

View all resources

Blogs

MLOps on HPC/Slurm with Kubeflow

Over the last decade enterprises have made heavy investments in High Performance Computing (HPC) to solve complex scientific problems. They have used Slurm to schedule these massively parallel jobs on large clusters of compute nodes with accelerated hardware. AI/ML uses similar hardware for deep learning model training and enterprises are looking to find solutions that provide AI/ML model development on top of their existing HPC infrastructure. A recent trend in AI/ML is to use agile MLOps methodologies to productionize AI/ML models quickly. Marrying the two - AI/ML development using MLOps with HPC/Slurm clusters - will lead to a much faster adoption of this combination. This article elaborates on how to combine popular open-source frameworks, Slurm and Kubeflow, to run AI/ML workloads at scale on HPC clusters.