Once you have identified the model that best matches your target goals, you can publish it to the Model Catalog to mark it as a candidate for production. The Production Engineer reviews the models in the catalog and deploys the appropriate version using KFServing, the standard Kubeflow serving framework.

The serving image can optionally include preprocessing code that runs before inference, and postprocessing code that transforms the model output before the results are sent to a client.
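The flow of preprocessing, inference, and postprocessing can be sketched in plain Python. This is a minimal illustration only: the `ServedModel` class, the `mean_model` function, and the `instances`/`predictions` keys are assumptions made for this example, and the code does not use the KFServing package.

```python
from typing import Callable, Dict, List


class ServedModel:
    """Hypothetical wrapper showing the three stages; not the KFServing API."""

    def __init__(self, predict_fn: Callable[[List[List[float]]], List[float]]):
        self.predict_fn = predict_fn

    def preprocess(self, request: Dict) -> List[List[float]]:
        # Runs before inference: convert raw request values to floats.
        return [[float(value) for value in row] for row in request["instances"]]

    def postprocess(self, outputs: List[float]) -> Dict:
        # Runs after inference: round the scores and wrap them for the client.
        return {"predictions": [round(output, 2) for output in outputs]}

    def handle(self, request: Dict) -> Dict:
        features = self.preprocess(request)
        outputs = self.predict_fn(features)
        return self.postprocess(outputs)


def mean_model(rows: List[List[float]]) -> List[float]:
    # Stand-in for a trained model (assumption): returns the mean of each row.
    return [sum(row) / len(row) for row in rows]


served = ServedModel(mean_model)
response = served.handle({"instances": [["1", "2", "3"], [4, 5, 6]]})
print(response)
```

Keeping the two optional stages as separate methods means either one can be left as a pass-through without touching the inference step.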

The served image can be monitored through a dashboard to ensure that it executes efficiently.

Once a model has been deployed at an endpoint, the model served at that endpoint can be replaced. This makes it easy to migrate from one version of a model to another without changing the endpoint that clients call.
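The idea of replacing the model behind a fixed endpoint can be illustrated with a small in-memory sketch. The `Endpoint` class, its `switch_model` method, and the two toy models are hypothetical names introduced for this example; a real deployment would make the change through the serving framework rather than in process.

```python
from typing import Callable, List

Model = Callable[[List[float]], float]


class Endpoint:
    """Hypothetical endpoint that keeps its name while the model changes."""

    def __init__(self, name: str, version: str, model: Model):
        self.name = name
        self.version = version
        self._model = model

    def predict(self, features: List[float]) -> float:
        return self._model(features)

    def switch_model(self, version: str, model: Model) -> str:
        # Replace the served model and return the version that was replaced.
        previous = self.version
        self.version = version
        self._model = model
        return previous


def model_v1(features: List[float]) -> float:
    # Toy model (assumption): sum of the features.
    return sum(features)


def model_v2(features: List[float]) -> float:
    # Toy model (assumption): mean of the features.
    return sum(features) / len(features)


endpoint = Endpoint("demo-endpoint", "v1", model_v1)
before = endpoint.predict([2.0, 4.0])
replaced = endpoint.switch_model("v2", model_v2)
after = endpoint.predict([2.0, 4.0])
print(endpoint.name, replaced, "->", endpoint.version, before, after)
```

Clients keep calling the same endpoint name throughout; only the version behind it changes.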