Model Monitoring


Over time, models can degrade, providing inference results that no longer achieve your business goals. DKube integrates model monitoring into the overall workflow. This allows the data science or production teams to monitor the serving results, and take action if the results are no longer within acceptable tolerances.

  • Local and remote deployments can be monitored

  • Monitors can be created using data files rather than running deployments

  • Alerts and status can be set up based on goals and tolerances

  • A Dashboard provides a snapshot of all monitored models

  • Problems can be viewed in a set of hierarchical graphs

  • The problem, and its root cause, can be determined

  • Retraining and redeployment can be performed

Monitor Workflow

The general workflow to make use of the model monitoring system is described in this section.



  • Create or import a deployment (Deployments, Import Deployment)

  • Add a monitor for the deployment (Add a Monitor)

  • Update the schema (Edit Schema)

  • Create alerts for the monitor

  • Optionally upload a file that sets status thresholds for the monitor

  • Modify the monitor after it has been created, if required (Edit an Existing Monitor)

  • View the status of the monitors and alerts in real time from the monitor dashboard screen (Monitor Dashboard)

  • Based on the alerts, investigate a specific monitor hierarchically to determine what is causing the alert (Monitor Details)


It is possible to set up a monitor without a running model, based solely on a set of files. These files can be created manually, generated automatically from a running model, or produced by a program.

Monitor Menu

The Monitor screens provide a UI-based mechanism to navigate through the workflow.


Menu Item

  • Create, view, and manage the Code repos

  • Create, view, and manage the Dataset repos

  • Create, view, and manage the Model repos

  • Catalog of images for use in IDEs & Runs

  • Create, view, and manage deployments and monitors (monitors are described in this section)

  • View the storage utilization for the user

  • View the CPU, GPU, memory, and pod utilization for the user

Deployments Dashboard


In order to monitor a model (or set of files), a Deployment must be created or imported. The following deployment approaches are possible:

  • Create a deployment from a trained model (Deployments)

  • Import a deployment from a remote cluster, described in this guide

  • Create a dummy deployment in order to create a monitor for a set of files

The Deployments Dashboard provides a summary of the currently active deployments.

  • The Status column identifies whether the deployment has been created from within DKube or imported

  • If the deployment has been created within DKube, an endpoint URL is provided

  • If the deployment includes a monitor, the status of the monitor is provided

The following actions are possible for each deployment:


Deployment Type




Change the model being deployed for that endpoint (Change Model Deployment)



Change the remote deployment

Add Monitor


Add a monitor to the deployment

Import Deployment

In order to monitor a remote model, or to monitor using a set of files, the deployment must first be imported to the local DKube cluster. Select the import icon, which brings up the import popup.





  • Name: Mandatory name of the deployment

  • Optional fields providing more context for reviewing or filtering

  • Optional cluster name if the model has been deployed on a remote cluster; clusters are added as described at Multicluster Management

The Name field rules are as follows:

Import Type


Remote Model

Must match the deployment name on the remote cluster

Dummy Deployment

User-chosen name that does not need to match any deployment

Fields other than the Name field can be modified through the Edit icon after the deployment has been imported.

Add a Monitor


A monitor can be added by selecting the “Add Monitor” action icon. This will bring up a screen where the basic monitor fields can be filled in. Once the monitor has been added to the deployment, it can be further configured from the monitor dashboard screen.


After the required inputs have been entered and the new monitor submitted, the Schema can be directly accessed. The Schema can also be modified later from the Monitor Dashboard screen.





Model Type

Type of model being monitored, such as regression or classification

Input Data Type

Type of data being monitored, such as tabular or image

Time Zone

Select the time zone to use for the monitor


The health of the deployment can be monitored.





Enable the monitoring of the health of the model instance on the cluster


Select how often the monitor should run

Data Drift

The Drift screen sets up the monitor for data drift.

  • For locally running deployments, or deployments that have been imported from a remote cluster, most of the fields will be filled in based on the deployment metadata

  • For a dummy deployment, where the monitor is based on files and not running deployments, the fields must be filled in to identify what needs to be monitored





Enable data drift monitor


Select how often the monitor should run


Choose the algorithm to use for evaluating data drift

Train Data

Training dataset name and version that should be used for the monitor

Train Data Upload Transformer Script

Optional script, if necessary, to preprocess or postprocess the data during inference

Dataset Content

Format of dataset

Predict Data

Prediction dataset name and version that should be used for the monitor

Files Organized As

Folder organization for predict dataset

Predict Data Upload Transformer Script

Optional script, if necessary, to preprocess or postprocess the data during inference

The monitor uses these inputs to perform a comparison at the selected frequency, and uses the thresholds or alerts to trigger an event or update the status of the monitor.
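DKube lets you choose the algorithm used to evaluate drift. Purely as an illustration of the idea, and not DKube's implementation, the following sketch computes a Population Stability Index (PSI) comparing the training distribution of one continuous feature against its live prediction distribution:

```python
import math

def psi(train, predict, bins=10):
    """Population Stability Index between a training sample and a live
    prediction sample of one continuous feature. A PSI near 0 means no
    drift; values above roughly 0.25 are commonly treated as significant."""
    lo = min(min(train), min(predict))
    hi = max(max(train), max(predict))
    width = (hi - lo) / bins or 1.0

    def bucket_fraction(sample, i):
        # Count values falling into bin i; the last bin includes hi.
        in_bin = sum(
            1 for v in sample
            if lo + i * width <= v < lo + (i + 1) * width
            or (i == bins - 1 and v == hi)
        )
        return max(in_bin / len(sample), 1e-6)  # clamp to avoid log(0)

    return sum(
        (bucket_fraction(predict, i) - bucket_fraction(train, i))
        * math.log(bucket_fraction(predict, i) / bucket_fraction(train, i))
        for i in range(bins)
    )
```

Identical samples yield a PSI of approximately 0, while disjoint samples yield a large value; a real monitor would evaluate each selected feature at the chosen run frequency.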

Performance Decay

The Performance screen sets up the monitor for metric performance.





Enable metric performance monitor


Select how often the monitor should run

Compute Metrics

Select the format of the files to compute the performance metrics

Labelled Data

The Labelled Data selection expects a dataset file with columns that provide both the Groundtruth (correct output) and the predicted output. Based on this, DKube calculates the performance metrics for the monitor.




Dataset name and version for the calculation

Dataset Content

The format of the dataset file

Files Organized As

Folder organization for dataset

Upload Transformer Script

Optional script if necessary to preprocess or postprocess the data during the calculation

Groundtruth Column Name

Column header name for the groundtruth

Prediction Column Name

Column header name for the model prediction

Timestamp Column Name

Column header name for the timestamp
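As a sketch of what this calculation involves (the column names below are hypothetical; in practice they are whatever you enter in the Groundtruth, Prediction, and Timestamp Column Name fields):

```python
import csv
import io

# Hypothetical labelled dataset; each row carries a timestamp, the
# groundtruth (correct output), and the model's prediction.
labelled_csv = """timestamp,groundtruth,prediction
2024-01-01T00:00,cat,cat
2024-01-01T01:00,dog,cat
2024-01-01T02:00,dog,dog
2024-01-01T03:00,cat,cat
"""

rows = list(csv.DictReader(io.StringIO(labelled_csv)))
accuracy = sum(r["groundtruth"] == r["prediction"] for r in rows) / len(rows)
print(f"accuracy = {accuracy:.2f}")  # 3 of 4 rows match -> 0.75
```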

Pre-Computed Source

A Pre-Computed Source provides the full computation of the metrics. DKube does not do the computation, but rather uses the information in the file.

An example of a pre-computed file is available at Pre-Computed Source Example


An example of a custom file is available at Custom Metrics Example

Monitor Dashboard


There are 2 different types of notification available for a monitor:




  • Status thresholds, which provide a status indication based on warning and critical values (Thresholds)

  • Alerts, which trigger based on a single threshold set up during the addition of an alert (Alerts)

Monitor Status


The status of the monitor is defined as follows:




  • A field is missing

  • Calculating results after adding datasets

  • Available for monitor, but not active

  • Running analysis

  • Problem with the monitor

Threshold Status


The threshold status of the monitors is provided as a summary at the top of the screen, and for each monitor in the columns below.


The threshold status colors are based on the last run, and are not a cumulative indication of status

The Data Drift and Performance Decay threshold status colors are defined as follows:



Green Dot

The last run was within all of the thresholds

Orange Dot

The last run was between the warning and critical thresholds

Red Dot

The last run was higher than the critical threshold
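The color logic can be sketched as follows. This is a simplified illustration, assuming a metric where higher values are worse (such as a drift score), not DKube's exact implementation:

```python
def threshold_status(value, warning, critical):
    """Map the last run's metric value to a dashboard color."""
    if value >= critical:
        return "red"     # at or above the critical threshold
    if value >= warning:
        return "orange"  # between the warning and critical thresholds
    return "green"       # within all thresholds
```

For example, `threshold_status(0.1, warning=0.2, critical=0.5)` returns "green", while a value of 0.7 against the same thresholds returns "red".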



The actions for each monitor are as follows:




  • Start or restart the monitor after it has been stopped

  • Stop the monitor

  • Delete the monitor

Edit Monitor

Modify the basic monitor options

Update Schema

Modify the monitor schema

Add Alerts

Add Alerts for the Monitor

Add or Edit Dashboards

Add or modify the dashboard

Upload Thresholds

Upload the thresholds for warning (orange) and critical (red) status


The monitor must be stopped before editing the basic fields or schema


Customizing dashboards is described in the DKube examples repo under the Monitoring branch (Custom Dashboard)

Edit an Existing Monitor


An existing monitor can be modified by selecting the “Edit Monitor” icon on the right of the monitor summary.


The monitor must be stopped before it can be edited

Edit Schema


After the basic information has been completed, the schema needs to be modified to reflect the features. The Monitor Schema can be updated by selecting the “Update Schema” icon on the right of the monitor summary.


The Monitor must be stopped to update the schema

The Schema screen lists the features that are part of the training data. From this screen, you can choose which features to monitor, what type each feature is (input, prediction, etc.), and whether the feature is continuous (a number) or categorical (a distinct category, such as true or false).
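As a rough illustration of the continuous/categorical distinction (a simple heuristic sketch, not DKube's schema inference):

```python
def feature_kind(values, max_categories=10):
    """Guess whether a feature column is continuous (numeric with many
    distinct values) or categorical (a small set of distinct values)."""
    distinct = set(values)
    numeric = all(isinstance(v, (int, float)) for v in distinct)
    if numeric and len(distinct) > max_categories:
        return "continuous"
    return "categorical"
```

A sensor reading with many distinct numeric values would be treated as continuous, while a true/false flag or a small set of labels would be categorical.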



Alerts provide notifications when an input or output of the Model drifts out of tolerance. Alerts can be added by selecting the Add Alerts icon on the dashboard.

The Alerts screen shows the alerts that have been added for that monitor, and allows the user to create a new alert. The alert is configured by selecting the type of alert to monitor (feature drift or performance decay). In each case, an email address can be configured to send a notification when the alert triggers.





Enable the alert to be active - the alert can be disabled later by editing it

Alert Name

User-chosen name for the alert

Alert Type

Type of comparison, such as data drift or performance decay

Configure Based On

Create alert based on status or threshold


What feature to compare for this alert, and the threshold to use for the alert

Breach Incidents

Optionally set the number of times the feature matches the threshold before triggering an alert

Email Address

Optionally provide an email address to use when an alert is triggered

The alert will show up on the list of Alerts once successfully created.

Alerts can be edited from the Alert List screen by selecting the Edit icon on the far right.
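The Breach Incidents option can be sketched as requiring a number of consecutive breaching runs before the alert fires. This is an illustrative assumption about the counting behavior (here the count resets whenever a run stays within the threshold):

```python
def make_breach_tracker(threshold, incidents=3):
    """Return an observer that fires only after `incidents` consecutive
    monitor runs exceed `threshold`."""
    streak = 0

    def observe(value):
        nonlocal streak
        streak = streak + 1 if value > threshold else 0
        return streak >= incidents

    return observe

observe = make_breach_tracker(threshold=0.3, incidents=2)
print([observe(v) for v in [0.4, 0.2, 0.5, 0.6]])  # [False, False, False, True]
```

The single breach at 0.4 does not fire because the next run drops back within tolerance; only the two consecutive breaches (0.5 then 0.6) trigger the alert.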


Thresholds can be set for each feature in a monitor. Rather than triggering a single alert when a threshold is exceeded, the threshold capability provides 2 different thresholds for more granularity. The thresholds are:

  • Warning

  • Critical

If neither of those thresholds is exceeded, the monitor is considered to be “Healthy”. A summary of the threshold status is provided on the Monitor Dashboard and described at Threshold Status

An example of a Thresholds file is available at Thresholds File Example

Tickets Dashboard


A monitor Ticket can be created and managed from the Tickets tab. There are 2 types of tickets:

  • Incidents

  • Change Requests

Selecting one of the ticket types will bring up a ServiceNow window.

Alerts Dashboard


The Alerts Dashboard shows all of the alerts within DKube, across all of the monitors. It provides information on the monitor name, alert name, type of alert, and the timestamps. The user can go directly to the monitor by selecting the monitor name.

Monitor Details


The process of identifying the root cause of a monitor deviation involves successively reviewing more information on an Alert. From the Monitor Dashboard, select one of the monitors to find out more details on that monitor.

From the Monitor Summary dashboard, the details of a specific monitor can be viewed by selecting the monitor name.


This brings up a dashboard for that particular monitor, with the associated details. It includes:

  • A summary of the features and alerts status

  • A list of Alerts for that monitor only, for the selected timeframe

A summary of the Alert can be obtained by selecting the Alert name.


More details on the Alert can be obtained by selecting the “Details” button at the top right.

Data Drift


Selecting the Data Drift tab provides graphs and tables that help to identify what has drifted, with more information to determine why it has drifted.

The top graph overlays the number of production serving requests with the Alerts. This allows the Production Engineer to determine the amount of live inference traffic activity, and how it compares to the threshold alerts for the features.

The table below the summary graph provides visual and quantified information on how the selected features are changing, and how important each feature is to the resulting Model output. This allows the user to see whether the original training data still matches the live inference data, and how the drift varies over time. This might be a place to start for a retraining activity.

The tables reflect the selected timestamp from the graph above. Selecting different timestamps will bring up different tables.

Performance Decay


If Performance is selected, the graphs show how well the Model is performing based on the chosen Model metrics. The top graph combines the number of production requests and the number of alerts.

The bottom graph shows how the metrics are performing.

Configuration, Schema, & Alerts


The Configuration, Schema, and Alerts tabs allow the user to view the options used for the monitor.