Serving Scikit-Learn models

Out of the box, mlserver supports the deployment and serving of scikit-learn models. By default, it will assume that these models have been serialised using joblib.

In this example, we will cover how we can train and serialise a simple model, to then serve it using mlserver.


The first step will be to train a simple scikit-learn model. For that, we will use the handwritten digits (MNIST-style) example from the scikit-learn documentation, which trains an SVM model.

# Original source code and more details can be found in:

# Import datasets, classifiers and performance metrics
from sklearn import datasets, svm, metrics
from sklearn.model_selection import train_test_split

# The digits dataset
digits = datasets.load_digits()

# To apply a classifier on this data, we need to flatten the image, to
# turn the data in a (samples, feature) matrix:
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))

# Create a classifier: a support vector classifier
classifier = svm.SVC(gamma=0.001)

# Split data into train and test subsets
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.5, shuffle=False)

# We learn the digits on the first half of the digits
classifier.fit(X_train, y_train)
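
Before serialising the model, it can be worth checking how it does on the half of the data it has not seen. The snippet below is a minimal, self-contained sketch of that check: it rebuilds the same split as above and uses the metrics module, which the training code imports. The variable names predicted and accuracy are only illustrative.

```python
from sklearn import datasets, metrics, svm
from sklearn.model_selection import train_test_split

# Rebuild the same data and split as in the training snippet
digits = datasets.load_digits()
data = digits.images.reshape((len(digits.images), -1))
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.5, shuffle=False)

classifier = svm.SVC(gamma=0.001)
classifier.fit(X_train, y_train)

# Score the classifier on the half of the data it has not seen
predicted = classifier.predict(X_test)
accuracy = metrics.accuracy_score(y_test, predicted)
print(f"Held-out accuracy: {accuracy:.3f}")
print(metrics.classification_report(y_test, predicted))
```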

Saving our trained model

To save our trained model, we will serialise it using joblib. While this is not a perfect approach, it’s currently the recommended method to persist models to disk in the scikit-learn documentation.

Our model will be persisted as a file named mnist-svm.joblib.

import joblib

model_file_name = "mnist-svm.joblib"
joblib.dump(classifier, model_file_name)
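
A quick way to gain confidence in the saved artefact is to load it back and compare predictions. The sketch below does this with a small model trained on a slice of the digits data and a temporary file (roundtrip.joblib is a hypothetical name chosen so the check does not touch mnist-svm.joblib). It illustrates the round trip and is not part of the serving setup.

```python
import os
import tempfile

import joblib
from sklearn import datasets, svm

# Train a small stand-in model on the first 500 digits
digits = datasets.load_digits()
data = digits.images.reshape((len(digits.images), -1))
classifier = svm.SVC(gamma=0.001).fit(data[:500], digits.target[:500])

# Dump to a scratch file and load it straight back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "roundtrip.joblib")
    joblib.dump(classifier, path)
    restored = joblib.load(path)

# The restored model should predict exactly what the original predicts
sample = data[500:600]
same_predictions = bool((restored.predict(sample) == classifier.predict(sample)).all())
print("Round trip preserved predictions:", same_predictions)
```

Since joblib files are pickles, they should only be loaded from trusted sources, and ideally with the same scikit-learn version that produced them.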


Now that we have trained and saved our model, the next step will be to serve it using mlserver. For that, we will need to create 2 configuration files:

  • settings.json: holds the configuration of our server (e.g. ports, log level, etc.).

  • model-settings.json: holds the configuration of our model (e.g. input type, runtime to use, etc.).


%%writefile settings.json
{
    "debug": "true"
}


%%writefile model-settings.json
{
    "name": "mnist-svm",
    "implementation": "mlserver_sklearn.SKLearnModel",
    "parameters": {
        "uri": "./mnist-svm.joblib",
        "version": "v0.1.0"
    }
}
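
The %%writefile magic only works inside a notebook. Outside one, the same two files could be produced with the standard json module, as in the sketch below. The helper name write_configs and its folder argument are hypothetical; the keys and values are the ones used in this example.

```python
import json
from pathlib import Path


def write_configs(folder):
    """Write settings.json and model-settings.json into `folder`."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)

    # Server-level configuration
    settings = {"debug": "true"}

    # Model-level configuration: runtime to use and where the artefact lives
    model_settings = {
        "name": "mnist-svm",
        "implementation": "mlserver_sklearn.SKLearnModel",
        "parameters": {
            "uri": "./mnist-svm.joblib",
            "version": "v0.1.0",
        },
    }

    (folder / "settings.json").write_text(json.dumps(settings, indent=4))
    (folder / "model-settings.json").write_text(json.dumps(model_settings, indent=4))
    return settings, model_settings
```

Calling write_configs(".") would place both files in the current directory, next to the serialised model.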

Start serving our model

Now that we have our config in place, we can start the server by running mlserver start . (the trailing dot is the folder argument). The command needs to be run either from the directory that contains our config files, or with the path to that folder as its argument.

mlserver start .

Since this command starts the server and blocks the terminal while it waits for requests, it needs to be run in the background or in a separate terminal.
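
When the server is started in the background, a script may try to send requests before it is ready. One way to guard against that is to poll a health endpoint first. The sketch below assumes the server listens on localhost:8080 (the port used by the inference URL in this example) and exposes the V2 inference protocol's readiness route, /v2/health/ready. The helper name wait_until_ready and its defaults are hypothetical.

```python
import time
import urllib.request


def wait_until_ready(base_url="http://localhost:8080", timeout=30.0, interval=0.5):
    """Poll the readiness route until it answers 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            url = f"{base_url}/v2/health/ready"
            with urllib.request.urlopen(url, timeout=interval + 1) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, timeout, or an HTTP error status: not ready yet
            pass
        time.sleep(interval)
    return False
```

A script could then call wait_until_ready() and only send inference requests once it returns True.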

Send test inference request

We now have our model being served by mlserver. To make sure that everything is working as expected, let’s send a request from our test set.

For that, we can use the Python types that mlserver provides out of the box, or we can build our request manually.

import requests

x_0 = X_test[0:1]
inference_request = {
    "inputs": [
        {
            "name": "predict",
            "shape": x_0.shape,
            "datatype": "FP32",
            "data": x_0.tolist()
        }
    ]
}

endpoint = "http://localhost:8080/v2/models/mnist-svm/versions/v0.1.0/infer"
response = requests.post(endpoint, json=inference_request)

# Show the JSON body returned by the server
response.json()


As we can see above, the model predicted the input as the number 8, which matches what’s on the test set.
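
To compare the prediction with the test set programmatically, the predicted values can be pulled out of the response body. The sketch below assumes the body follows the V2 inference protocol, with an outputs list whose entries carry name, shape, datatype and data fields, and that the output is named predict. The helper extract_output is hypothetical, and example_body is an illustrative placeholder rather than a captured server reply.

```python
def extract_output(response_body, name="predict"):
    """Return the `data` list of the output called `name` from a V2 response body."""
    for output in response_body.get("outputs", []):
        if output.get("name") == name:
            return output["data"]
    raise KeyError(f"no output named {name!r} in response")


# Illustrative body shaped like a V2 inference response (placeholder values)
example_body = {
    "model_name": "mnist-svm",
    "model_version": "v0.1.0",
    "outputs": [
        {"name": "predict", "shape": [1], "datatype": "INT64", "data": [8]},
    ],
}

print(extract_output(example_body))
```

With a live server, extract_output(response.json()) could be compared against y_test[0:1].tolist().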