Deployment with Seldon Core¶

MLServer is used as the core Python inference server in Seldon Core. Therefore, it should be straightforward to deploy your models either by using one of the built-in pre-packaged servers or by pointing to a custom image of MLServer.

Note

This section assumes a basic knowledge of Seldon Core and Kubernetes, as well as access to a working Kubernetes cluster with Seldon Core installed. To learn more about Seldon Core or how to install it, please visit the Seldon Core documentation.

Pre-packaged Servers¶

Out of the box, Seldon Core comes with a few MLServer runtimes pre-configured to run straight away. This allows you to deploy an MLServer instance by just pointing to where your model artifact is stored and specifying which ML framework was used to train it.

Usage¶

To let Seldon Core know what framework was used to train your model, you can use the implementation field of your SeldonDeployment manifest. For example, to deploy a Scikit-Learn artifact stored remotely in GCS, one could do:

apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: my-model
spec:
  protocol: v2
  predictors:
    - name: default
      graph:
        name: classifier
        implementation: SKLEARN_SERVER
        modelUri: gs://seldon-models/sklearn/iris

As you can see highlighted above, all we need to specify is that:

  • Our inference deployment should use the V2 inference protocol, which is done by setting the protocol field to v2.

  • Our model artifact is a serialised Scikit-Learn model, therefore it should be served using the MLServer SKLearn runtime, which is done by setting the implementation field to SKLEARN_SERVER.

Note that, while the protocol field should always be set to v2 (i.e. so that models are served using the V2 inference protocol), the value of the implementation field will be dependent on your ML framework. The valid values of the implementation field are pre-determined by Seldon Core. However, it should also be possible to configure and add new ones (e.g. to support a custom MLServer runtime).
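As an illustration of the latter, Seldon Core reads its list of pre-packaged servers from the predictor_servers entry of its seldon-config ConfigMap, where each implementation value maps to a container image. The snippet below is only a rough sketch: the exact schema (including whether the protocol key is named v2 or kfserving) depends on your Seldon Core version, and the MY_CUSTOM_SERVER name and image are made up for the example.

{
  "MY_CUSTOM_SERVER": {
    "protocols": {
      "v2": {
        "image": "my-org/my-mlserver-runtime",
        "defaultImageVersion": "0.1.0"
      }
    }
  }
}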

Once your SeldonDeployment manifest is ready, the next step is to apply it to your cluster. There are multiple ways to do this, but the simplest is probably to apply it directly through kubectl, by running:

kubectl apply -f my-seldondeployment-manifest.yaml
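After applying the manifest, it may be useful to check that the deployment has rolled out correctly. For instance, assuming the my-model name used above and the default namespace:

kubectl get seldondeployment my-model
kubectl get pods

The first command should eventually report the deployment as available, at which point its pods should show up as Running.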

To see which values of the implementation field are backed by MLServer, you can check the support table below.

Supported Pre-packaged Servers¶

As mentioned above, pre-packaged servers come built into Seldon Core. Therefore, only a pre-determined subset of them will be supported by any given release of Seldon Core.

The table below lists the currently supported values of the implementation field, together with the ML framework each corresponds to and the MLServer runtime that will be enabled internally in your model deployment when it is used.

| Framework     | MLServer Runtime | Seldon Core Pre-packaged Server | Documentation  |
| ------------- | ---------------- | ------------------------------- | -------------- |
| Scikit-Learn  | MLServer SKLearn | SKLEARN_SERVER                  | SKLearn Server |
| XGBoost       | MLServer XGBoost | XGBOOST_SERVER                  | XGBoost Server |
| MLflow        | MLServer MLflow  | MLFLOW_SERVER                   | MLflow Server  |
| Tempo         | Tempo            | TEMPO_SERVER                    | Tempo Server   |
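
For instance, following the table above, the same manifest structure can be reused for an XGBoost model by just swapping the implementation value (the gs:// path below is purely illustrative):

apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: my-xgboost-model
spec:
  protocol: v2
  predictors:
    - name: default
      graph:
        name: classifier
        implementation: XGBOOST_SERVER
        modelUri: gs://my-bucket/xgboost/my-model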

Note that, on top of the ones shown above (backed by MLServer), Seldon Core also provides a wider set of pre-packaged servers. To check the full list, please visit the Seldon Core documentation.

Custom Runtimes¶

There could be cases where the pre-packaged MLServer runtimes supported out of the box by Seldon Core are not enough for our use case. The framework provided by MLServer makes it easy to write custom runtimes, which can then be packaged up as images. These images become self-contained model servers running your custom runtime, which Seldon Core can deploy into your serving infrastructure just as easily.
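As a minimal sketch of what such a custom runtime could look like (the class, file and image names below are illustrative, and the prediction logic is just a placeholder), a runtime is a subclass of MLServer's MLModel that implements load() and predict():

# my_runtime.py - illustrative custom MLServer runtime
from mlserver import MLModel
from mlserver.types import InferenceRequest, InferenceResponse, ResponseOutput


class MyCustomRuntime(MLModel):
    async def load(self) -> bool:
        # Load or initialise your model here (e.g. from the model artifact URI)
        self.ready = True
        return self.ready

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # A real runtime would decode payload.inputs and run actual inference;
        # here we just return a dummy scalar in V2 inference protocol format
        return InferenceResponse(
            model_name=self.name,
            outputs=[
                ResponseOutput(
                    name="predictions",
                    shape=[1],
                    datatype="FP32",
                    data=[0.5],
                )
            ],
        )

Such a runtime can then be packaged up as an image, for example with MLServer's own build command (assuming the folder also contains the usual model-settings.json / settings.json configuration):

mlserver build . -t my-custom-server:0.1.0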

Usage¶

The componentSpecs field of the SeldonDeployment manifest lets Seldon Core know which image should be used to serve a custom model. For example, if we assume that our custom image has been tagged as my-custom-server:0.1.0, we could write our SeldonDeployment manifest as follows:

apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: my-model
spec:
  protocol: v2
  predictors:
    - name: default
      graph:
        name: classifier
      componentSpecs:
        - spec:
            containers:
              - name: classifier
                image: my-custom-server:0.1.0

As we can see highlighted in the snippet above, all that's needed to deploy a custom MLServer image is:

  • Letting Seldon Core know that the model deployment will be served through the V2 inference protocol, by setting the protocol field to v2.

  • Pointing our model container to our custom MLServer image, by specifying it in the image field of the componentSpecs section of the manifest.

Once your SeldonDeployment manifest is ready, the next step is to apply it to your cluster. There are multiple ways to do this, but the simplest is probably to apply it directly through kubectl, by running:

kubectl apply -f my-seldondeployment-manifest.yaml
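
Once the SeldonDeployment reports as available, the model can be reached through the V2 inference protocol endpoints exposed by Seldon Core. As a rough sketch, assuming the my-model deployment above is running in the seldon namespace behind an ingress reachable at $INGRESS_HOST (both placeholders), and that the model accepts a 1x4 float tensor:

curl -s http://$INGRESS_HOST/seldon/seldon/my-model/v2/models/classifier/infer \
  -H "Content-Type: application/json" \
  -d '{
        "inputs": [
          {
            "name": "input-0",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [0.1, 0.2, 0.3, 0.4]
          }
        ]
      }'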