Custom Inference Runtimes
There may be cases where the inference runtimes offered out-of-the-box by MLServer may not be enough, or where you may need extra custom functionality which is not included in MLServer (e.g. custom codecs). To cover these cases, MLServer lets you create custom runtimes very easily.
This page covers some of the bigger points that need to be taken into account when extending MLServer. You can also see this end-to-end example which walks through the process of writing a custom runtime.
Writing a custom inference runtime
MLServer is designed as an easy-to-extend framework, encouraging users to write their own custom runtimes. The starting point for this is the mlserver.MLModel abstract class, whose main methods are:
load(self) -> bool: Responsible for loading any artifacts related to a model (e.g. model weights, pickle files, etc.).
predict(self, payload: InferenceRequest) -> InferenceResponse: Responsible for using a model to perform inference on an incoming data point.
Therefore, the “one-line version” of how to write a custom runtime is: write a custom class extending from mlserver.MLModel, and then override those methods with your custom logic.
from mlserver import MLModel
from mlserver.types import InferenceRequest, InferenceResponse


class MyCustomRuntime(MLModel):
    async def load(self) -> bool:
        # TODO: Replace with custom logic to load a model artifact
        self._model = load_my_custom_model()
        self.ready = True
        return self.ready

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # TODO: Replace with custom logic to run inference
        return self._model.predict(payload)
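With the runtime written, the remaining piece is to point MLServer at it. A common way to do this is through a model-settings.json file whose implementation field references the class. Below is a minimal sketch, assuming the class above lives in a file named my_runtime.py (a hypothetical name) next to the settings file:

{
  "name": "my-custom-model",
  "implementation": "my_runtime.MyCustomRuntime"
}

Running mlserver start . in that folder would then serve the model through the custom runtime.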
Building a custom MLServer image
MLServer offers built-in utilities to help you build a custom MLServer image. This image can contain any custom code (including custom inference runtimes), as well as any custom environment, provided either through a Conda environment file or a requirements.txt file.
To leverage these, we can use the mlserver build command. Note that this command expects a Docker runtime to be available and running in the background.
Assuming that we're currently in the folder containing our custom inference runtime, we should be able to just run:
mlserver build . -t my-custom-server
The output will be a Docker image named my-custom-server, ready to be used.
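As a quick sanity check, you can then run the image locally. A minimal sketch, assuming MLServer's default HTTP port of 8080:

docker run -it --rm -p 8080:8080 my-custom-server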
The mlserver build subcommand will search for any Conda environment file (e.g. environment.yml or conda.yml) or any requirements.txt present in your root folder. These can be used to tell MLServer what Python environment is required in the final Docker image.
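For instance, a minimal requirements.txt could look like the sketch below (the extra scikit-learn dependency is purely illustrative):

mlserver
scikit-learn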
The mlserver build subcommand will also treat any model-settings.json files present in your root folder as the default settings that must be set in your final image. Therefore, these files can be used to configure things like the default inference runtime to be used, or even to include embedded models that will always be present within your custom image.
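Putting it all together, a typical build context could look something like the following sketch (my_runtime.py is a hypothetical file name; the other two are the conventions mentioned above):

my-custom-server/
├── my_runtime.py        # custom MLModel implementation
├── model-settings.json  # default model settings baked into the image
└── requirements.txt     # Python environment for the image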
In some cases, it may not be possible to access an environment with a running Docker daemon. This can happen, for example, in some CI pipelines.
To account for these use cases, MLServer also includes a mlserver dockerfile subcommand which will just generate a Dockerfile (and optionally a .dockerignore file). This Dockerfile can then be used by other “Docker-less” tools, like Buildah, to build the final image.
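As a sketch of that Docker-less workflow (the image tag is just an example, and the current folder is assumed to be the build context), you would first generate the Dockerfile and then hand it over to Buildah:

mlserver dockerfile .
buildah bud -t my-custom-server .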