Content Types (and Codecs)

Machine learning models generally expect their inputs to be passed down as a particular Python type. Most commonly, this type ranges from “general purpose” NumPy arrays or Pandas DataFrames to more granular definitions, like datetime objects, Pillow images, etc. Unfortunately, the definition of the V2 Inference Protocol doesn’t cover any of these specific use cases. This protocol can be thought of as a wider, “lower level” spec, which only defines what fields a payload should have.

To account for this gap, MLServer introduces support for content types, which offer a way to let MLServer know how it should “decode” V2-compatible payloads. When shaped in the right way, these payloads should “encode” all the information needed to extract the higher level Python type that a model requires.

To illustrate the above, we can think of a Scikit-Learn pipeline which takes in a Pandas DataFrame and returns a NumPy array. Without the use of content types, the V2 payload itself would probably lack information about how it should be treated by MLServer. Likewise, the Scikit-Learn pipeline wouldn’t know how to treat a raw V2 payload. In this scenario, content types let us specify what “higher level” information is actually encoded within the V2 protocol payloads.

Content Types

Usage

Note

Some inference runtimes may apply a content type by default if none is present. To learn more about each runtime’s defaults, please check the relevant inference runtime’s docs.

To let MLServer know that a particular payload must be decoded / encoded as a different Python data type (e.g. NumPy Array, Pandas DataFrame, etc.), you can specify it through the content_type field of the parameters section of your request.

As an example, we can consider the following dataframe, containing two columns: Age and First Name.

First Name | Age
-----------|----
Joanne     | 34
Michael    | 22

This table could be specified in the V2 protocol as the following payload, where we declare that:

  • The whole set of inputs should be decoded as a Pandas DataFrame (i.e. setting the content type as pd).

  • The First Name column should be decoded as a UTF-8 string (i.e. setting the content type as str).

{
  "parameters": {
    "content_type": "pd"
  },
  "inputs": [
    {
      "name": "First Name",
      "datatype": "BYTES",
      "parameters": {
        "content_type": "str"
      },
      "shape": [2],
      "data": ["Joanne", "Michael"]
    },
    {
      "name": "Age",
      "datatype": "INT32",
      "shape": [2],
      "data": [34, 22]
    }
  ]
}

To learn more about the content types available and how to use them, see the Available Content Types section below.

Note

It’s important to keep in mind that content types can be specified at both the request level and the input level. The former will apply to the entire set of inputs, whereas the latter will only apply to a particular input of the payload.
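To make the two levels concrete, here is a minimal, stdlib-only sketch that walks a V2 payload (held as a plain dict) and reports where content types are declared. The helper `collect_content_types` is hypothetical and is not part of MLServer. It only reads the `parameters.content_type` fields shown in the payload above and does not decode anything.

```python
from typing import Any, Dict, Optional, Tuple


def collect_content_types(
    payload: Dict[str, Any],
) -> Tuple[Optional[str], Dict[str, Optional[str]]]:
    """Hypothetical helper: report where content types are declared in a V2 payload.

    Returns the request-level content type and a mapping from each input
    name to its input-level content type (None when not declared).
    """
    # Request level: applies to the entire set of inputs (e.g. "pd").
    request_level = (payload.get("parameters") or {}).get("content_type")

    # Input level: applies only to the input that declares it (e.g. "str").
    input_level = {
        request_input["name"]: (request_input.get("parameters") or {}).get("content_type")
        for request_input in payload.get("inputs", [])
    }
    return request_level, input_level


payload = {
    "parameters": {"content_type": "pd"},
    "inputs": [
        {
            "name": "First Name",
            "datatype": "BYTES",
            "parameters": {"content_type": "str"},
            "shape": [2],
            "data": ["Joanne", "Michael"],
        },
        {"name": "Age", "datatype": "INT32", "shape": [2], "data": [34, 22]},
    ],
}

request_level, input_level = collect_content_types(payload)
print(request_level)  # pd
print(input_level)    # {'First Name': 'str', 'Age': None}
```

In this payload, "pd" governs how the whole request is assembled. "str" only affects the First Name column.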

Codecs

Under the hood, the conversion between content types is implemented using codecs. In the MLServer architecture, codecs are an abstraction which know how to encode and decode high-level Python types to and from the V2 Inference Protocol.

Depending on the high-level Python type, encoding / decoding operations may require access to multiple input or output heads. For example, a Pandas DataFrame would need to aggregate all of the input / output heads present in a V2 Inference Protocol request or response.

[Figure: Request Codecs]

However, a NumPy array or a list of strings could be encoded directly as an input head within a larger request.

[Figure: Input Codecs]

To account for this, codecs can work at either the request- / response-level (known as request codecs), or the input- / output-level (known as input codecs). Each of these codecs exposes the following public interface, where Any represents a high-level Python datatype (e.g. a Pandas DataFrame, a NumPy array, etc.):

  • Request Codecs

    • encode_request(payload: Any) -> InferenceRequest

    • decode_request(request: InferenceRequest) -> Any

    • encode_response(model_name: str, payload: Any, model_version: str) -> InferenceResponse

    • decode_response(response: InferenceResponse) -> Any

  • Input Codecs

    • encode_input(name: str, payload: Any) -> RequestInput

    • decode_input(request_input: RequestInput) -> Any

    • encode_output(name: str, payload: Any) -> ResponseOutput

    • decode_output(response_output: ResponseOutput) -> Any
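As a rough illustration of the input-codec half of this interface, here is a toy codec written against plain dicts rather than MLServer's `RequestInput` / `ResponseOutput` models. `ComplexCodec` and its "complex" content type are hypothetical. A real MLServer codec would build on MLServer's own codec base classes and types instead. This sketch only shows the encode/decode symmetry the interface implies.

```python
from typing import Any, Dict, List

# Stand-in for MLServer's RequestInput / ResponseOutput models (assumption:
# a plain dict carrying the V2 "name", "datatype", "shape" and "data" fields).
V2Tensor = Dict[str, Any]


class ComplexCodec:
    """Hypothetical input codec: list of complex numbers <-> FP64 tensor of shape [N, 2]."""

    ContentType = "complex"  # hypothetical content type name

    @classmethod
    def encode_input(cls, name: str, payload: List[complex]) -> V2Tensor:
        # Flatten each number into (real, imag), row by row, as V2 expects flat data.
        data = [part for z in payload for part in (z.real, z.imag)]
        return {
            "name": name,
            "datatype": "FP64",
            "parameters": {"content_type": cls.ContentType},
            "shape": [len(payload), 2],
            "data": data,
        }

    @classmethod
    def decode_input(cls, request_input: V2Tensor) -> List[complex]:
        # Re-pair consecutive (real, imag) values into complex numbers.
        data = request_input["data"]
        return [complex(data[i], data[i + 1]) for i in range(0, len(data), 2)]

    # In this toy example, outputs share the same layout as inputs.
    encode_output = encode_input
    decode_output = decode_input


tensor = ComplexCodec.encode_input("z", [1 + 2j, 3 - 4j])
print(tensor["shape"], tensor["data"])    # [2, 2] [1.0, 2.0, 3.0, -4.0]
print(ComplexCodec.decode_input(tensor))  # [(1+2j), (3-4j)]
```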

Note that these methods can also be used as helpers to encode requests and decode responses on the client side. This helps abstract away most of the details about the underlying structure of V2-compatible payloads from the user.

For example, going back to the example above, we could use codecs to encode the DataFrame into a V2-compatible request simply as:

import pandas as pd

from mlserver.codecs import PandasCodec

dataframe = pd.DataFrame({'First Name': ["Joanne", "Michael"], 'Age': [34, 22]})

v2_request = PandasCodec.encode_request(dataframe)
print(v2_request)

For a full end-to-end example on how content types and codecs work under the hood, feel free to check out this Content Type Decoding example.

Model Metadata

Content types can also be defined as part of the model’s metadata. This lets the user pre-configure which content types a model should use by default to decode / encode its requests / responses, without the need to specify them on each request.

For example, to configure the content type values of the example above, one could create a model-settings.json file like the one below:

model-settings.json
{
  "parameters": {
    "content_type": "pd"
  },
  "inputs": [
    {
      "name": "First Name",
      "datatype": "BYTES",
      "parameters": {
        "content_type": "str"
      },
      "shape": [-1]
    },
    {
      "name": "Age",
      "datatype": "INT32",
      "shape": [-1]
    }
  ]
}

It’s important to keep in mind that content types passed explicitly as part of the request will always take precedence over the model’s metadata. Therefore, we can leverage this to override the model’s metadata when needed.
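This precedence rule can be summarised in a few lines. The sketch below is illustrative only. `effective_content_type` is a hypothetical helper, not MLServer's resolution logic, and it only considers the top-level `parameters.content_type` of the request and of the model metadata.

```python
from typing import Any, Dict, Optional


def effective_content_type(
    request_parameters: Optional[Dict[str, Any]],
    metadata_parameters: Optional[Dict[str, Any]],
) -> Optional[str]:
    """Hypothetical helper: an explicit request value wins over the model metadata."""
    request_value = (request_parameters or {}).get("content_type")
    if request_value is not None:
        return request_value
    # Fall back to the default configured in model-settings.json (if any).
    return (metadata_parameters or {}).get("content_type")


# model-settings.json declares "pd"; a request can still override it.
print(effective_content_type(None, {"content_type": "pd"}))                    # pd
print(effective_content_type({"content_type": "np"}, {"content_type": "pd"}))  # np
```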

Available Content Types

Out of the box, MLServer supports the following list of content types. However, this can be extended through the use of 3rd-party or custom runtimes.

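Python Type      | Content Type | Request Level | Request Codec                             | Input Level | Input Codec
-----------------|--------------|---------------|-------------------------------------------|-------------|------------------------------
NumPy Array      | np           | Yes           | mlserver.codecs.NumpyRequestCodec         | Yes         | mlserver.codecs.NumpyCodec
Pandas DataFrame | pd           | Yes           | mlserver.codecs.PandasCodec               | No          | -
UTF-8 String     | str          | Yes           | mlserver.codecs.string.StringRequestCodec | Yes         | mlserver.codecs.StringCodec
Base64           | base64       | No            | -                                         | Yes         | mlserver.codecs.Base64Codec
Datetime         | datetime     | No            | -                                         | Yes         | mlserver.codecs.DatetimeCodec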

Note

MLServer allows you to extend the supported content types by adding custom ones. To learn more about how to write your own custom content types, you can check this full end-to-end example. You can also learn more about building custom extensions for MLServer in the Custom Inference Runtime section of the docs.

NumPy Array

Note

The V2 Inference Protocol expects that the data of each input is sent as a flat array. Therefore, the np content type will expect that tensors are sent flattened. The information in the shape field will then be used to reshape the vector into the right dimensions.

The np content type will decode / encode V2 payloads to a NumPy Array, taking into account the following:

  • The datatype field will be matched to the closest NumPy dtype.

  • The shape field will be used to reshape the flattened array expected by the V2 protocol into the expected tensor shape.
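The two steps above can be sketched without NumPy. This is an illustration, not MLServer's implementation. `V2_TO_NUMPY_DTYPE` is a small, hand-picked mapping (MLServer keeps its own table), and `unflatten` is a hypothetical helper that rebuilds the row-major nesting that `shape` describes.

```python
from typing import Any, Dict, List, Sequence

# Illustrative subset of V2 datatypes and the NumPy dtype names they map to
# (assumption: MLServer keeps its own, more complete table internally).
V2_TO_NUMPY_DTYPE: Dict[str, str] = {
    "BOOL": "bool",
    "INT32": "int32",
    "INT64": "int64",
    "FP32": "float32",
    "FP64": "float64",
}


def unflatten(data: Sequence[Any], shape: Sequence[int]) -> List[Any]:
    """Rebuild a nested (row-major) list from a flat V2 "data" array."""
    total = 1
    for dim in shape:
        total *= dim
    if total != len(data):
        raise ValueError(f"shape {list(shape)} needs {total} elements, got {len(data)}")
    if len(shape) <= 1:
        return list(data)
    if shape[0] == 0:
        return []
    # Each slice along the first axis holds prod(shape[1:]) elements.
    step = total // shape[0]
    return [unflatten(data[i * step:(i + 1) * step], shape[1:]) for i in range(shape[0])]


request_input = {"name": "foo", "datatype": "INT32", "shape": [2, 2], "data": [1, 2, 3, 4]}
print(V2_TO_NUMPY_DTYPE[request_input["datatype"]])             # int32
print(unflatten(request_input["data"], request_input["shape"]))  # [[1, 2], [3, 4]]
```

With NumPy installed, both steps collapse into roughly `np.asarray(data, dtype=...).reshape(shape)`.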

Note

By default, MLServer will always assume that an array with a single-dimensional shape, e.g. [N], is equivalent to [N, 1]. That is, each entry will be treated as a single one-dimensional data point (i.e. instead of a [1, D] array, where the full array is a single D-dimensional data point). To avoid any ambiguity, where possible, the NumPy codec will always explicitly encode [N] arrays as [N, 1].
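The convention in the note above amounts to a one-line shape rule. The sketch below uses a hypothetical helper, `normalise_shape`, which mimics the described behaviour and is not MLServer code.

```python
from typing import List


def normalise_shape(shape: List[int]) -> List[int]:
    """Hypothetical helper: read a 1-D shape [N] as N one-dimensional data points."""
    if len(shape) == 1:
        return [shape[0], 1]
    # Shapes with two or more dimensions are left untouched.
    return list(shape)


print(normalise_shape([3]))     # [3, 1]: three data points with one feature each
print(normalise_shape([1, 3]))  # [1, 3]: one data point with three features
```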

For example, if we think of the following NumPy Array:

import numpy as np

foo = np.array([[1, 2], [3, 4]])

We could encode it as the input foo in a V2 protocol request as:

{
  "inputs": [
    {
      "name": "foo",
      "parameters": {
        "content_type": "np"
      },
      "data": [1, 2, 3, 4],
      "datatype": "INT32",
      "shape": [2, 2]
    }
  ]
}

Alternatively, the whole request can be encoded with the NumpyRequestCodec:

from mlserver.codecs import NumpyRequestCodec

# Encode an entire V2 request
v2_request = NumpyRequestCodec.encode_request(foo)

Or the NumpyCodec can encode foo as a single input head:

from mlserver.types import InferenceRequest
from mlserver.codecs import NumpyCodec

# We can use the `NumpyCodec` to encode a single input head with name `foo`
# within a larger request
v2_request = InferenceRequest(
  inputs=[
    NumpyCodec.encode_input("foo", foo)
  ]
)

When using the NumPy Array content type at the request-level, it will decode the entire request by considering only the first input element. This can be used as a helper for models which only expect a single tensor.

Pandas DataFrame

Note

The pd content type can be stacked with other content types. This allows the user to use a different set of content types to decode each of the columns.

The pd content type will decode / encode a V2 request into a Pandas DataFrame. For this, it will expect that the DataFrame is shaped in a columnar way. That is,

  • Each entry of the inputs list (or outputs, in the case of responses) will represent a column of the DataFrame.

  • Each of these entries will contain all the row elements for that particular column.

  • The shape field of each input (or output) entry will contain (at least) the number of rows included in the dataframe.
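The columnar layout described above can be sketched with plain dicts. `columns_to_inputs` and `inputs_to_rows` are hypothetical helpers, not MLServer's `PandasCodec`. For simplicity, the sketch also assumes that every column shares the same V2 datatype.

```python
from typing import Any, Dict, List


def columns_to_inputs(columns: Dict[str, List[Any]], datatype: str = "BYTES") -> List[Dict[str, Any]]:
    """Encode each column as one V2 input; shape[0] holds the number of rows."""
    return [
        {"name": name, "datatype": datatype, "shape": [len(values)], "data": list(values)}
        for name, values in columns.items()
    ]


def inputs_to_rows(inputs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Zip per-column V2 inputs back into row records."""
    row_counts = {request_input["shape"][0] for request_input in inputs}
    if len(row_counts) > 1:
        raise ValueError(f"columns disagree on the number of rows: {sorted(row_counts)}")
    n_rows = row_counts.pop() if row_counts else 0
    return [
        {request_input["name"]: request_input["data"][row] for request_input in inputs}
        for row in range(n_rows)
    ]


inputs = columns_to_inputs({
    "A": ["a1", "a2", "a3", "a4"],
    "B": ["b1", "b2", "b3", "b4"],
    "C": ["c1", "c2", "c3", "c4"],
})
print([request_input["shape"] for request_input in inputs])  # [[4], [4], [4]]
print(inputs_to_rows(inputs)[0])  # {'A': 'a1', 'B': 'b1', 'C': 'c1'}
```

With pandas available, the decoding direction is roughly `pd.DataFrame({i["name"]: i["data"] for i in inputs})`.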

For example, if we consider the following dataframe:

A  | B  | C
---|----|---
a1 | b1 | c1
a2 | b2 | c2
a3 | b3 | c3
a4 | b4 | c4

We could encode it to the V2 Inference Protocol as:

{
  "parameters": {
    "content_type": "pd"
  },
  "inputs": [
    {
      "name": "A",
      "data": ["a1", "a2", "a3", "a4"],
      "datatype": "BYTES",
      "shape": [4]
    },
    {
      "name": "B",
      "data": ["b1", "b2", "b3", "b4"],
      "datatype": "BYTES",
      "shape": [4]
    },
    {
      "name": "C",
      "data": ["c1", "c2", "c3", "c4"],
      "datatype": "BYTES",
      "shape": [4]
    }
  ]
}

Alternatively, the same request can be built from the DataFrame using the PandasCodec:

import pandas as pd

from mlserver.codecs import PandasCodec

foo = pd.DataFrame({
  "A": ["a1", "a2", "a3", "a4"],
  "B": ["b1", "b2", "b3", "b4"],
  "C": ["c1", "c2", "c3", "c4"]
})

v2_request = PandasCodec.encode_request(foo)

UTF-8 String

The str content type lets you encode / decode a V2 input into a UTF-8 Python string, taking into account the following:

  • The expected datatype is BYTES.

  • The shape field represents the number of “strings” that are encoded in the payload (e.g. the ["hello world", "one more time"] payload will have a shape of 2 elements).
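A minimal sketch of these rules, using plain dicts, is shown below. `encode_strings` and `decode_strings` are hypothetical helpers. Their `use_bytes` flag loosely mirrors the argument that MLServer's string codecs accept, but the sketch does not reproduce MLServer's exact behaviour.

```python
from typing import Any, Dict, List, Union


def encode_strings(name: str, strings: List[str], use_bytes: bool = False) -> Dict[str, Any]:
    """Pack Python strings into a BYTES V2 input; shape is the number of strings."""
    data: List[Union[str, bytes]] = (
        [s.encode("utf-8") for s in strings] if use_bytes else list(strings)
    )
    return {
        "name": name,
        "datatype": "BYTES",
        "parameters": {"content_type": "str"},
        "shape": [len(strings)],
        "data": data,
    }


def decode_strings(request_input: Dict[str, Any]) -> List[str]:
    """Accept both raw UTF-8 bytes and already-decoded str elements."""
    return [
        item.decode("utf-8") if isinstance(item, bytes) else item
        for item in request_input["data"]
    ]


encoded = encode_strings("foo", ["hello world", "one more time"])
print(encoded["shape"])  # [2]
print(decode_strings(encode_strings("foo", ["hello world"], use_bytes=True)))  # ['hello world']
```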

For example, if we consider the following list of strings:

foo = ["bar", "bar2"]

We could encode it to the V2 Inference Protocol as:

{
  "parameters": {
    "content_type": "str"
  },
  "inputs": [
    {
      "name": "foo",
      "data": ["bar", "bar2"],
      "datatype": "BYTES",
      "shape": [2]
    }
  ]
}

Alternatively, the whole request can be encoded with the StringRequestCodec:

from mlserver.codecs.string import StringRequestCodec

# Encode an entire V2 request
v2_request = StringRequestCodec.encode_request(foo, use_bytes=False)

Or the StringCodec can encode foo as a single input head:

from mlserver.types import InferenceRequest
from mlserver.codecs import StringCodec

# We can use the `StringCodec` to encode a single input head with name `foo`
# within a larger request
v2_request = InferenceRequest(
  inputs=[
    StringCodec.encode_input("foo", foo, use_bytes=False)
  ]
)

When using the str content type at the request-level, it will decode the entire request by considering only the first input element. This can be used as a helper for models which only expect a single string or a set of strings.

Base64

The base64 content type will decode the Base64-encoded strings of a V2 payload into raw binary strings (and vice versa), taking into account the following:

  • The expected datatype is BYTES.

  • The data field should contain the base64-encoded binary strings.

  • The shape field represents the number of binary strings that are encoded in the payload.
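The encoding step itself is plain Base64, so it can be checked with Python's standard `base64` module. The helpers `encode_base64_input` and `decode_base64_input` are hypothetical stand-ins for MLServer's `Base64Codec`. They only illustrate the payload layout described above.

```python
import base64
from typing import Any, Dict, List


def encode_base64_input(name: str, payload: List[bytes]) -> Dict[str, Any]:
    """Base64-encode each binary string into a BYTES V2 input."""
    return {
        "name": name,
        "datatype": "BYTES",
        "parameters": {"content_type": "base64"},
        "shape": [len(payload)],
        "data": [base64.b64encode(item).decode("ascii") for item in payload],
    }


def decode_base64_input(request_input: Dict[str, Any]) -> List[bytes]:
    """Turn the Base64 strings of a V2 input back into raw bytes."""
    return [base64.b64decode(item) for item in request_input["data"]]


encoded = encode_base64_input("foo", [b"Python is fun"])
print(encoded["data"])  # ['UHl0aG9uIGlzIGZ1bg==']
```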

For example, if we think of the following “bytes array”:

foo = b"Python is fun"

We could encode it as the input foo of a V2 request as:

{
  "inputs": [
    {
      "name": "foo",
      "parameters": {
        "content_type": "base64"
      },
      "data": ["UHl0aG9uIGlzIGZ1bg=="],
      "datatype": "BYTES",
      "shape": [1]
    }
  ]
}

Using the Base64Codec, the same input head can be encoded within a larger request:

from mlserver.types import InferenceRequest
from mlserver.codecs import Base64Codec

# We can use the `Base64Codec` to encode a single input head with name `foo`
# within a larger request (the codec expects a list of binary strings)
v2_request = InferenceRequest(
  inputs=[
    Base64Codec.encode_input("foo", [foo], use_bytes=False)
  ]
)

Datetime

The datetime content type will decode a V2 input into a Python datetime.datetime object, taking into account the following:

  • The expected datatype is BYTES.

  • The data field should contain the dates serialised following the ISO 8601 standard.

  • The shape field represents the number of datetimes that are encoded in the payload.
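The ISO 8601 round trip can be reproduced with the standard `datetime` module. The helpers below are hypothetical stand-ins for MLServer's `DatetimeCodec` and only illustrate the payload layout.

```python
import datetime
from typing import Any, Dict, List


def encode_datetime_input(name: str, payload: List[datetime.datetime]) -> Dict[str, Any]:
    """Serialise each datetime as an ISO 8601 string inside a BYTES V2 input."""
    return {
        "name": name,
        "datatype": "BYTES",
        "parameters": {"content_type": "datetime"},
        "shape": [len(payload)],
        "data": [dt.isoformat() for dt in payload],
    }


def decode_datetime_input(request_input: Dict[str, Any]) -> List[datetime.datetime]:
    """Parse ISO 8601 strings back into datetime objects."""
    return [datetime.datetime.fromisoformat(item) for item in request_input["data"]]


foo = datetime.datetime(2022, 1, 11, 11, 0, 0)
encoded = encode_datetime_input("foo", [foo])
print(encoded["data"])  # ['2022-01-11T11:00:00']
```

Before Python 3.11, `fromisoformat` only parses strings in the format that `isoformat` emits. Other ISO 8601 variants would need a more lenient parser.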

For example, if we think of the following datetime object:

import datetime

foo = datetime.datetime(2022, 1, 11, 11, 0, 0)

We could encode it as the input foo of a V2 request as:

{
  "inputs": [
    {
      "name": "foo",
      "parameters": {
        "content_type": "datetime"
      },
      "data": ["2022-01-11T11:00:00"],
      "datatype": "BYTES",
      "shape": [1]
    }
  ]
}

Using the DatetimeCodec, the same input head can be encoded within a larger request:

from mlserver.types import InferenceRequest
from mlserver.codecs import DatetimeCodec

# We can use the `DatetimeCodec` to encode a single input head with name `foo`
# within a larger request (the codec expects a list of datetimes)
v2_request = InferenceRequest(
  inputs=[
    DatetimeCodec.encode_input("foo", [foo], use_bytes=False)
  ]
)