Content Types (and Codecs)¶
Machine learning models generally expect their inputs to be passed down as a
particular Python type.
Most commonly, this type ranges from “general purpose” NumPy arrays or Pandas
DataFrames to more granular definitions, like datetime
objects, Pillow
images, etc.
Unfortunately, the definition of the V2 Inference
Protocol doesn’t
cover any of these specific use cases.
This protocol can be thought of as a wider, “lower level” spec, which only
defines what fields a payload should have.
To account for this gap, MLServer introduces support for content types, which offer a way to let MLServer know how it should “decode” V2-compatible payloads. When shaped in the right way, these payloads should “encode” all the information required to extract the higher level Python type that will be required for a model.
To illustrate the above, we can think of a Scikit-Learn pipeline which takes in a Pandas DataFrame and returns a NumPy array. Without the use of content types, the V2 payload itself would probably lack information about how MLServer should treat it. Likewise, the Scikit-Learn pipeline wouldn’t know how to treat a raw V2 payload. In this scenario, the use of content types allows us to specify what “higher level” information is actually encoded within the V2 protocol payloads.
Usage¶
Note
Some inference runtimes may apply a content type by default if none is present. To learn more about each runtime’s defaults, please check the relevant inference runtime’s docs.
To let MLServer know that a particular payload must be decoded / encoded as a
different Python data type (e.g. NumPy array, Pandas DataFrame, etc.), you can
specify it through the content_type field of the parameters section of
your request.
As an example, we can consider the following dataframe, containing two columns: First Name and Age.
First Name | Age
---|---
Joanne | 34
Michael | 22
This table could be specified in the V2 protocol as the following payload, where we declare that:

- The whole set of inputs should be decoded as a Pandas DataFrame (i.e. setting the content type as pd).
- The First Name column should be decoded as a UTF-8 string (i.e. setting the content type as str).
{
  "parameters": {
    "content_type": "pd"
  },
  "inputs": [
    {
      "name": "First Name",
      "datatype": "BYTES",
      "parameters": {
        "content_type": "str"
      },
      "shape": [2],
      "data": ["Joanne", "Michael"]
    },
    {
      "name": "Age",
      "datatype": "INT32",
      "shape": [2],
      "data": [34, 22]
    }
  ]
}
To learn more about the available content types and how to use them, see the Available Content Types section below.
Note
It’s important to keep in mind that content types can be specified at both the request level and the input level. The former will apply to the entire set of inputs, whereas the latter will only apply to a particular input of the payload.
Codecs¶
Under the hood, the conversion between content types is implemented using codecs. In the MLServer architecture, codecs are an abstraction which knows how to encode and decode high-level Python types to / from the V2 Inference Protocol.
Depending on the high-level Python type, encoding / decoding operations may require access to multiple input or output heads. For example, a Pandas DataFrame would need to aggregate all of the input (or output) heads present in a V2 Inference Protocol request (or response).
However, a NumPy array or a list of strings can be encoded directly as a single input head within a larger request.
To account for this, codecs can work at either the request- / response-level
(known as request codecs), or the input- / output-level (known as input
codecs).
Each of these codecs exposes the following public interface, where Any
represents a high-level Python data type (e.g. a Pandas DataFrame, a NumPy
array, etc.).
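The sketch below summarises both flavours of that interface (signatures simplified here; the canonical definitions live in the mlserver.codecs module):

from typing import Any

from mlserver.types import (
    InferenceRequest,
    InferenceResponse,
    RequestInput,
    ResponseOutput,
)


class RequestCodec:
    # Request codecs convert between high-level Python types and
    # entire V2 requests / responses
    @classmethod
    def encode_request(cls, payload: Any, **kwargs) -> InferenceRequest: ...

    @classmethod
    def decode_request(cls, request: InferenceRequest) -> Any: ...

    @classmethod
    def encode_response(cls, model_name: str, payload: Any, **kwargs) -> InferenceResponse: ...

    @classmethod
    def decode_response(cls, response: InferenceResponse) -> Any: ...


class InputCodec:
    # Input codecs convert between high-level Python types and
    # individual input / output heads
    @classmethod
    def encode_input(cls, name: str, payload: Any, **kwargs) -> RequestInput: ...

    @classmethod
    def decode_input(cls, request_input: RequestInput) -> Any: ...

    @classmethod
    def encode_output(cls, name: str, payload: Any, **kwargs) -> ResponseOutput: ...

    @classmethod
    def decode_output(cls, response_output: ResponseOutput) -> Any: ...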
Note that these methods can also be used as helpers to encode requests and decode responses on the client side. This can help abstract away most of the details about the underlying structure of V2-compatible payloads from the user.
For instance, in the example above, we could use codecs to encode the DataFrame into a V2-compatible request simply as:
import pandas as pd
from mlserver.codecs import PandasCodec
dataframe = pd.DataFrame({'First Name': ["Joanne", "Michael"], 'Age': [34, 22]})
inference_request = PandasCodec.encode_request(dataframe)
print(inference_request)
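To check the round trip, the same codec can decode that request back into the original DataFrame:

from mlserver.codecs import PandasCodec

# Decode the V2 request back into a Pandas DataFrame (round-trip check)
decoded_df = PandasCodec.decode_request(inference_request)
print(decoded_df)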
For a full end-to-end example on how content types and codecs work under the hood, feel free to check out this Content Type Decoding example.
Converting to / from JSON¶
When using MLServer’s request codecs, the output of encoding payloads will
always be one of the classes within the mlserver.types package
(i.e. InferenceRequest or InferenceResponse).
Therefore, if you want to use them with requests (or any other package outside
of MLServer) you will need to convert them to a Python dict or a JSON string.
Luckily, these classes leverage Pydantic under the hood, so you can just call
the .model_dump() or .model_dump_json() methods to convert them.
Likewise, to read them back from JSON, we can always pass the JSON fields as
kwargs to the class’ constructor (or use any of the other
methods
available within Pydantic).
For example, if we want to send an inference request to model foo, we could
do something along the following lines:
import pandas as pd
import requests

from mlserver.codecs import PandasCodec
from mlserver.types import InferenceResponse

dataframe = pd.DataFrame({'First Name': ["Joanne", "Michael"], 'Age': [34, 22]})

inference_request = PandasCodec.encode_request(dataframe)

# raw_request will be a Python dictionary compatible with `requests`'s `json` kwarg
raw_request = inference_request.model_dump()
response = requests.post("http://localhost:8080/v2/models/foo/infer", json=raw_request)

# raw_response will be a dictionary (loaded from the response's JSON),
# therefore we can pass it as the InferenceResponse constructor's kwargs
raw_response = response.json()
inference_response = InferenceResponse(**raw_response)
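Assuming the model’s response is also encoded with the pd content type, the same request codec should be able to decode it back into a DataFrame:

# Decode the V2 response back into a Pandas DataFrame
output_df = PandasCodec.decode_response(inference_response)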
Support for NaN values¶
The NaN (Not a Number) value is used in Numpy and other scientific libraries to describe an invalid or missing value (e.g. a division by zero). In some scenarios, it may be desirable to let your models receive and / or output NaN values (e.g. these can be useful sometimes with GBTs, like XGBoost models). This is why MLServer supports encoding NaN values on your request / response payloads under some conditions.
In order to send / receive NaN values, you must ensure that:
- You are using the REST interface.
- The input / output entry containing NaN values uses either the FP16, FP32 or FP64 datatypes.
- You are either using the Pandas codec or the Numpy codec.
Assuming those conditions are satisfied, any null
value within your tensor
payload will be converted to NaN.
For example, if you take the following Numpy array:
import numpy as np

foo = np.array([[1.2, 2.3], [np.nan, 4.5]])
We could encode it as:
{
  "inputs": [
    {
      "name": "foo",
      "parameters": {
        "content_type": "np"
      },
      "data": [1.2, 2.3, null, 4.5],
      "datatype": "FP64",
      "shape": [2, 2]
    }
  ]
}
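As a sketch of the same payload travelling in the other direction (assuming the JSON above has been parsed into an InferenceRequest), decoding it with the NumPy request codec should recover the NaN entry:

import numpy as np

from mlserver.codecs import NumpyRequestCodec
from mlserver.types import InferenceRequest

raw_request = {
    "inputs": [
        {
            "name": "foo",
            "parameters": {"content_type": "np"},
            # JSON's `null` maps to Python's None
            "data": [1.2, 2.3, None, 4.5],
            "datatype": "FP64",
            "shape": [2, 2],
        }
    ]
}

inference_request = InferenceRequest(**raw_request)
decoded = NumpyRequestCodec.decode_request(inference_request)
print(decoded)  # [[1.2 2.3]
                #  [nan 4.5]]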
Model Metadata¶
Content types can also be defined as part of the model’s metadata. This lets the user pre-configure what content types a model should use by default to decode / encode its requests / responses, without the need to specify it on each request.
For example, to configure the content type values of the example
above, one could create a model-settings.json
file like the one
below:
{
  "parameters": {
    "content_type": "pd"
  },
  "inputs": [
    {
      "name": "First Name",
      "datatype": "BYTES",
      "parameters": {
        "content_type": "str"
      },
      "shape": [-1]
    },
    {
      "name": "Age",
      "datatype": "INT32",
      "shape": [-1]
    }
  ]
}
It’s important to keep in mind that content types passed explicitly as part of the request will always take precedence over the model’s metadata. Therefore, we can leverage this to override the model’s metadata when needed.
Available Content Types¶
Out of the box, MLServer supports the following list of content types. However, this can be extended through the use of 3rd-party or custom runtimes.
Python Type | Content Type | Request Level | Request Codec | Input Level | Input Codec
---|---|---|---|---|---
NumPy Array | np | ✅ | mlserver.codecs.NumpyRequestCodec | ✅ | mlserver.codecs.NumpyCodec
Pandas DataFrame | pd | ✅ | mlserver.codecs.PandasCodec | ❌ |
UTF-8 String | str | ✅ | mlserver.codecs.string.StringRequestCodec | ✅ | mlserver.codecs.StringCodec
Base64 | base64 | ❌ | | ✅ | mlserver.codecs.Base64Codec
Datetime | datetime | ❌ | | ✅ | mlserver.codecs.DatetimeCodec
Note
MLServer allows you to extend the supported content types by adding custom ones. To learn more about how to write your own custom content types, you can check this full end-to-end example. You can also learn more about building custom extensions for MLServer in the Custom Inference Runtime section of the docs.
NumPy Array¶
Note
The V2 Inference
Protocol expects
that the data
of each input is sent as a flat array.
Therefore, the np
content type will expect that tensors are sent flattened.
The information in the shape
field will then be used to reshape the vector
into the right dimensions.
The np content type will decode / encode V2 payloads to a NumPy Array, taking
into account the following:

- The datatype field will be matched to the closest NumPy dtype.
- The shape field will be used to reshape the flattened array expected by the V2 protocol into the expected tensor shape.
Note
By default, MLServer will always assume that an array with a single-dimensional
shape, e.g. [N], is equivalent to [N, 1].
That is, each entry will be treated like a single one-dimensional data point
(i.e. instead of a [1, D] array, where the full array is a single
D-dimensional data point).
To avoid any ambiguity, where possible, the Numpy codec will always
explicitly encode [N] arrays as [N, 1].
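For instance, under this convention, encoding a one-dimensional array should produce an input with an explicit [N, 1] shape (a quick sketch; the bar name is arbitrary):

import numpy as np

from mlserver.codecs import NumpyCodec

# Encoding a 1-D array of 3 elements...
bar = np.array([1, 2, 3])
encoded = NumpyCodec.encode_input("bar", bar)

# ...should yield an explicit [3, 1] shape, avoiding any ambiguity
print(encoded.shape)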
For example, if we think of the following NumPy Array:
import numpy as np
foo = np.array([[1, 2], [3, 4]])
We could encode it as the input foo
in a V2 protocol request as:
{
  "inputs": [
    {
      "name": "foo",
      "parameters": {
        "content_type": "np"
      },
      "data": [1, 2, 3, 4],
      "datatype": "INT32",
      "shape": [2, 2]
    }
  ]
}
from mlserver.codecs import NumpyRequestCodec
# Encode an entire V2 request
inference_request = NumpyRequestCodec.encode_request(foo)
from mlserver.types import InferenceRequest
from mlserver.codecs import NumpyCodec
# We can use the `NumpyCodec` to encode a single input head with name `foo`
# within a larger request
inference_request = InferenceRequest(
inputs=[
NumpyCodec.encode_input("foo", foo)
]
)
When using the NumPy Array content type at the request-level, it will decode
the entire request by considering only the first input
element.
This can be used as a helper for models which only expect a single tensor.
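For completeness, a short sketch of the decoding direction, reusing the inference_request built above:

from mlserver.codecs import NumpyRequestCodec

# Decode the request back into a NumPy array
# (only the first input head is considered)
foo_matrix = NumpyRequestCodec.decode_request(inference_request)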
Pandas DataFrame¶
Note
The pd
content type can be stacked with other content types.
This allows the user to use a different set of content types to decode each of
the columns.
The pd content type will decode / encode a V2 request into a Pandas
DataFrame.
For this, it will expect that the DataFrame is shaped in a columnar way.
That is,

- Each entry of the inputs list (or outputs, in the case of responses) will represent a column of the DataFrame.
- Each of these entries will contain all the row elements for that particular column.
- The shape field of each input (or output) entry will contain (at least) the amount of rows included in the DataFrame.
For example, if we consider the following dataframe:
A | B | C
---|---|---
a1 | b1 | c1
a2 | b2 | c2
a3 | b3 | c3
a4 | b4 | c4
We could encode it to the V2 Inference Protocol as:
{
  "parameters": {
    "content_type": "pd"
  },
  "inputs": [
    {
      "name": "A",
      "data": ["a1", "a2", "a3", "a4"],
      "datatype": "BYTES",
      "shape": [4]
    },
    {
      "name": "B",
      "data": ["b1", "b2", "b3", "b4"],
      "datatype": "BYTES",
      "shape": [4]
    },
    {
      "name": "C",
      "data": ["c1", "c2", "c3", "c4"],
      "datatype": "BYTES",
      "shape": [4]
    }
  ]
}
import pandas as pd
from mlserver.codecs import PandasCodec
foo = pd.DataFrame({
"A": ["a1", "a2", "a3", "a4"],
"B": ["b1", "b2", "b3", "b4"],
"C": ["c1", "c2", "c3", "c4"]
})
inference_request = PandasCodec.encode_request(foo)
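And, in the opposite direction, decoding that request should recover the original DataFrame:

from mlserver.codecs import PandasCodec

# Decode the V2 request back into the columnar Pandas DataFrame
decoded_foo = PandasCodec.decode_request(inference_request)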
UTF-8 String¶
The str content type lets you encode / decode a V2 input into a UTF-8
Python string, taking into account the following:

- The expected datatype is BYTES.
- The shape field represents the number of “strings” that are encoded in the payload (e.g. the ["hello world", "one more time"] payload will have a shape of 2 elements).
For example, if we consider the following list of strings:
foo = ["bar", "bar2"]
We could encode it to the V2 Inference Protocol as:
{
  "parameters": {
    "content_type": "str"
  },
  "inputs": [
    {
      "name": "foo",
      "data": ["bar", "bar2"],
      "datatype": "BYTES",
      "shape": [2]
    }
  ]
}
from mlserver.codecs.string import StringRequestCodec
# Encode an entire V2 request
inference_request = StringRequestCodec.encode_request(foo, use_bytes=False)
from mlserver.types import InferenceRequest
from mlserver.codecs import StringCodec
# We can use the `StringCodec` to encode a single input head with name `foo`
# within a larger request
inference_request = InferenceRequest(
inputs=[
StringCodec.encode_input("foo", foo, use_bytes=False)
]
)
When using the str
content type at the request-level, it will decode the
entire request by considering only the first input
element.
This can be used as a helper for models which only expect a single string or a
set of strings.
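The decoding direction is symmetric; as a quick sketch:

from mlserver.codecs.string import StringRequestCodec

# Decode the request back into a list of Python strings
# (only the first input head is considered)
decoded = StringRequestCodec.decode_request(inference_request)
print(decoded)  # ['bar', 'bar2']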
Base64¶
The base64 content type will decode a binary V2 payload into a Base64-encoded
string (and vice versa), taking into account the following:

- The expected datatype is BYTES.
- The data field should contain the base64-encoded binary strings.
- The shape field represents the number of binary strings that are encoded in the payload.
For example, if we think of the following “bytes array”:
foo = b"Python is fun"
We could encode it as the input foo
of a V2 request as:
{
  "inputs": [
    {
      "name": "foo",
      "parameters": {
        "content_type": "base64"
      },
      "data": ["UHl0aG9uIGlzIGZ1bg=="],
      "datatype": "BYTES",
      "shape": [1]
    }
  ]
}
from mlserver.types import InferenceRequest
from mlserver.codecs import Base64Codec
# We can use the `Base64Codec` to encode a single input head with name `foo`
# within a larger request
inference_request = InferenceRequest(
inputs=[
Base64Codec.encode_input("foo", foo, use_bytes=False)
]
)
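Decoding works the other way around; as a sketch, the input codec should recover the raw bytes from the input head:

from mlserver.codecs import Base64Codec

# Decode the input head back into the original binary payload
decoded = Base64Codec.decode_input(inference_request.inputs[0])
print(decoded)  # e.g. [b'Python is fun']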
Datetime¶
The datetime content type will decode a V2 input into a Python
datetime.datetime object, taking into account the following:

- The expected datatype is BYTES.
- The data field should contain the dates, serialised following the ISO 8601 standard.
- The shape field represents the number of datetimes that are encoded in the payload.
For example, if we think of the following datetime
object:
import datetime
foo = datetime.datetime(2022, 1, 11, 11, 0, 0)
We could encode it as the input foo
of a V2 request as:
{
  "inputs": [
    {
      "name": "foo",
      "parameters": {
        "content_type": "datetime"
      },
      "data": ["2022-01-11T11:00:00"],
      "datatype": "BYTES",
      "shape": [1]
    }
  ]
}
from mlserver.types import InferenceRequest
from mlserver.codecs import DatetimeCodec
# We can use the `DatetimeCodec` to encode a single input head with name `foo`
# within a larger request
inference_request = InferenceRequest(
inputs=[
DatetimeCodec.encode_input("foo", foo, use_bytes=False)
]
)
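Likewise, decoding the input head should recover the original datetime object; a quick sketch:

from mlserver.codecs import DatetimeCodec

# Decode the input head back into Python datetime objects
decoded = DatetimeCodec.decode_input(inference_request.inputs[0])
print(decoded)  # e.g. [datetime.datetime(2022, 1, 11, 11, 0)]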