Model Settings

In MLServer, each loaded model can be configured separately. This configuration includes model information (e.g. metadata about the accepted inputs) as well as model-specific settings (e.g. the number of parallel workers to run inference across; see ../user-guide/parallel-inference).

This configuration will usually be provided through a model-settings.json file which sits next to the model artifacts. However, it’s also possible to provide it through environment variables prefixed with MLSERVER_MODEL_ (e.g. MLSERVER_MODEL_IMPLEMENTATION). Note that, in the latter case, these environment variables will be shared across all loaded models (unless they get overridden by a model-settings.json file). Additionally, if no model-settings.json file is found, MLServer will also try to load a “default” model from these environment variables.
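As a sketch of what this looks like, a minimal model-settings.json for a Scikit-Learn model could be as follows (the model name and artifact path are illustrative):

```json
{
  "name": "my-sklearn-model",
  "implementation": "mlserver_sklearn.SKLearnModel",
  "parameters": {
    "uri": "./model.joblib"
  }
}
```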


pydantic settings mlserver.settings.ModelSettings
  • env_prefix: str = MLSERVER_MODEL_

field implementation: pydantic.types.PyObject = 'mlserver.model.MLModel'

Python path to the inference runtime to use to serve this model (e.g. mlserver_sklearn.SKLearnModel).

field inputs: List[mlserver.types.dataplane.MetadataTensor] = []

Metadata about the inputs accepted by the model.

field max_batch_size: int = 0

When adaptive batching is enabled, maximum number of requests to group together in a single batch.

field max_batch_time: float = 0.0

When adaptive batching is enabled, maximum amount of time (in seconds) to wait for enough requests to build a full batch.

field name: str = ''

Name of the model.

field outputs: List[mlserver.types.dataplane.MetadataTensor] = []

Metadata about the outputs returned by the model.

field parallel_workers: int = 4

When parallel inference is enabled, number of workers to run inference across.

field parameters: Optional[mlserver.settings.ModelParameters] = None

Extra parameters for each instance of this model.

field platform: str = ''

Framework used to train and serialise the model (e.g. sklearn).

field versions: List[str] = []

Versions of dependencies used to train the model (e.g. sklearn/0.20.1).
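As noted above, these settings can also be supplied through environment variables instead of a model-settings.json file. A sketch of this, assuming each variable maps to a ModelSettings field name uppercased and prefixed with MLSERVER_MODEL_ (the values below are illustrative):

```shell
# Illustrative environment-variable equivalents of a model-settings.json
# file; shared across all loaded models unless overridden per model.
export MLSERVER_MODEL_NAME="my-sklearn-model"
export MLSERVER_MODEL_IMPLEMENTATION="mlserver_sklearn.SKLearnModel"
export MLSERVER_MODEL_PARALLEL_WORKERS="4"
```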

Extra Model Parameters

pydantic settings mlserver.settings.ModelParameters

Parameters that apply only to a particular instance of a model. This can include things like model weights, or arbitrary extra parameters particular to the underlying inference runtime. The main difference with respect to ModelSettings is that parameters can change on each instance (e.g. each version) of the model.

  • env_prefix: str = MLSERVER_MODEL_

field content_type: Optional[str] = None

Default content type to use for requests and responses.

field extra: Optional[dict] = {}

Arbitrary settings, dependent on the inference runtime implementation.

field format: Optional[str] = None

Format of the model (only available on certain runtimes).

field uri: Optional[str] = None

URI where the model artifacts can be found. This path must be either absolute or relative to where MLServer is running.

field version: Optional[str] = None

Version of the model.
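To illustrate how these parameters fit into a model-settings.json file, a sketch combining both blocks follows; the artifact path, version, and extra key are hypothetical, and the extra block is handed verbatim to the underlying runtime:

```json
{
  "name": "my-model",
  "implementation": "mlserver_sklearn.SKLearnModel",
  "parameters": {
    "uri": "./model.joblib",
    "version": "v1.2.0",
    "content_type": "np",
    "extra": {
      "some_runtime_flag": true
    }
  }
}
```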