Model Settings

In MLServer, each loaded model can be configured separately. This configuration will include model information (e.g. metadata about the accepted inputs), but also model-specific settings (e.g. number of parallel workers to run inference).

This configuration will usually be provided through a model-settings.json file which sits next to the model artifacts. However, it’s also possible to provide it through environment variables prefixed with MLSERVER_MODEL_ (e.g. MLSERVER_MODEL_IMPLEMENTATION). Note that, in the latter case, these environment variables will be shared across all loaded models (unless they get overridden by a model-settings.json file). Additionally, if no model-settings.json file is found, MLServer will also try to load a “default” model from these environment variables.
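As a rough illustration of the file-based approach, the sketch below writes a minimal model-settings.json into a model folder using only the Python standard library. The folder name and the model name `my-model` are hypothetical; the `implementation` value is the example runtime path mentioned in this reference.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical model folder; in practice this is the directory
# that already holds the model artifacts.
model_dir = Path(tempfile.mkdtemp()) / "my-model"
model_dir.mkdir()

settings = {
    "name": "my-model",  # hypothetical model name
    "implementation": "mlserver_sklearn.SKLearnModel",
}

# The settings file sits next to the model artifacts.
settings_path = model_dir / "model-settings.json"
settings_path.write_text(json.dumps(settings, indent=2))

# Read the file back to confirm it contains valid JSON.
loaded = json.loads(settings_path.read_text())
print(loaded["implementation"])
```

Keeping one settings file per model folder is what allows each loaded model to be configured separately, whereas the environment variables apply to every model at once.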


pydantic settings mlserver.settings.ModelSettings
  • env_file: str = .env

  • env_prefix: str = MLSERVER_MODEL_

  • underscore_attrs_are_private: bool = True
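The env_prefix entry above suggests a naming convention for the environment variables. The helpers below are a hypothetical sketch of that convention (prefix plus upper-cased field name), not MLServer's or pydantic's actual loading code. Only MLSERVER_MODEL_IMPLEMENTATION is confirmed by this reference; the other variable name is an assumption derived from the same pattern.

```python
ENV_PREFIX = "MLSERVER_MODEL_"


def env_var_for(field_name: str) -> str:
    """Build the variable name for a settings field (assumed convention)."""
    return ENV_PREFIX + field_name.upper()


def settings_from_env(environ: dict) -> dict:
    """Collect prefixed variables into a {field name: raw string} mapping."""
    return {
        key[len(ENV_PREFIX):].lower(): value
        for key, value in environ.items()
        if key.startswith(ENV_PREFIX)
    }


# A stand-in for os.environ, so the sketch does not depend on the real environment.
example_env = {
    "MLSERVER_MODEL_IMPLEMENTATION": "mlserver_sklearn.SKLearnModel",
    "MLSERVER_MODEL_NAME": "my-model",  # assumed name, following the prefix pattern
    "PATH": "/usr/bin",                 # unrelated variables are ignored
}

print(env_var_for("implementation"))
print(settings_from_env(example_env))
```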

field cache_enabled: bool = False

Enable caching for a specific model. When server-level caching is enabled, this parameter can be used to disable the cache for an individual model. When server-level caching is disabled, this value has no effect.

field implementation_: str [Required] (alias 'implementation')

Python path to the inference runtime to use to serve this model (e.g. mlserver_sklearn.SKLearnModel).

field inputs: List[MetadataTensor] = []

Metadata about the inputs accepted by the model.

field max_batch_size: int = 0

When adaptive batching is enabled, maximum number of requests to group together in a single batch.

field max_batch_time: float = 0.0

When adaptive batching is enabled, maximum amount of time (in seconds) to wait for enough requests to build a full batch.
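To make the two adaptive batching fields concrete, here is a small sketch of a settings document that sets both. The values 16 and 0.5 are arbitrary illustrations, and the model name is hypothetical; suitable values depend on the model and the traffic it receives.

```python
import json

# Hypothetical settings for a model that uses adaptive batching.
settings = {
    "name": "my-model",  # hypothetical model name
    "implementation": "mlserver_sklearn.SKLearnModel",
    "max_batch_size": 16,   # group at most 16 requests into one batch
    "max_batch_time": 0.5,  # wait at most 0.5 seconds for a batch to fill
}

# Serialise to the JSON text that would go into model-settings.json.
document = json.dumps(settings, indent=2)
print(document)
```

A larger max_batch_size with a short max_batch_time tends to favour latency, since a batch is dispatched as soon as the time limit expires even if it is not full.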

field name: str = ''

Name of the model.

field outputs: List[MetadataTensor] = []

Metadata about the outputs returned by the model.
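The inputs and outputs fields each hold a list of tensor metadata entries. The sketch below assumes each entry carries `name`, `datatype` and `shape` keys, as in the tensor metadata of the V2 inference protocol; the tensor names, datatypes and shapes shown are hypothetical, and `tensor_metadata` is a helper defined here for illustration only.

```python
import json


def tensor_metadata(name: str, datatype: str, shape: list) -> dict:
    """Build one metadata entry (assumed keys: name, datatype, shape)."""
    return {"name": name, "datatype": datatype, "shape": shape}


settings = {
    "name": "my-model",  # hypothetical model name
    "implementation": "mlserver_sklearn.SKLearnModel",
    # -1 marks a variable-size dimension, here the batch dimension.
    "inputs": [tensor_metadata("input-0", "FP32", [-1, 4])],
    "outputs": [tensor_metadata("output-0", "INT64", [-1, 1])],
}

print(json.dumps(settings, indent=2))
```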

field parallel_workers: int | None = None

Use the parallel_workers field in the server-wide settings instead.

field parameters: ModelParameters | None = None

Extra parameters for each instance of this model.

field platform: str = ''

Framework used to train and serialise the model (e.g. sklearn).

field versions: List[str] = []

Versions of dependencies used to train the model (e.g. sklearn/0.20.1).

field warm_workers: bool = False

Inference workers will now always be warmed up at start time.

classmethod parse_file(path: str) → ModelSettings
classmethod parse_obj(obj: Any) → ModelSettings
property implementation: Type[MLModel]
property version: str | None

Extra Model Parameters

pydantic settings mlserver.settings.ModelParameters

Parameters that apply only to a particular instance of a model. This can include things like model weights, or arbitrary extra parameters particular to the underlying inference runtime. The main difference with respect to ModelSettings is that parameters can change on each instance (e.g. each version) of the model.

  • env_file: str = .env

  • env_prefix: str = MLSERVER_MODEL_

  • extra: Extra = allow

field content_type: str | None = None

Default content type to use for requests and responses.

field environment_tarball: str | None = None

Path to the environment tarball which should be used to load this model.

field extra: dict | None = {}

Arbitrary settings, dependent on the inference runtime implementation.

field format: str | None = None

Format of the model (only available on certain runtimes).

field uri: str | None = None

URI where the model artifacts can be found. This path must be either absolute or relative to where MLServer is running.

field version: str | None = None

Version of the model.
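As a brief sketch of how these parameters might sit inside a settings document, the example below nests them under a `parameters` key, matching the parameters field of ModelSettings. The artifact path, version string and the `my_option` key inside `extra` are hypothetical values chosen for illustration.

```python
import json

settings = {
    "name": "my-model",  # hypothetical model name
    "implementation": "mlserver_sklearn.SKLearnModel",
    "parameters": {
        "uri": "./model.joblib",       # hypothetical path, relative to where MLServer runs
        "version": "v0.1.0",           # hypothetical version of this model instance
        "extra": {"my_option": True},  # hypothetical runtime-specific setting
    },
}

# Round-trip through JSON, as would happen when the file is written and read back.
round_tripped = json.loads(json.dumps(settings))
print(round_tripped["parameters"]["uri"])
```

Because parameters can change on each instance of the model, a new version would typically change only this nested block while the top-level settings stay the same.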