Model Settings

In MLServer, each loaded model can be configured separately. This configuration includes model information (e.g. metadata about the accepted inputs) as well as model-specific settings (e.g. the number of parallel workers to use for inference).

This configuration is usually provided through a model-settings.json file which sits next to the model artifacts. However, it’s also possible to provide it through environment variables prefixed with MLSERVER_MODEL_ (e.g. MLSERVER_MODEL_IMPLEMENTATION). Note that, in the latter case, these environment variables will be shared across all loaded models (unless they get overridden by a model-settings.json file). Additionally, if no model-settings.json file is found, MLServer will also try to load a “default” model from these environment variables.
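For example, a minimal model-settings.json for a hypothetical Scikit-Learn model could look as follows (the name, implementation and uri values below are illustrative):

```json
{
  "name": "my-sklearn-model",
  "implementation": "mlserver_sklearn.SKLearnModel",
  "parameters": {
    "uri": "./model.joblib"
  }
}
```

The equivalent configuration through environment variables would use the MLSERVER_MODEL_ prefix on each field name (e.g. MLSERVER_MODEL_NAME=my-sklearn-model and MLSERVER_MODEL_IMPLEMENTATION=mlserver_sklearn.SKLearnModel).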

Settings

pydantic settings mlserver.settings.ModelSettings
Config:
  • extra: str = ignore

  • env_prefix: str = MLSERVER_MODEL_

  • env_file: str = .env

Fields:
field cache_enabled: bool = False

Enable response caching for this model. This parameter can be used to disable the cache for a specific model when server-level caching is enabled. If server-level caching is disabled, this parameter has no effect.

field implementation_: str [Required]

Import path of the inference runtime (an MLModel subclass) used to serve this model (e.g. mlserver_sklearn.SKLearnModel).

field inputs: List[MetadataTensor] = []

Metadata about the inputs accepted by the model.

field max_batch_size: int = 0

When adaptive batching is enabled, maximum number of requests to group together in a single batch.

field max_batch_time: float = 0.0

When adaptive batching is enabled, maximum amount of time (in seconds) to wait for enough requests to build a full batch.

field name: str = ''

Name of the model.

field outputs: List[MetadataTensor] = []

Metadata about the outputs returned by the model.

field parameters: ModelParameters | None = None

Extra parameters for each instance of this model.

field platform: str = ''

Framework used to train and serialise the model (e.g. sklearn).

field versions: List[str] = []

Versions of dependencies used to train the model (e.g. sklearn/0.20.1).
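To illustrate how these fields fit together, here is a hypothetical model-settings.json combining model metadata with adaptive batching (all names, shapes and values below are made up for illustration):

```json
{
  "name": "my-model",
  "implementation": "mlserver_sklearn.SKLearnModel",
  "platform": "sklearn",
  "versions": ["sklearn/1.3.2"],
  "max_batch_size": 32,
  "max_batch_time": 0.1,
  "inputs": [
    {
      "name": "input-0",
      "datatype": "FP32",
      "shape": [-1, 4]
    }
  ]
}
```

With these settings, MLServer would group up to 32 incoming requests into a single batch, waiting at most 0.1 seconds for a batch to fill up.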

model_post_init(__context: Any) → None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Args:
  • self: The BaseModel instance.

  • __context: The context.

classmethod model_validate(obj: Any) → ModelSettings

Validate a pydantic model instance.

Args:
  • obj: The object to validate.

  • strict: Whether to enforce types strictly.

  • from_attributes: Whether to extract data from object attributes.

  • context: Additional context to pass to the validator.

Raises:

ValidationError: If the object could not be validated.

Returns:

The validated model instance.

classmethod parse_file(path: str) → ModelSettings
property implementation: Type[MLModel]
parallel_workers: int | None

Data descriptor used to emit a runtime deprecation warning before accessing a deprecated field.

Attributes:
  • msg: The deprecation message to be emitted.

  • wrapped_property: The property instance if the deprecated field is a computed field, or None.

  • field_name: The name of the field being deprecated.

property version: str | None
warm_workers: bool

Data descriptor used to emit a runtime deprecation warning before accessing a deprecated field.

Attributes:
  • msg: The deprecation message to be emitted.

  • wrapped_property: The property instance if the deprecated field is a computed field, or None.

  • field_name: The name of the field being deprecated.

Extra Model Parameters

pydantic settings mlserver.settings.ModelParameters

Parameters that apply only to a particular instance of a model. This can include things like model weights, or arbitrary extra parameters particular to the underlying inference runtime. The main difference with respect to ModelSettings is that parameters can change on each instance (e.g. each version) of the model.

Config:
  • extra: str = allow

  • env_prefix: str = MLSERVER_MODEL_

  • env_file: str = .env

Fields:
field content_type: str | None = None

Default content type to use for requests and responses.

field environment_tarball: str | None = None

Path to the environment tarball which should be used to load this model.

field extra: dict | None = {}

Arbitrary settings, dependent on the inference runtime implementation.

field format: str | None = None

Format of the model (only available on certain runtimes).

field uri: str | None = None

URI where the model artifacts can be found. This path must be either absolute or relative to where MLServer is running.

field version: str | None = None

Version of the model.
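For instance, a hypothetical parameters section pointing to the model artifact and setting a default content type could look like this (the uri, version and extra values are illustrative):

```json
{
  "name": "my-model",
  "implementation": "mlserver_sklearn.SKLearnModel",
  "parameters": {
    "uri": "./model.joblib",
    "version": "v1.0.0",
    "content_type": "np",
    "extra": {
      "some_runtime_flag": true
    }
  }
}
```

Because ModelParameters allows extra fields (extra: str = allow), runtime-specific settings can be placed under the extra key without failing validation.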