Model Settings¶
In MLServer, each loaded model can be configured separately. This configuration includes model information (e.g. metadata about the accepted inputs) as well as model-specific settings (e.g. the number of parallel workers to use for inference).
This configuration will usually be provided through a model-settings.json file which sits next to the model artifacts.

However, it's also possible to provide this configuration through environment variables prefixed with MLSERVER_MODEL_ (e.g. MLSERVER_MODEL_IMPLEMENTATION). Note that, in the latter case, these environment variables will be shared across all loaded models (unless they get overridden by a model-settings.json file).

Additionally, if no model-settings.json file is found, MLServer will also try to load a "default" model from these environment variables.
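For illustration, a minimal model-settings.json for a Scikit-Learn model might look like the sketch below. The model name and artifact URI are placeholders; only the implementation path (mlserver_sklearn.SKLearnModel) is taken from the examples in this reference.

```json
{
  "name": "my-sklearn-model",
  "implementation": "mlserver_sklearn.SKLearnModel",
  "parameters": {
    "uri": "./model.joblib"
  }
}
```

The same values could instead be supplied through the environment, following the MLSERVER_MODEL_ prefix convention described above (e.g. MLSERVER_MODEL_IMPLEMENTATION for the implementation field).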
Settings¶
- pydantic settings mlserver.settings.ModelSettings¶
- Config:
env_file: str = .env
env_prefix: str = MLSERVER_MODEL_
underscore_attrs_are_private: bool = True
- Fields:
- field cache_enabled: bool = False¶
Enable caching for a specific model. This parameter can be used to disable the cache for a specific model when server-level caching is enabled. If server-level caching is disabled, this parameter has no effect.
- field implementation_: str [Required] (alias 'implementation')¶
Python path to the inference runtime to use to serve this model (e.g. mlserver_sklearn.SKLearnModel).
- field inputs: List[MetadataTensor] = []¶
Metadata about the inputs accepted by the model.
- field max_batch_size: int = 0¶
When adaptive batching is enabled, maximum number of requests to group together in a single batch.
- field max_batch_time: float = 0.0¶
When adaptive batching is enabled, maximum amount of time (in seconds) to wait for enough requests to build a full batch.
- field name: str = ''¶
Name of the model.
- field outputs: List[MetadataTensor] = []¶
Metadata about the outputs returned by the model.
- field parallel_workers: int | None = None¶
Use the parallel_workers field of the server-wide settings instead.
- field parameters: ModelParameters | None = None¶
Extra parameters for each instance of this model.
- field platform: str = ''¶
Framework used to train and serialise the model (e.g. sklearn).
- field versions: List[str] = []¶
Versions of dependencies used to train the model (e.g. sklearn/0.20.1).
- field warm_workers: bool = False¶
Inference workers are now always warmed up at start time, regardless of this setting.
- classmethod parse_file(path: str) ModelSettings ¶
- classmethod parse_obj(obj: Any) ModelSettings ¶
- property version: str | None¶
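To illustrate the adaptive batching fields above, a model-settings.json that groups up to 10 requests into a batch, waiting at most 0.5 seconds to fill one, might look like this (the name, implementation and numeric values are placeholders chosen for the example):

```json
{
  "name": "my-model",
  "implementation": "mlserver_sklearn.SKLearnModel",
  "max_batch_size": 10,
  "max_batch_time": 0.5
}
```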
Extra Model Parameters¶
- pydantic settings mlserver.settings.ModelParameters¶
Parameters that apply only to a particular instance of a model. This can include things like model weights, or arbitrary extra parameters particular to the underlying inference runtime. The main difference with respect to ModelSettings is that parameters can change on each instance (e.g. each version) of the model.
- Config:
env_file: str = .env
env_prefix: str = MLSERVER_MODEL_
extra: Extra = allow
- Fields:
- field content_type: str | None = None¶
Default content type to use for requests and responses.
- field environment_tarball: str | None = None¶
Path to the environment tarball which should be used to load this model.
- field extra: dict | None = {}¶
Arbitrary settings, dependent on the inference runtime implementation.
- field format: str | None = None¶
Format of the model (only available on certain runtimes).
- field uri: str | None = None¶
URI where the model artifacts can be found. This path must be either absolute or relative to where MLServer is running.
- field version: str | None = None¶
Version of the model.
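Putting these fields together, the parameters section of a model-settings.json might look like the sketch below. The URI, version and extra keys are illustrative placeholders; in particular, which extra settings are accepted depends entirely on the inference runtime implementation.

```json
{
  "name": "my-model",
  "implementation": "mlserver_sklearn.SKLearnModel",
  "parameters": {
    "uri": "./model.joblib",
    "version": "v1.0.0",
    "extra": {}
  }
}
```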