MLServer Settings¶

MLServer can be configured through a settings.json file in the root folder from which MLServer is started. Note that these are server-wide settings (e.g. the gRPC or HTTP port), which are separate from the individual model settings. Alternatively, this configuration can also be passed through environment variables prefixed with MLSERVER_ (e.g. MLSERVER_GRPC_PORT).
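
For example, a minimal settings.json could look as follows (the values below are illustrative):

    {
      "debug": false,
      "http_port": 8080,
      "grpc_port": 8081
    }

The same values could instead be provided through the environment, e.g. MLSERVER_HTTP_PORT=8080 and MLSERVER_GRPC_PORT=8081.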

Settings¶

pydantic settings mlserver.settings.Settings¶
Config:
  • env_file: str = .env

  • env_prefix: str = MLSERVER_

Fields:
field cache_enabled: bool = False¶

Enable caching for the model predictions.

field cache_size: int = 100¶

Cache size to be used if caching is enabled.
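
As a sketch, caching could be enabled and sized through settings.json (the cache size of 500 is an arbitrary illustrative value):

    {
      "cache_enabled": true,
      "cache_size": 500
    }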

field cors_settings: CORSSettings | None = None¶
field debug: bool = True¶
field environments_dir: str = '<current working directory>/.envs'¶

Directory used to store custom environments. By default, the .envs folder of the current working directory will be used.

field extensions: List[str] = []¶

List of server extensions to load.

field grpc_max_message_length: int | None = None¶

Maximum length (i.e. size) of gRPC payloads.

field grpc_port: int = 8081¶

Port to listen on for gRPC connections.

field host: str = '0.0.0.0'¶

Host to listen on for connections.

field http_port: int = 8080¶

Port to listen on for HTTP / REST connections.
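
For instance, a sketch binding MLServer to the loopback interface only and moving both ports (illustrative values):

    {
      "host": "127.0.0.1",
      "http_port": 9080,
      "grpc_port": 9081
    }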

field kafka_enabled: bool = False¶
field kafka_servers: str = 'localhost:9092'¶
field kafka_topic_input: str = 'mlserver-input'¶
field kafka_topic_output: str = 'mlserver-output'¶
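
Taken together, a sketch enabling the Kafka integration in settings.json (reusing the default broker address and topic names shown above):

    {
      "kafka_enabled": true,
      "kafka_servers": "localhost:9092",
      "kafka_topic_input": "mlserver-input",
      "kafka_topic_output": "mlserver-output"
    }
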
field load_models_at_startup: bool = True¶

Flag to load all available models automatically at startup.

field logging_settings: str | Dict | None = None¶

Path to logging config file or dictionary configuration.
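
As an illustration, logging_settings can point to a logging config file (the ./logger.conf path below is hypothetical):

    {
      "logging_settings": "./logger.conf"
    }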

field metrics_dir: str = '<current working directory>/.metrics'¶

Directory used to share metrics across parallel workers. Equivalent to the PROMETHEUS_MULTIPROC_DIR env var in prometheus-client. Note that this won’t be used if the parallel_workers flag is disabled. By default, the .metrics folder of the current working directory will be used.

field metrics_endpoint: str | None = '/metrics'¶

Endpoint used to expose Prometheus metrics. Can be set to None to disable it.

field metrics_port: int = 8082¶

Port used to expose metrics endpoint.

field metrics_rest_server_prefix: str = 'rest_server'¶

String prefix applied to the exported REST server metrics.
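
A sketch combining the metrics options (illustrative values; metrics_endpoint can be set to null to disable the endpoint):

    {
      "metrics_endpoint": "/metrics",
      "metrics_port": 8082,
      "metrics_rest_server_prefix": "rest_server"
    }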

field model_repository_implementation: PyObject | None = None¶

Python path to a custom model repository implementation (e.g. mlserver.repository.repository.SchemalessModelRepository).

field model_repository_implementation_args: dict = {}¶

Extra parameters for model repository.

field model_repository_root: str = '.'¶

Root of the model repository, where we will search for models.
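
For example, a sketch pointing MLServer at a custom repository implementation, assuming the Python path can be provided as a string in settings.json (the ./models root is hypothetical):

    {
      "model_repository_root": "./models",
      "model_repository_implementation": "mlserver.repository.repository.SchemalessModelRepository"
    }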

field parallel_workers: int = 1¶

Number of workers to run inference across when parallel inference is enabled.

field parallel_workers_timeout: int = 5¶

Grace period (in seconds) to wait for the workers to shut down when stopping MLServer.
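
As a sketch, running inference across four workers with a longer shutdown grace period (illustrative values):

    {
      "parallel_workers": 4,
      "parallel_workers_timeout": 10
    }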

field root_path: str = ''¶

Set the ASGI root_path for applications submounted below a given URL path.
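
For instance, if MLServer sits behind a proxy that mounts it under a sub-path (the /mlserver path below is hypothetical):

    {
      "root_path": "/mlserver"
    }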

field server_name: str = 'mlserver'¶

Name of the server.

field server_version: str = '1.5.0.dev1'¶

Version of the server.

field tracing_server: str | None = None¶

Server name used to export OpenTelemetry tracing to the collector service.

field use_structured_logging: bool = False¶

Use JSON-formatted structured logging instead of the default format.
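
As a final sketch, enabling structured logging together with OpenTelemetry tracing export (the otel-collector:4317 address below is hypothetical):

    {
      "use_structured_logging": true,
      "tracing_server": "otel-collector:4317"
    }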