MLServer Settings

MLServer can be configured through a settings.json file in the root folder from which MLServer is started. Note that these are server-wide settings (e.g. the gRPC or HTTP port), which are separate from the individual model settings. Alternatively, this configuration can also be passed through environment variables prefixed with MLSERVER_ (e.g. MLSERVER_GRPC_PORT).
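
As a minimal sketch of the file-based route, the keys in settings.json are the field names documented below (the values here are illustrative); the file could equally be written by hand:

```python
import json

# Illustrative settings.json: keys match the field names documented in
# this section, without the MLSERVER_ prefix used by environment
# variables.
config = {
    "debug": False,
    "http_port": 8080,
    "grpc_port": 8081,
}

with open("settings.json", "w") as f:
    json.dump(config, f, indent=2)
```

Starting MLServer from the same folder picks these values up; the environment-variable equivalent of, say, http_port would be MLSERVER_HTTP_PORT.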

Settings

pydantic settings mlserver.settings.Settings
Config
  • env_file: str = .env

  • env_prefix: str = MLSERVER_
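
In other words, besides settings.json, values can come from a .env file in the working directory or from MLSERVER_-prefixed environment variables. A small sketch of the prefix behaviour, assuming the mlserver package is installed:

```python
import os

from mlserver.settings import Settings

# env_prefix is MLSERVER_, so this variable maps onto the grpc_port
# field documented below.
os.environ["MLSERVER_GRPC_PORT"] = "9081"

settings = Settings()
print(settings.grpc_port)  # 9081
```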

Fields
field cors_settings: Optional[mlserver.settings.CORSSettings] = None
field debug: bool = True
field extensions: List[str] = []

Server extensions to be loaded.

field grpc_max_message_length: Optional[int] = None

Maximum length (i.e. size) of gRPC payloads.
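
As a sketch, raising the limit to 100 MiB (the figure is illustrative and, as in the underlying gRPC options, counted in bytes):

```python
from mlserver.settings import Settings

# None (the default) keeps the gRPC library's own limit in place; an
# explicit value caps the size of request and response messages.
settings = Settings(grpc_max_message_length=100 * 1024 * 1024)
```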

field grpc_port: int = 8081

Port to listen on for gRPC connections.

field host: str = '0.0.0.0'

Host to listen on for connections.

field http_port: int = 8080

Port to listen on for HTTP / REST connections.

field kafka_enabled: bool = False
field kafka_servers: str = 'localhost:9092'
field kafka_topic_input: str = 'mlserver-input'
field kafka_topic_output: str = 'mlserver-output'
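
The four Kafka fields above carry no descriptions; going only by their names and defaults, a sketch of enabling the integration through settings.json might look like this (broker address and topic names are illustrative):

```python
import json

# Illustrative values: a single broker plus the default topic names
# shown above.
kafka_config = {
    "kafka_enabled": True,
    "kafka_servers": "localhost:9092",
    "kafka_topic_input": "mlserver-input",
    "kafka_topic_output": "mlserver-output",
}

with open("settings.json", "w") as f:
    json.dump(kafka_config, f, indent=2)
```
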
field load_models_at_startup: bool = True

Flag to load all available models automatically at startup.

field logging_settings: Optional[str] = None

Path to logging config file.

field metrics_endpoint: Optional[str] = '/metrics'

Endpoint used to expose Prometheus metrics. Alternatively, it can be set to None to disable the endpoint.

field metrics_port: int = 8082

Port used to expose the metrics endpoint.
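
A sketch of the two metrics fields together: disabling the Prometheus endpoint entirely, or moving it to a non-default port (9090 is illustrative):

```python
from mlserver.settings import Settings

# Disable the Prometheus endpoint, per the metrics_endpoint description.
disabled = Settings(metrics_endpoint=None)

# Or keep it enabled but expose it on a different port.
moved = Settings(metrics_port=9090)
```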

field model_repository_root: str = '.'

Root of the model repository, where MLServer will search for models.
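
Combined with load_models_at_startup above, a sketch that points the repository at a ./models folder (the path is illustrative):

```python
from mlserver.settings import Settings

# Search ./models for model configurations and load whatever is found
# when the server boots.
settings = Settings(
    model_repository_root="./models",
    load_models_at_startup=True,
)
```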

field parallel_workers: int = 1

When parallel inference is enabled, the number of workers to run inference across.

field root_path: str = ''

Set the ASGI root_path for applications submounted below a given URL path.
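
For instance, when MLServer runs behind a proxy that mounts it under a URL prefix, a sketch (the /mlserver prefix is illustrative):

```python
from mlserver.settings import Settings

# Tell the ASGI application it is served below /mlserver, so that
# generated URLs include the prefix.
settings = Settings(root_path="/mlserver")
```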

field server_name: str = 'mlserver'

Name of the server.

field server_version: str = 'v1.2.0.dev3'

Version of the server.