Version: 0.8.1

Node configuration

The node configuration allows you to customize and optimize the settings for individual nodes in your cluster. It is divided into several sections:

  • Common configuration settings: shared top-level properties
  • Storage settings: defined in the storage section
  • Metastore settings: defined in the metastore section
  • Ingest settings: defined in the ingest_api section
  • Indexer settings: defined in the indexer section
  • Searcher settings: defined in the searcher section
  • Jaeger settings: defined in the jaeger section

A commented example is available here: quickwit.yaml.

Common configuration

| Property | Description | Env variable | Default value |
| --- | --- | --- | --- |
| version | Config file version. 0.7 is the only available value; it is backward compatible with 0.5 and 0.4. | | |
| cluster_id | Unique identifier of the cluster the node will be joining. Clusters sharing the same network should use distinct cluster IDs. | QW_CLUSTER_ID | quickwit-default-cluster |
| node_id | Unique identifier of the node. It must be distinct from the node IDs of its cluster peers. Defaults to the instance's short hostname if not set. | QW_NODE_ID | short hostname |
| enabled_services | Enabled services (control_plane, indexer, janitor, metastore, searcher). | QW_ENABLED_SERVICES | all services |
| listen_address | The IP address or hostname that the Quickwit service binds to for starting the REST and gRPC servers and for connecting this node to other nodes. By default, Quickwit binds to 127.0.0.1 (localhost). This default is not valid when trying to form a cluster. | QW_LISTEN_ADDRESS | 127.0.0.1 |
| advertise_address | IP address advertised by the node, i.e. the IP address that peer nodes should use to connect to the node for RPCs. | QW_ADVERTISE_ADDRESS | listen_address |
| gossip_listen_port | The port on which the gossip cluster membership service listens (UDP). | QW_GOSSIP_LISTEN_PORT | rest.listen_port |
| grpc_listen_port | The port on which gRPC services listen for traffic. | QW_GRPC_LISTEN_PORT | rest.listen_port + 1 |
| peer_seeds | List of IP addresses or hostnames used to bootstrap the cluster and discover the complete set of nodes. This list may contain the current node address and does not need to be exhaustive. If the list of peer seeds contains a hostname, Quickwit resolves it by querying the DNS every minute. On Kubernetes, for instance, it is good practice to set it to a headless service. | QW_PEER_SEEDS | |
| data_dir | Path to the directory where data (temporary data, splits kept for caching purposes) is persisted. This is mostly used in indexing. | QW_DATA_DIR | ./qwdata |
| metastore_uri | Metastore URI. Can be a local directory, s3://my-bucket/indexes, or postgres://username:password@localhost:5432/metastore. Learn more about the metastore configuration. | QW_METASTORE_URI | {data_dir}/indexes |
| default_index_root_uri | Default index root URI that defines the location where index data (splits) is stored. The index URI is built following the scheme: {default_index_root_uri}/{index-id} | QW_DEFAULT_INDEX_ROOT_URI | {data_dir}/indexes |
| (environment variable only) | Log level of Quickwit. Can be a direct log level or a comma-separated list of module_name=level. | RUST_LOG | info |
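
The port defaults in the table are derived from rest.listen_port, which itself defaults to 7280 (see the REST configuration table). The following Python sketch only illustrates that arithmetic; the function and class names are hypothetical and are not part of Quickwit.

```python
from dataclasses import dataclass
from typing import Optional

# Default of rest.listen_port, taken from the REST configuration table.
DEFAULT_REST_LISTEN_PORT = 7280


@dataclass(frozen=True)
class ResolvedPorts:
    rest: int
    gossip: int
    grpc: int


def resolve_ports(
    rest: Optional[int] = None,
    gossip: Optional[int] = None,
    grpc: Optional[int] = None,
) -> ResolvedPorts:
    """Apply the documented defaults to any port left unset (hypothetical helper)."""
    rest_port = DEFAULT_REST_LISTEN_PORT if rest is None else rest
    return ResolvedPorts(
        rest=rest_port,
        # gossip_listen_port defaults to rest.listen_port
        gossip=rest_port if gossip is None else gossip,
        # grpc_listen_port defaults to rest.listen_port + 1
        grpc=rest_port + 1 if grpc is None else grpc,
    )


if __name__ == "__main__":
    print(resolve_ports())           # all defaults
    print(resolve_ports(rest=1789))  # only the REST port overridden
```

Gossip and REST can share a port number because the table lists gossip as UDP, while the REST API serves HTTP traffic over TCP.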

REST configuration

This section contains the REST API configuration options.

| Property | Description | Env variable | Default value |
| --- | --- | --- | --- |
| listen_port | The port on which the REST API listens for HTTP traffic. | QW_REST_LISTEN_PORT | 7280 |
| cors_allow_origins | The CORS origins that are allowed to access the API. See "Configuring CORS" below. | | |
| extra_headers | List of header names and values. | | |

Configuring CORS (Cross-origin resource sharing)

CORS (Cross-origin resource sharing) describes which addresses or origins can access the REST API from the browser. By default, sharing resources cross-origin is not allowed.

A wildcard, a single origin, or multiple origins can be specified in the cors_allow_origins parameter.

Example of a REST configuration:

rest:
  listen_port: 1789
  extra_headers:
    x-header-1: header-value-1
    x-header-2: header-value-2
  cors_allow_origins: '*'

#   cors_allow_origins: https://my-hdfs-logs.domain.com   # Optionally we can specify one domain
#   cors_allow_origins:                                   # Or allow multiple origins
#     - https://my-hdfs-logs.domain.com
#     - https://my-hdfs.other-domain.com

gRPC configuration

This section contains the configuration options for gRPC services and clients used for internal communication between nodes.

| Property | Description | Env variable | Default value |
| --- | --- | --- | --- |
| max_message_size | The maximum size (in bytes) of messages exchanged by internal gRPC clients and services. | | 20 MiB |

Example of a gRPC configuration:

grpc:
  max_message_size: 30 MiB

Danger

We advise changing the default value of 20 MiB only if you encounter the following error: Error, message length too large: found 24732228 bytes, the limit is: 20971520 bytes. In that case, increase max_message_size by increments of 10 MiB until the issue disappears. This is a temporary fix: the next version of Quickwit, 0.8, will rely exclusively on gRPC streaming endpoints and handle messages of any length.
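
As a worked example of the advice above, the Python sketch below converts MiB to bytes and steps the limit up by 10 MiB until a message of a given size fits. The helper name is hypothetical, and treating a message exactly as large as the limit as acceptable is an assumption, not something this page states.

```python
MIB = 1024 * 1024

DEFAULT_LIMIT_MIB = 20  # default max_message_size
STEP_MIB = 10           # increment recommended above


def suggested_limit_mib(message_bytes: int) -> int:
    """Smallest limit, reached in 10 MiB steps from the default, that fits the message.

    Hypothetical helper for illustration only.
    """
    limit_mib = DEFAULT_LIMIT_MIB
    while limit_mib * MIB < message_bytes:
        limit_mib += STEP_MIB
    return limit_mib


if __name__ == "__main__":
    # The default limit in bytes matches the value quoted in the error message.
    print(DEFAULT_LIMIT_MIB * MIB)        # 20971520
    # The 24732228-byte message from the error message fits after one increment.
    print(suggested_limit_mib(24732228))  # 30
```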

Storage configuration

Please refer to the dedicated storage configuration page to learn more about configuring Quickwit for various storage providers.

Here are some minimal examples of how to configure Quickwit with Amazon S3 or Alibaba OSS:

AWS_ACCESS_KEY_ID=<your access key ID>
AWS_SECRET_ACCESS_KEY=<your secret access key>

Amazon S3

storage:
  s3:
    region: us-east-1

Alibaba

storage:
  s3:
    region: us-east-1
    endpoint: https://oss-us-east-1.aliyuncs.com

Metastore configuration

This section may contain one configuration subsection per available metastore implementation. The specific configuration parameters for each implementation may vary. Currently, the available metastore implementations are:

  • File-backed
  • PostgreSQL

File-backed metastore configuration

The file-backed metastore has no node-level configuration. You can configure the poll interval at the index level.

PostgreSQL metastore configuration

| Property | Description | Default value |
| --- | --- | --- |
| min_connections | Minimum number of connections to maintain in the pool at all times. | 0 |
| max_connections | Maximum number of connections to maintain in the pool. | 10 |
| acquire_connection_timeout | Maximum amount of time to spend waiting for an available connection before aborting a query. | 10s |
| idle_connection_timeout | Maximum idle duration before closing individual connections. | 10min |
| max_connection_lifetime | Maximum lifetime of individual connections. | 30min |

Example of a metastore configuration for PostgreSQL in YAML format:

metastore:
  postgres:
    min_connections: 10
    max_connections: 50
    acquire_connection_timeout: 30s
    idle_connection_timeout: 1h
    max_connection_lifetime: 1d

Indexer configuration

This section contains the configuration options for an indexer. The split store is documented in the indexing document.

| Property | Description | Default value |
| --- | --- | --- |
| split_store_max_num_bytes | Maximum size in bytes allowed in the split store. | 100G |
| split_store_max_num_splits | Maximum number of files allowed in the split store. | 1000 |
| max_concurrent_split_uploads | Maximum number of concurrent split uploads allowed on the node. | 12 |
| merge_concurrency | Maximum number of merge operations that can be executed on the node at one point in time. | (2 x num threads available) / 3 |
| enable_otlp_endpoint | If true, enables the OpenTelemetry exporter endpoint to ingest logs and traces via the OpenTelemetry Protocol (OTLP). | false |
| cpu_capacity | Advisory parameter used by the control plane. The value can be expressed in threads (e.g. 2) or in millicpus (e.g. 2000m). The control plane attempts to schedule indexing pipelines on the different nodes proportionally to the CPU capacity advertised by each indexer. It is NOT used as a limit: all pipelines are scheduled regardless of whether the cluster has sufficient capacity. The control plane does not attempt to spread the work equally when the load is well below the cpu_capacity. Users who need a balanced load on all of their indexer nodes can set cpu_capacity to an arbitrarily low value, as long as they keep it proportional to the number of threads available. | num threads available |
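
To make the two computed values in the table concrete, here is a small Python sketch. It is illustrative only: the function names are hypothetical, and rounding the merge_concurrency formula down with a floor of 1 is an assumption, since the table gives only the formula.

```python
def cpu_capacity_to_millicpus(value: str) -> int:
    """Convert a cpu_capacity value such as "2" or "2000m" to millicpus.

    Hypothetical helper; only the two forms shown in the table are handled.
    """
    text = str(value).strip()
    if text.endswith("m"):
        return int(text[:-1])
    return int(text) * 1000


def default_merge_concurrency(num_threads: int) -> int:
    """(2 x num threads available) / 3, rounded down, never below 1 (assumed rounding)."""
    return max(1, (2 * num_threads) // 3)


if __name__ == "__main__":
    print(cpu_capacity_to_millicpus("2"))      # 2000
    print(cpu_capacity_to_millicpus("2000m"))  # 2000
    print(default_merge_concurrency(8))        # 5
```

Under these assumptions, "2" and "2000m" describe the same capacity, and a node with 8 threads would run at most 5 merges at a time by default.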

Example:

indexer:
  split_store_max_num_bytes: 100G
  split_store_max_num_splits: 1000
  max_concurrent_split_uploads: 12
  enable_otlp_endpoint: true

Ingest API configuration

| Property | Description | Default value |
| --- | --- | --- |
| max_queue_memory_usage | Maximum size in bytes of the in-memory ingest queue. | 2GiB |
| max_queue_disk_usage | Maximum disk space in bytes taken by the ingest queue. It must be at least 256M and at least max_queue_memory_usage. | 4GiB |

Example:

ingest_api:
  max_queue_memory_usage: 2GiB
  max_queue_disk_usage: 4GiB
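
The two constraints on max_queue_disk_usage can be checked before deploying a configuration. The Python sketch below is a hypothetical pre-flight check, not Quickwit's own validation; it works on plain byte counts and assumes that "256M" means 256 decimal megabytes.

```python
from typing import List

GIB = 1024 ** 3
# Assumption: "256M" is read as decimal megabytes (256 * 10^6 bytes).
MIN_QUEUE_DISK_BYTES = 256 * 1000 ** 2


def check_ingest_queue_limits(memory_bytes: int, disk_bytes: int) -> List[str]:
    """Return a list of violated constraints (empty when the settings are consistent)."""
    problems = []
    if disk_bytes < MIN_QUEUE_DISK_BYTES:
        problems.append("max_queue_disk_usage is below the 256M minimum")
    if disk_bytes < memory_bytes:
        problems.append("max_queue_disk_usage is below max_queue_memory_usage")
    return problems


if __name__ == "__main__":
    # The defaults from the table (2GiB memory, 4GiB disk) satisfy both constraints.
    print(check_ingest_queue_limits(2 * GIB, 4 * GIB))
    # A disk limit smaller than the memory limit is reported.
    print(check_ingest_queue_limits(2 * GIB, 1 * GIB))
```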

Searcher configuration

This section contains the configuration options for a Searcher.

| Property | Description | Default value |
| --- | --- | --- |
| aggregation_memory_limit | Controls the maximum amount of memory that can be used for aggregations before aborting. This limit is per request and per single leaf query (a leaf query queries one or multiple splits concurrently). It is used to prevent excessive memory usage during the aggregation phase, which can lead to performance degradation or crashes. Since it is per request, concurrent requests can exceed the limit. | 500M |
| aggregation_bucket_limit | Determines the maximum number of buckets returned to the client. | 65000 |
| fast_field_cache_capacity | Fast field in-memory cache capacity on a Searcher. If you filter by dates, run aggregations or range queries, use the search stream API, or use Quickwit for tracing, it might be worth increasing this parameter. The metrics starting with quickwit_cache_fastfields_cache can help you make an informed choice when setting this value. | 1G |
| split_footer_cache_capacity | Split footer in-memory cache (essentially the hotcache) capacity on a Searcher. | 500M |
| partial_request_cache_capacity | Partial request in-memory cache capacity on a Searcher. Caches intermediate state for a request, possibly making subsequent requests faster. It can be disabled by setting the size to 0. | 64M |
| max_num_concurrent_split_searches | Maximum number of concurrent split search requests running on a Searcher. | 100 |
| max_num_concurrent_split_streams | Maximum number of concurrent split stream requests running on a Searcher. | 100 |
| split_cache | Searcher split cache configuration options, defined in the section below. The cache is disabled if unspecified. | |

Searcher split cache configuration

This section contains the configuration options for the on-disk searcher split cache.

| Property | Description | Default value |
| --- | --- | --- |
| max_num_bytes | Maximum disk size in bytes allowed in the split cache. Can be exceeded by the size of one split. | |
| max_num_splits | Maximum number of splits allowed in the split cache. | 10000 |
| num_concurrent_downloads | Maximum number of concurrent split downloads. | 1 |

Example:

searcher:
  fast_field_cache_capacity: 1G
  split_footer_cache_capacity: 500M
  partial_request_cache_capacity: 64M
  split_cache:
    max_num_bytes: 1G
    max_num_splits: 10000
    num_concurrent_downloads: 1

Jaeger configuration

| Property | Description | Default value |
| --- | --- | --- |
| enable_endpoint | If true, enables the gRPC endpoint that allows the Jaeger Query Service to connect and retrieve traces. | false |

Example:

jaeger:
  enable_endpoint: true

Using environment variables in the configuration

You can use environment variable references in the config file to set values that need to be configurable during deployment. To do this, use:

${VAR_NAME}

where VAR_NAME is the name of the environment variable.

Each variable reference is replaced at startup by the value of the environment variable. The replacement is case-sensitive and occurs before the configuration file is parsed. Referencing undefined variables throws an error unless you specify a default value or custom error text.

To specify a default value, use:

${VAR_NAME:-default_value}

where default_value is the value to use if the environment variable is unset.

<config_field>: ${VAR_NAME}
or
<config_field>: ${VAR_NAME:-default_value}

For example:

export QW_LISTEN_ADDRESS=0.0.0.0

# config.yaml
version: 0.7
cluster_id: quickwit-cluster
node_id: my-unique-node-id
listen_address: ${QW_LISTEN_ADDRESS}
rest:
  listen_port: ${QW_LISTEN_PORT:-1111}

With QW_LISTEN_PORT unset, this is interpreted by Quickwit as:

version: 0.7
cluster_id: quickwit-cluster
node_id: my-unique-node-id
listen_address: 0.0.0.0
rest:
  listen_port: 1111
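
The substitution rules described above can be approximated in a few lines of Python. This is a minimal sketch of the documented behavior (textual replacement before parsing, case-sensitive names, an error for undefined variables without a default), not Quickwit's actual implementation; the function name and the regular expression are assumptions, and the custom error text feature is not covered.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${VAR_NAME} and ${VAR_NAME:-default_value}.
_VAR_REFERENCE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def substitute_env(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace variable references in raw config text (hypothetical helper)."""
    variables = os.environ if env is None else env

    def replace(match: "re.Match[str]") -> str:
        name, default = match.group(1), match.group(2)
        if name in variables:
            return variables[name]
        if default is not None:
            # The variable is unset, so fall back to the default value.
            return default
        raise KeyError(f"environment variable {name} is not set and has no default")

    return _VAR_REFERENCE.sub(replace, text)


if __name__ == "__main__":
    raw = (
        "listen_address: ${QW_LISTEN_ADDRESS}\n"
        "rest:\n"
        "  listen_port: ${QW_LISTEN_PORT:-1111}\n"
    )
    print(substitute_env(raw, {"QW_LISTEN_ADDRESS": "0.0.0.0"}))
```

Because the replacement runs on the raw text, the resulting file is parsed as if the values had been written literally, which matches the interpreted configuration shown above.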