Version: 0.4.0

Source configuration

Quickwit can insert data into an index from one or multiple sources. When creating an index, sources are declared in the index config. Additional sources can be added later using the CLI command quickwit source create.

A source is declared using an object called a source config, which uniquely identifies and defines the source. It consists of four parameters:

  • source ID
  • source type
  • source parameters
  • transform parameters (optional)

Source ID

The source ID is a string that uniquely identifies the source within an index. It may only contain uppercase or lowercase ASCII letters, digits, hyphens (-), and underscores (_). It must also start with a letter and be between 3 and 255 characters long.
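The rules above reduce to a single regular expression. The Bash sketch below checks candidate IDs against it. The function name is_valid_source_id is hypothetical, and the check reflects the rules as documented on this page rather than Quickwit's own validation code.

```shell
#!/usr/bin/env bash
# Hypothetical helper: checks a candidate source ID against the documented
# rules (ASCII letters, digits, '-' and '_' only; first character is a letter;
# 3 to 255 characters in total). Not Quickwit's own validation code.
is_valid_source_id() {
  # Byte-wise collation so that A-Z and a-z cover ASCII letters only.
  local LC_ALL=C
  # One leading letter followed by 2 to 254 allowed characters.
  local pattern='^[A-Za-z][A-Za-z0-9_-]{2,254}$'
  [[ "$1" =~ $pattern ]]
}

for id in my-file-source ab 1-source my.source; do
  if is_valid_source_id "$id"; then
    echo "$id: valid"
  else
    echo "$id: invalid"
  fi
done
```

Run as is, the loop reports my-file-source as valid and the other three as invalid. ab is too short, 1-source starts with a digit, and my.source contains a dot.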

Source type

The source type designates the kind of source being configured. As of version 0.3, available source types are file, kafka, and kinesis.

Source parameters

The source parameters indicate how to connect to a data store and are specific to the source type.

File source

A file source reads data from a local file. The file must consist of JSON objects separated by a newline. As of version 0.4, compressed files (bz2, gzip, ...) and remote files (Amazon S3, HTTP, ...) are not supported.
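For local experiments, you can produce a file in the expected format with a heredoc. The document contents and the docs.json file name below are hypothetical. The grep check is only a rough heuristic that flags lines not shaped like one object per line. It does not validate JSON.

```shell
#!/usr/bin/env bash
# Write three hypothetical documents, one JSON object per line.
cat << 'EOF' > docs.json
{"timestamp": 1660000000, "message": "service started", "level": "info"}
{"timestamp": 1660000005, "message": "connection refused", "level": "error"}
{"timestamp": 1660000010, "message": "retrying", "level": "warn"}
EOF

# Rough sanity check (not a JSON parser): count lines that do not start
# with '{' and end with '}'. grep -c exits 1 when the count is 0, hence || true.
bad_lines=$(grep -cv '^{.*}$' docs.json || true)
echo "documents: $(wc -l < docs.json), suspicious lines: $bad_lines"
```

You can then point the source's filepath parameter at the resulting file, for example filepath: docs.json.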

File source parameters

| Property | Description | Default value |
|---|---|---|
| filepath | Path to a local file consisting of JSON objects separated by a newline. | |

Declaring a file source in an index config (YAML)

# Version of the index config file format
version: 0.4

# Sources
sources:
  - source_id: my-file-source
    source_type: file
    params:
      filepath: path/to/local/file.json

# The rest of your index config here
# ...

Adding a file source to an index with the CLI

cat << EOF > source-config.yaml
source_id: my-file-source
source_type: file
params:
  filepath: path/to/local/file.json # The file must exist.
EOF
quickwit source create --index my-index --source-config source-config.yaml

Finally, note that the CLI command quickwit index ingest allows ingesting data directly from a file or the standard input without creating a source beforehand.

Kafka source

A Kafka source reads data from a Kafka stream. Each message in the stream must hold a JSON object.

Kafka source parameters

The Kafka source consumes a topic using the client library librdkafka and forwards the key-value pairs carried by the parameter client_params to the underlying librdkafka consumer. Common client_params options include the bootstrap servers (bootstrap.servers) and the security protocol (security.protocol). Refer to the Kafka and librdkafka documentation for more advanced options.

| Property | Description | Default value |
|---|---|---|
| topic | Name of the topic to consume. | required |
| client_log_level | librdkafka client log level. Possible values are: debug, info, warn, error. | info |
| client_params | librdkafka client configuration parameters. | {} |
| enable_backfill_mode | Backfill mode stops the source after reaching the end of the topic. | false |
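The optional parameters in the table go under params next to topic, while librdkafka settings stay under client_params. The sketch below writes such a config in the heredoc style used by the CLI examples on this page. The source ID my-kafka-backfill-source and the file name kafka-backfill-source.yaml are hypothetical. The values (warn log level, backfill mode on) were picked only to show where each key goes.

```shell
#!/usr/bin/env bash
# Kafka source config that also sets the optional parameters from the table.
# Topic name and broker address are placeholders.
cat << EOF > kafka-backfill-source.yaml
source_id: my-kafka-backfill-source
source_type: kafka
params:
  topic: my-topic
  client_log_level: warn
  enable_backfill_mode: true
  client_params:
    bootstrap.servers: localhost:9092
EOF
```

Register it with quickwit source create --index my-index --source-config kafka-backfill-source.yaml. Backfill mode stops the source once it reaches the end of the topic. A config like this therefore seems better suited to one-off ingestion of a topic's existing messages than to continuous indexing.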

Kafka client parameters

  • bootstrap.servers Comma-separated list of host and port pairs that are the addresses of a subset of the Kafka brokers in the Kafka cluster.

  • enable.auto.commit The Kafka source manages commit offsets manually using the checkpoint API and disables auto-commit.

  • group.id Kafka-based distributed indexing relies on consumer groups. The group ID assigned to each consumer managed by the source is quickwit-{index_id}-{source_id}.

  • max.poll.interval.ms Short max poll interval durations may cause a source to crash when back pressure from the indexer occurs. Therefore, Quickwit recommends using the default value of 300000 (5 minutes).
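The group ID is derived from the index and source IDs, so you can compute it ahead of time, for example to look the group up with Kafka's own tooling. The index and source IDs below are placeholders.

```shell
#!/usr/bin/env bash
# Build the consumer group ID from the documented pattern
# quickwit-{index_id}-{source_id}. Both IDs are placeholders.
index_id="my-index"
source_id="my-kafka-source"
group_id="quickwit-${index_id}-${source_id}"
echo "$group_id"

# With Apache Kafka's CLI tools on your PATH, you could then inspect the group
# (broker address is a placeholder):
# kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group "$group_id"
```

With these values the script prints quickwit-my-index-my-kafka-source. The commented-out command uses kafka-consumer-groups.sh, which ships with Apache Kafka. The source tracks its progress through Quickwit's checkpoint API, so treat any offsets that tool reports as informational.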

Declaring a Kafka source in an index config (YAML)

# Version of the index config file format
version: 0.4

# Sources
sources:
  - source_id: my-kafka-source
    source_type: kafka
    params:
      topic: my-topic
      client_params:
        bootstrap.servers: localhost:9092
        security.protocol: SSL

# The rest of your index config here
# ...

Adding a Kafka source to an index with the CLI

cat << EOF > source-config.yaml
source_id: my-kafka-source
source_type: kafka
params:
  topic: my-topic
  client_params:
    bootstrap.servers: localhost:9092
    security.protocol: SSL
EOF
quickwit source create --index my-index --source-config source-config.yaml

Kinesis source

A Kinesis source reads data from an Amazon Kinesis stream. Each message in the stream must hold a JSON object.

Kinesis source parameters

The Kinesis source consumes a stream identified by a stream_name and a region.

| Property | Description | Default value |
|---|---|---|
| stream_name | Name of the stream to consume. | required |
| region | The AWS region of the stream. Mutually exclusive with endpoint. | us-east-1 |
| endpoint | Custom endpoint for use with an AWS-compatible Kinesis service. Mutually exclusive with region. | optional |
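Because endpoint and region are mutually exclusive, a source that targets an AWS-compatible service sets endpoint and leaves region out. In the sketch below, http://localhost:4566 is a hypothetical address for a locally running Kinesis-compatible emulator, and kinesis-local-source.yaml is an arbitrary file name.

```shell
#!/usr/bin/env bash
# Kinesis source config pointing at a custom endpoint instead of a region.
# The endpoint URL is a hypothetical local emulator address.
cat << EOF > kinesis-local-source.yaml
source_id: my-kinesis-source
source_type: kinesis
params:
  stream_name: my-stream
  endpoint: http://localhost:4566
EOF
```

Register it with quickwit source create --index my-index --source-config kinesis-local-source.yaml, as in the Kinesis CLI example on this page.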

If no region is specified, Quickwit attempts to find one in the following locations, in order of precedence:

  1. Environment variables (AWS_REGION then AWS_DEFAULT_REGION)

  2. Config file, typically located at ~/.aws/config or otherwise specified by the AWS_CONFIG_FILE environment variable if set and not empty.

  3. Amazon EC2 instance metadata service determining the region of the currently running Amazon EC2 instance.

  4. Default value: us-east-1
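The lookup order can be modeled as a chain of fallbacks. The Bash function below is a simplified illustration, not Quickwit's implementation. The name resolve_region is hypothetical. The function reads only the [default] profile from the AWS config file, and it skips the EC2 instance metadata step because that step requires a network call.

```shell
#!/usr/bin/env bash
# Illustration of the documented region lookup order (not Quickwit's code).
# Step 2 reads only the [default] profile; step 3 (EC2 metadata) is omitted.
resolve_region() {
  local configured="$1"  # the source's `region` parameter, possibly empty
  if [[ -n "$configured" ]]; then
    echo "$configured"; return
  fi
  # 1. Environment variables: AWS_REGION, then AWS_DEFAULT_REGION.
  if [[ -n "${AWS_REGION:-}" ]]; then
    echo "$AWS_REGION"; return
  fi
  if [[ -n "${AWS_DEFAULT_REGION:-}" ]]; then
    echo "$AWS_DEFAULT_REGION"; return
  fi
  # 2. Config file. ${VAR:-default} also falls back when VAR is set but empty.
  local config_file="${AWS_CONFIG_FILE:-$HOME/.aws/config}"
  if [[ -f "$config_file" ]]; then
    local from_file
    from_file=$(awk -F '=' '
      /^\[/ { in_default = ($0 == "[default]") }
      in_default && $1 ~ /^[ \t]*region[ \t]*$/ { gsub(/[ \t]/, "", $2); print $2; exit }
    ' "$config_file")
    if [[ -n "$from_file" ]]; then
      echo "$from_file"; return
    fi
  fi
  # 3. EC2 instance metadata lookup would go here (omitted in this sketch).
  # 4. Default value.
  echo "us-east-1"
}
```

For example, with neither environment variable set and no config file present, resolve_region "" prints us-east-1. In the same setup, AWS_REGION=eu-west-1 resolve_region "" prints eu-west-1.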

Declaring a Kinesis source in an index config (YAML)

# Version of the index config file format
version: 0.4

# Sources
sources:
  - source_id: my-kinesis-source
    source_type: kinesis
    params:
      stream_name: my-stream

# The rest of your index config here
# ...

Adding a Kinesis source to an index with the CLI

cat << EOF > source-config.yaml
source_id: my-kinesis-source
source_type: kinesis
params:
  stream_name: my-stream
EOF
quickwit source create --index my-index --source-config source-config.yaml

Deleting a source from an index

A source can be removed from an index using the CLI command quickwit source delete:

quickwit source delete --index my-index --source my-source

When deleting a source, the checkpoint associated with the source is also removed.

Transform parameters

Ingested documents can be transformed before being indexed using Vector Remap Language (VRL) scripts.

Transform parameters

| Property | Description | Default value |
|---|---|---|
| script | Source code of the VRL program executed to transform documents. | required |
| timezone | Timezone used in the VRL program for date and time manipulations. Must be a valid name in the TZ database. | UTC |

Declaring a transform in an index config (YAML)

# Version of the index config file format
version: 0.4

# Sources
sources:
  - source_id: my-source
    # ...
    transform:
      script: |
        .message = downcase(string!(.message))
        .timestamp = now()
        del(.username)
      timezone: local

# The rest of your index config here
# ...
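You can probably supply the same transform block when adding a source through the CLI, assuming quickwit source create accepts the full source config, transform included. The sketch below writes such a file. The Kafka topic and broker address are placeholders. The quoted 'EOF' delimiter keeps the shell from expanding any $ or backslash that a VRL script might contain.

```shell
#!/usr/bin/env bash
# Source config with a VRL transform. The quoted 'EOF' delimiter writes the
# VRL script verbatim, without shell expansion.
cat << 'EOF' > source-config-with-transform.yaml
source_id: my-kafka-source
source_type: kafka
params:
  topic: my-topic
  client_params:
    bootstrap.servers: localhost:9092
transform:
  script: |
    .message = downcase(string!(.message))
    del(.username)
  timezone: UTC
EOF
```

Then run quickwit source create --index my-index --source-config source-config-with-transform.yaml.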