Version: main branch

Ingest API

In this tutorial, we will describe how to send data to Quickwit using the ingest API.

You will need a local Quickwit instance up and running to follow this tutorial.

To start it, run ./quickwit run in a terminal.

Create an index

First, let's create a schemaless index.

# Create the index config file.
cat << EOF > stackoverflow-schemaless-config.yaml
version: 0.7
index_id: stackoverflow-schemaless
doc_mapping:
  mode: dynamic
  dynamic_mapping:
    tokenizer: default
indexing_settings:
  commit_timeout_secs: 30
EOF
# Use the CLI to create the index...
./quickwit index create --index-config stackoverflow-schemaless-config.yaml
# Or with cURL.
curl -XPOST -H 'Content-Type: application/yaml' 'http://localhost:7280/api/v1/indexes' --data-binary @stackoverflow-schemaless-config.yaml

Note that for this example, we configure the dynamic mapping to use the default tokenizer. This is necessary to enable full-text search on all text fields.

Ingest data

Let's first download a sample of the StackOverflow dataset.

# Download the first 10_000 Stackoverflow posts articles.
curl -O https://quickwit-datasets-public.s3.amazonaws.com/stackoverflow.posts.transformed-10000.json

You can ingest data either with the CLI or with cURL. The CLI is more convenient for ingesting several GB as Quickwit may return 429 responses if the ingest queue is full. Quickwit CLI will automatically retry ingestion in this case.

# Ingest the first 10_000 Stackoverflow posts articles with the CLI...
./quickwit index ingest --index stackoverflow-schemaless --input-path stackoverflow.posts.transformed-10000.json --force

# OR with cURL.
curl -XPOST -H 'Content-Type: application/json' 'http://localhost:7280/api/v1/stackoverflow-schemaless/ingest?commit=force' --data-binary @stackoverflow.posts.transformed-10000.json

Execute search queries

You can now search the index.

curl 'http://localhost:7280/api/v1/stackoverflow-schemaless/search?query=body:python'

Tear down resources (optional)

curl -XDELETE 'http://localhost:7280/api/v1/indexes/stackoverflow-schemaless'

This concludes the tutorial. You can now move on to the next tutorial to learn how to ingest data from Kafka.

Ingest API versions

In 0.9, Quickwit introduced a new version of the ingest API that enables distributing the indexing in the cluster regardless of the node that received the ingest request. This new ingestion service is often referred to as "Ingest V2" compared to the legacy ingestion (V1). In upcoming versions the new ingest API will also be capable of replicating the write ahead log in order to achieve higher durability.

By default, both ingestion services are enabled and ingest V2 is used. You can toggle this behavior with the following environment variables:

Variable	Description	Default value
`QW_ENABLE_INGEST_V2`	Start the V2 ingest service and use it by default.	true
`QW_DISABLE_INGEST_V1`	V1 ingest will be used by the APIs only if V2 is disabled. Running V1 along V2 is necessary to migrate to V2 without loosing existing unindexed V1 logs.	false

note

These configurations drive the ingest service used both by the api/v1/<index-id>/ingest endpoint and the bulk API.

Create an index​

Ingest data​

Execute search queries​

Tear down resources (optional)​

Ingest API versions​

Create an index

Ingest data

Execute search queries

Tear down resources (optional)

Ingest API versions