In this guide, we will index some 40 million log entries (13 GB decompressed) on AWS S3 and start a distributed search server.
Example of a log entry:
Before using Quickwit with object storage, check out our advice for deploying on AWS S3 to avoid unpleasant surprises on your bill at the end of the month.
- Configure your environment to let Quickwit access your S3 buckets.
All the following steps can be executed on any instance.
Let's create an index configured to receive these logs.
The index config defines 5 fields, among them `body` and two object fields for the nested values.
It also sets the default search field and a timestamp field. Quickwit uses this timestamp field to sort documents (in descending order) and to prune splits at query time, which speeds up searches. Check out the index config docs for details.
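To illustrate what "sorting documents by the timestamp field in descending order" means, here is a minimal in-memory sketch. It assumes each hit is a dict carrying an integer timestamp under a hypothetical `timestamp` key; Quickwit does this internally, and this is not its implementation.

```python
import heapq
from typing import Any, Dict, Iterable, List

Doc = Dict[str, Any]

def top_k_latest(docs: Iterable[Doc], k: int, ts_field: str = "timestamp") -> List[Doc]:
    """Return the k most recent documents, newest first (descending timestamp)."""
    # heapq.nlargest keeps at most k items in memory, so it also works on long streams.
    return heapq.nlargest(k, docs, key=lambda d: d[ts_field])

# Hypothetical log entries, for illustration only.
logs = [
    {"timestamp": 1460530013, "body": "a"},
    {"timestamp": 1460530099, "body": "b"},
    {"timestamp": 1460530050, "body": "c"},
]
print([d["body"] for d in top_k_latest(logs, 2)])  # ['b', 'c']
```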
Now we can create the index directly on S3 with the `new` command:
This step can be executed on your local machine. The `new` command creates the index locally and then uploads only a single JSON file, `quickwit.json`, to your bucket at
The dataset is a compressed ndjson file. Instead of downloading the file and then indexing it, we will use pipes to stream the decompressed data directly to Quickwit.
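The point of the pipe is that the file is decompressed incrementally and never fully written to disk. Assuming the file is gzip-compressed (an assumption; the guide only says "compressed"), the same idea can be sketched in Python with `zlib.decompressobj`, which is what a `gunzip` stage does in a shell pipe. `iter_ndjson_lines` is a hypothetical helper, not part of Quickwit.

```python
import gzip
import zlib
from typing import Iterable, Iterator

def iter_ndjson_lines(chunks: Iterable[bytes]) -> Iterator[str]:
    """Decompress gzip chunks incrementally and yield complete ndjson lines."""
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    pending = b""
    for chunk in chunks:
        pending += decomp.decompress(chunk)
        # Emit every complete line; keep the trailing partial line for the next chunk.
        *lines, pending = pending.split(b"\n")
        for line in lines:
            if line:
                yield line.decode("utf-8")
    # Flush whatever zlib still buffers, then emit the remaining lines.
    pending += decomp.flush()
    for line in pending.split(b"\n"):
        if line:
            yield line.decode("utf-8")

# Simulate a download arriving in small 5-byte chunks.
payload = gzip.compress(b'{"id": 1}\n{"id": 2}\n')
chunks = [payload[i:i + 5] for i in range(0, len(payload), 5)]
print(list(iter_ndjson_lines(chunks)))  # ['{"id": 1}', '{"id": 2}']
```

Only one chunk plus one partial line is held in memory at a time, which is why streaming avoids the 13 GB intermediate file.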
4 GB of RAM is enough to index this dataset; an instance like a `t4g.medium` (4 GB of RAM, 2 vCPUs) indexed it in 20 minutes.
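The figures in this guide give a rough idea of indexing throughput. A back-of-the-envelope calculation using the approximate numbers above (some 40 million entries, 13 GB decompressed, 20 minutes), assuming decimal gigabytes:

```python
docs = 40_000_000          # approximate number of log entries
raw_bytes = 13 * 10**9     # ~13 GB decompressed (decimal GB assumed)
seconds = 20 * 60          # indexing time reported for a t4g.medium

docs_per_sec = docs / seconds
mb_per_sec = raw_bytes / seconds / 10**6
avg_entry_bytes = raw_bytes / docs

print(f"{docs_per_sec:,.0f} docs/s")             # 33,333 docs/s
print(f"{mb_per_sec:.1f} MB/s")                  # 10.8 MB/s
print(f"{avg_entry_bytes:.0f} bytes per entry")  # 325 bytes per entry
```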
This step can also be done on your local machine. The `index` command generates splits of 5 million documents locally and uploads them to your bucket. Concretely, each split is a directory holding the split's index files, so uploading a split amounts to uploading 9 files at
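How a document stream gets cut into fixed-size splits can be sketched as follows. `split_batches` and `num_splits` are hypothetical helpers, not Quickwit code; the split size of 5 million and the 9 files per split come from this guide. With about 40 million entries this gives roughly 8 splits, hence roughly 72 uploaded files.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def split_batches(docs: Iterable[T], split_size: int = 5_000_000) -> Iterator[List[T]]:
    """Group a stream of documents into consecutive batches of at most split_size."""
    it = iter(docs)
    while True:
        batch = list(islice(it, split_size))
        if not batch:
            return
        yield batch

def num_splits(num_docs: int, split_size: int = 5_000_000) -> int:
    # Ceiling division: a final partial batch still produces its own split.
    return -(-num_docs // split_size)

print(num_splits(40_000_000))      # 8
print(num_splits(40_000_000) * 9)  # 72 files, at 9 files per split
print([len(b) for b in split_batches(range(12), split_size=5)])  # [5, 5, 2]
```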
You can check that it's working by using the `search` command and looking for
The `serve` command starts an HTTP server that provides a REST API. Run it on each of your instances.
Your terminal will confirm that the instance has joined the cluster. Example of such a log:
By default, Quickwit opens port 8080. It also needs port 8081 (8080+1), over both TCP and UDP, for cluster formation, and port 8082 (8080+2) for gRPC communication between instances.
In AWS, you can create a security group to hold these inbound rules. Check out the network section of our AWS setup guide.
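The port layout above can be written down as data. This sketch derives the three ports from the base HTTP port and builds inbound rules in the `IpPermissions` shape accepted by boto3's EC2 `authorize_security_group_ingress`. The CIDR block `10.0.0.0/16` is a placeholder assumption for your VPC range, and the code only builds the structure; it does not call AWS.

```python
from typing import Dict, List

def quickwit_ports(base: int = 8080) -> Dict[str, int]:
    """Ports described in this guide, derived from the base HTTP port."""
    return {"http": base, "cluster": base + 1, "grpc": base + 2}

def inbound_rules(base: int = 8080, cidr: str = "10.0.0.0/16") -> List[dict]:
    """Ingress rules in the EC2 IpPermissions format (cidr is a placeholder)."""
    ports = quickwit_ports(base)
    proto_ports = [
        ("tcp", ports["http"]),
        ("tcp", ports["cluster"]),
        ("udp", ports["cluster"]),  # cluster formation needs UDP as well as TCP
        ("tcp", ports["grpc"]),
    ]
    return [
        {
            "IpProtocol": proto,
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr}],
        }
        for proto, port in proto_ports
    ]

for rule in inbound_rules():
    print(rule["IpProtocol"], rule["FromPort"])
```

With boto3 installed and credentials configured, the list could be passed as `boto3.client("ec2").authorize_security_group_ingress(GroupId=..., IpPermissions=inbound_rules())`, where the group ID is your own.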
Now that you have a search cluster, you will ideally want to load-balance external requests. This is quickly done by adding an AWS load balancer that listens for incoming HTTP or HTTPS traffic and forwards it to a target group. You can now play with your cluster: kill processes randomly, add or remove instances, and keep calm.
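An AWS load balancer handles this for you. To illustrate the idea of forwarding requests to a target group while instances come and go, here is a toy round-robin selector that skips unhealthy targets. The class and the addresses are hypothetical, and a real load balancer relies on periodic health checks rather than a manual flag.

```python
from typing import Dict, List

class RoundRobinTargets:
    """Toy target group: rotate over registered instances, skipping unhealthy ones."""

    def __init__(self, targets: List[str]) -> None:
        self.targets = list(targets)
        self.healthy: Dict[str, bool] = {t: True for t in self.targets}
        self._next = 0

    def mark(self, target: str, healthy: bool) -> None:
        """Record the result of a (simulated) health check."""
        self.healthy[target] = healthy

    def pick(self) -> str:
        # Try each target at most once per call, continuing after the last pick.
        for _ in range(len(self.targets)):
            target = self.targets[self._next % len(self.targets)]
            self._next += 1
            if self.healthy.get(target, False):
                return target
        raise RuntimeError("no healthy target")

group = RoundRobinTargets(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
group.mark("10.0.0.2:8080", False)  # simulate a killed process
print([group.pick() for _ in range(4)])
```

Requests keep flowing to the remaining instances, which is why killing a process does not interrupt searches as long as one healthy instance is left.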
Let's execute a simple query that returns only ERROR entries on field
which returns the following JSON:
You can see that this query has only 364 hits and that the server responds in 0.5 seconds.
Because the index config defines a timestamp field, we can use the timestamp field parameters
endTimestamp and benefit from time pruning. Behind the scenes, Quickwit only queries splits that contain logs within this time range, which can significantly speed up the search.
This query returns 6 hits in 0.36 seconds.
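Time pruning can be sketched as follows, assuming each split records the minimum and maximum timestamps of its documents. The `Split` type and its fields are hypothetical, not Quickwit's metadata format, and the query range is treated here as half-open [start, end), which is also an assumption. A split is kept only if its time range overlaps the queried range; all other splits are skipped without being read.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Split:
    split_id: str
    min_ts: int  # earliest document timestamp in the split (inclusive)
    max_ts: int  # latest document timestamp in the split (inclusive)

def prune_splits(splits: List[Split], start: Optional[int] = None,
                 end: Optional[int] = None) -> List[Split]:
    """Keep splits whose [min_ts, max_ts] range intersects [start, end)."""
    kept = []
    for s in splits:
        if start is not None and s.max_ts < start:
            continue  # split ends before the query window opens
        if end is not None and s.min_ts >= end:
            continue  # split starts at or after the window closes
        kept.append(s)
    return kept

splits = [Split("a", 0, 99), Split("b", 100, 199), Split("c", 200, 299)]
print([s.split_id for s in prune_splits(splits, start=150, end=250)])  # ['b', 'c']
```

Fewer splits to open means fewer S3 requests, which is where the speedup comes from.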
Let's do some cleanup by deleting the index:
Congratulations! You finished this tutorial!