In this guide, we will index some 40 million log entries (13 GB decompressed) on a local machine. If you want to start a server with indexes on AWS S3, check out the tutorial for distributed search.
Here is an example of a log entry:
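The exact values below are illustrative, and the `resource` and `attributes` object names are assumptions based on the field layout described later in this guide:

```json
{
  "timestamp": 1460530013,
  "severity_text": "INFO",
  "body": "PacketResponder: block blk_1074072698 terminating",
  "resource": {
    "service": "datanode/01"
  },
  "attributes": {
    "class": "org.apache.hadoop.hdfs.server.datanode.DataNode"
  }
}
```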
Let's download and install Quickwit.
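One way to do this, assuming the official install script (you can also download a release archive from the Quickwit GitHub releases page):

```bash
# Fetch and run the installer, then move into the extracted directory.
curl -L https://install.quickwit.io | sh
cd ./quickwit-v*/
./quickwit --version   # sanity check
```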
Let's create an index configured to receive these logs.
The index config defines five fields: `timestamp`, `severity_text`, `body`, and two object fields that hold the nested values. It also sets the default search field and a timestamp field.
This timestamp field will be used by Quickwit for sorting documents (descending order) and for split pruning at query time to boost search speed. Check out the index config docs for details.
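Here is a sketch of what such a config can look like with a recent Quickwit release. The exact schema keys vary between versions, and the index id `hdfs-logs`, the config filename, and the object field names are assumptions reused in the commands below:

```yaml
# hdfs_logs_index_config.yaml (hypothetical filename), sketched for a recent Quickwit release.
version: 0.7            # config format version; use the one matching your Quickwit binary
index_id: hdfs-logs     # assumed index id, reused throughout this guide

doc_mapping:
  field_mappings:
    - name: timestamp
      type: datetime
      input_formats: [unix_timestamp]
      fast: true                      # needed for sorting and time pruning
    - name: severity_text
      type: text
      tokenizer: raw                  # keep log levels as single tokens
    - name: body
      type: text
      tokenizer: default
    - name: resource                  # first object field for nested values (assumed name)
      type: object
      field_mappings:
        - name: service
          type: text
          tokenizer: raw
    - name: attributes                # second object field for nested values (assumed name)
      type: object
      field_mappings:
        - name: class
          type: text
          tokenizer: raw
  timestamp_field: timestamp          # used for sorting and split pruning

search_settings:
  default_search_fields: [severity_text, body]
```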
Now we can create the index with the `new` command (assuming you create it in your current directory):
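With a recent Quickwit binary the `new` subcommand has been replaced by `index create`; a sketch, using the hypothetical config filename from above:

```bash
# Create the index described by the config file in the current directory.
./quickwit index create --index-config hdfs_logs_index_config.yaml
```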
You're now ready to fill the index.
The dataset is a compressed ndjson file. Instead of downloading it and then indexing the data, we will use pipes to directly send a decompressed stream to Quickwit. This can take up to 10 min on a modern machine, the perfect time for a coffee break.
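A sketch of the pipeline, assuming the index id `hdfs-logs` from above; replace `<DATASET_URL>` with the gzipped ndjson dataset URL for this tutorial:

```bash
# Stream the decompressed ndjson straight into the indexer without writing it to disk.
curl -s <DATASET_URL> | gunzip | ./quickwit index ingest --index hdfs-logs
```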
You can check it's working by using the `search` command and looking for matches on the `severity_text` field.
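For example, counting `ERROR`-level entries (the query value is illustrative):

```bash
# Full-text search on the severity_text field from the command line.
./quickwit index search --index hdfs-logs --query "severity_text:ERROR"
```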
The `index` command generates splits of 5 million documents. Each split is a small piece of the index, represented by a directory in which index files are saved. For this dataset, you will end up with 9 splits, the last one being very small.
The `serve` command starts an HTTP server which provides a REST API.
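Recent releases expose the same functionality through `run`; a sketch, assuming the default listen address `127.0.0.1:7280`:

```bash
# Start Quickwit; the REST API listens on 127.0.0.1:7280 by default.
./quickwit run
```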
Let's execute the same query on the `severity_text` field, but this time through the REST API, which returns the results as JSON.
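A sketch with `curl`, assuming the default port and the index id used above; the JSON response contains the number of hits and the matching documents:

```bash
# Query the search REST API; the query string syntax is the same as on the CLI.
curl "http://127.0.0.1:7280/api/v1/hdfs-logs/search?query=severity_text:ERROR"
```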
The index config shows that we can use the timestamp field parameters `startTimestamp` and `endTimestamp` and benefit from time pruning. Behind the scenes, Quickwit will only query splits that have logs in this time range.
Let's use these parameters with the following query:
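A sketch with placeholder epoch bounds (pick any range covered by the dataset); note that recent Quickwit releases spell these parameters `start_timestamp` and `end_timestamp`:

```bash
# Same query restricted to a time range; the epoch values below are placeholders.
curl "http://127.0.0.1:7280/api/v1/hdfs-logs/search?query=severity_text:ERROR&startTimestamp=1460530000&endTimestamp=1460600000"
```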
It should return 6 hits more quickly, as Quickwit will query fewer splits.
Let's do some cleanup by deleting the index:
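Assuming the index id used throughout this guide:

```bash
# Delete the index and all of its splits.
./quickwit index delete --index hdfs-logs
```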
Congrats! You finished this tutorial!