Tantivy 0.19 is out
Tantivy 0.19 is out! What’s new?
Tantivy 0.19 is an exciting release packed with new features! In this blog post, we will explore the most important updates and how they open up new possibilities for using Tantivy.
- Add IP address field type
- Doc store improvements
- Extended aggregation support
- Extend Query Language
See the CHANGELOG for the full list.
Short reminder: what is tantivy?
Tantivy is a high-performance full-text search engine library written in Rust (benchmarks). The library is inspired by Apache Lucene and acts as a foundation for building a search engine. We use it to build our distributed search engine, Quickwit.
Major features in 0.19
New field type: IP Address
IP address is an important and widely used data type. It is used to identify devices on a network and it is necessary for any type of communication. With the new IP address field type, Tantivy now enables users to query and store IP addresses in their documents. This opens a lot of possibilities for applications such as network security and analytics.
The IP address field type is capable of handling both IPv4 and IPv6. It also supports complex queries such as range queries like ip:[192.168.0.1 TO 192.168.0.199], which can be used to find documents within a certain IP range.
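As a rough illustration of how one field can cover both address families, the sketch below widens IPv4 addresses to their IPv4-mapped IPv6 form and compares everything as u128. It uses only the Rust standard library; the helper names `ip_to_u128` and `in_range` are made up for this example, and the mapping is an assumption about the approach rather than a copy of Tantivy's code.

```rust
use std::net::{AddrParseError, IpAddr, Ipv6Addr};

/// Maps any IP address to a u128, widening IPv4 to its IPv4-mapped IPv6 form.
/// (Hypothetical helper for illustration, not a Tantivy API.)
fn ip_to_u128(ip: IpAddr) -> u128 {
    let v6: Ipv6Addr = match ip {
        IpAddr::V4(v4) => v4.to_ipv6_mapped(),
        IpAddr::V6(v6) => v6,
    };
    u128::from(v6)
}

/// Returns whether `ip` lies in the inclusive range [lo, hi],
/// mirroring a query such as ip:[192.168.0.1 TO 192.168.0.199].
fn in_range(ip: &str, lo: &str, hi: &str) -> Result<bool, AddrParseError> {
    let value = ip_to_u128(ip.parse()?);
    let lo = ip_to_u128(lo.parse()?);
    let hi = ip_to_u128(hi.parse()?);
    Ok(lo <= value && value <= hi)
}
```

Once every address is a u128, a range query reduces to two integer comparisons, regardless of whether the input was written as IPv4 or IPv6.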
The field type also comes with fast field support (DocValues in Lucene).
We implemented a tailored compression algorithm for IP addresses on the fast field. IP addresses are internally u128 values, which can span very large ranges (between the minimum and maximum value), so simple bitpacking is not enough.
Instead, we reduce the value space by removing unmapped value ranges - similar to dictionary compression - and then bitpack the values.
The compression algorithm works well for range queries: the query range is translated into the compact value space once, after which we can operate directly on the bit-packed values.
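The idea can be sketched in a few lines of plain Rust. This is a simplified model under stated assumptions, not Tantivy's actual codec: the types `Block` and `CompactSpace`, the `max_gap` threshold, and the assumption that the compact space fits into a u64 are all invented for the example.

```rust
/// A contiguous run of original values that is kept in the compact space.
#[derive(Debug, Clone, PartialEq)]
struct Block {
    start: u128,        // first original value in the run
    end: u128,          // last original value in the run (inclusive)
    compact_start: u64, // compact code assigned to `start`
}

/// Hypothetical model of a value space with the large unused gaps removed.
struct CompactSpace {
    blocks: Vec<Block>,
}

impl CompactSpace {
    /// Builds the blocks; gaps wider than `max_gap` are dropped from the space.
    fn build(values: &[u128], max_gap: u128) -> Self {
        let mut sorted = values.to_vec();
        sorted.sort_unstable();
        sorted.dedup();
        let mut blocks: Vec<Block> = Vec::new();
        for v in sorted {
            let extend = matches!(blocks.last(), Some(last) if v - last.end <= max_gap);
            if extend {
                blocks.last_mut().unwrap().end = v;
            } else {
                // The new block starts right after the codes used by the previous one.
                let compact_start = blocks
                    .last()
                    .map_or(0, |b| b.compact_start + (b.end - b.start) as u64 + 1);
                blocks.push(Block { start: v, end: v, compact_start });
            }
        }
        CompactSpace { blocks }
    }

    /// Number of bits needed to bitpack any compact code.
    fn bits_per_value(&self) -> u32 {
        let max_code = self
            .blocks
            .last()
            .map_or(0, |b| b.compact_start + (b.end - b.start) as u64);
        64 - max_code.leading_zeros()
    }

    /// Translates one original value into its compact code.
    fn to_compact(&self, value: u128) -> Option<u64> {
        self.blocks
            .iter()
            .find(|b| b.start <= value && value <= b.end)
            .map(|b| b.compact_start + (value - b.start) as u64)
    }

    /// Translates an inclusive original range into an inclusive compact range.
    /// Returns None when no kept value can fall inside the range.
    fn range_to_compact(&self, lo: u128, hi: u128) -> Option<(u64, u64)> {
        let first = self.blocks.iter().find(|b| b.end >= lo)?;
        let last = self.blocks.iter().rev().find(|b| b.start <= hi)?;
        let lower = first.compact_start + (lo.max(first.start) - first.start) as u64;
        let upper = last.compact_start + (hi.min(last.end) - last.start) as u64;
        if lower <= upper { Some((lower, upper)) } else { None }
    }
}
```

In this model a range query calls `range_to_compact` once and then compares the small bit-packed codes against the resulting bounds, without decoding any value back to u128.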
Range queries are enabled on the inverted index and on the fast field. When both exist, the fast field is chosen as the preferred method for range queries.
Doc store improvements
The doc store has seen major improvements in this version.
First, the doc store now uses a separate thread to compress the block store. In our experiments, we observed an increase of approximately 50% in overall indexing speed.
Additionally, the doc store now has configurable compression levels and block sizes. Users can tune these settings to strike a balance between indexing performance and compression ratio. In combination with the separate thread, users can increase the compression level without decreasing indexing performance, up to the point where doc store compression becomes slower than regular indexing.
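A minimal sketch of the threading pattern, assuming a bounded channel between the indexing thread and a dedicated compressor thread. The function names are hypothetical, and the toy run-length encoder merely stands in for a real compressor; it is not what Tantivy uses.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Toy run-length encoder: emits (run length, byte) pairs.
/// Stand-in for a real block compressor, used here only to keep the example self-contained.
fn rle_compress(block: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut iter = block.iter().copied().peekable();
    while let Some(byte) = iter.next() {
        let mut run: u8 = 1;
        while run < u8::MAX && iter.peek() == Some(&byte) {
            iter.next();
            run += 1;
        }
        out.push(run);
        out.push(byte);
    }
    out
}

/// Sends each block to a dedicated compressor thread and returns the
/// compressed blocks in their original order.
fn compress_in_background(blocks: Vec<Vec<u8>>) -> Vec<Vec<u8>> {
    // Bounded queue: `send` only blocks once the compressor has fallen behind.
    let (tx, rx) = sync_channel::<Vec<u8>>(4);
    let compressor = thread::spawn(move || {
        rx.iter().map(|block| rle_compress(&block)).collect::<Vec<_>>()
    });
    for block in blocks {
        // In a real indexer, this thread would go on indexing between sends.
        tx.send(block).expect("compressor thread hung up");
    }
    drop(tx); // closing the channel lets the compressor thread finish
    compressor.join().expect("compressor thread panicked")
}
```

The bounded channel shows why a higher compression level is free only up to a point: as long as the compressor keeps up, the sender never waits, but once compression is slower than indexing the queue fills and the indexing thread stalls on `send`.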
At Quickwit, we mostly ingest vast amounts of unstructured logs. Compression levels are crucial here to reduce storage costs on S3.
Extended aggregation support
This update includes support for the date field type in aggregations, which is essential for the histogram aggregation.
Range and histogram aggregations now support the keyed parameter, and it is possible to set a limit on the number of buckets in order to protect the server from being overloaded by invalid queries.
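To make the two options concrete, here is a small standalone model of a histogram with a keyed switch and a bucket limit. It does not use Tantivy's aggregation API; the `Buckets` enum, the `histogram` function, and its parameters are assumptions for illustration, and the integer values could stand in for the timestamps of a date field.

```rust
use std::collections::BTreeMap;

/// Histogram result in the two output shapes selected by the `keyed` flag.
#[derive(Debug, PartialEq)]
enum Buckets {
    /// keyed = false: an ordered list of (bucket key, doc count) pairs.
    List(Vec<(i64, u64)>),
    /// keyed = true: a map from the bucket key (as a string) to the doc count.
    Keyed(BTreeMap<String, u64>),
}

/// Counts values per fixed-width bucket and fails once more than
/// `max_buckets` distinct buckets would be needed.
fn histogram(
    values: &[i64],
    interval: i64,
    keyed: bool,
    max_buckets: usize,
) -> Result<Buckets, String> {
    if interval <= 0 {
        return Err("interval must be positive".to_string());
    }
    let mut counts: BTreeMap<i64, u64> = BTreeMap::new();
    for &value in values {
        // Round down to the start of the bucket that contains `value`.
        let key = value.div_euclid(interval) * interval;
        *counts.entry(key).or_insert(0) += 1;
        // Abort early instead of letting a bad request create unbounded buckets.
        if counts.len() > max_buckets {
            return Err(format!("more than {} buckets requested", max_buckets));
        }
    }
    Ok(if keyed {
        Buckets::Keyed(counts.into_iter().map(|(k, c)| (k.to_string(), c)).collect())
    } else {
        Buckets::List(counts.into_iter().collect())
    })
}
```

A request with a tiny interval over a wide value range is the typical case the limit guards against: it fails fast with an error instead of allocating millions of buckets.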
Extend Query Language
One of the new features is the ability to use the IN operator to search for specific values within a field. This can be done by specifying the field name and a list of values, for example field: IN [val1 val2 val3].
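Conceptually, an IN clause selects the documents whose field holds any one of the listed values. The sketch below spells that out with two hypothetical helpers that are not part of Tantivy: one rewrites the clause into an OR of single-term clauses, the other checks a single field value against the list. Scoring may differ between the two query forms; only the matching behaviour is modelled here.

```rust
/// Rewrites the value list of `field: IN [a b c]` into an OR of term clauses.
/// (Illustrative helper; the real query parser builds a query object instead.)
fn expand_in(field: &str, values: &[&str]) -> String {
    values
        .iter()
        .map(|value| format!("{}:{}", field, value))
        .collect::<Vec<_>>()
        .join(" OR ")
}

/// Returns whether a document's field value is one of the listed values.
fn matches_in(doc_value: &str, values: &[&str]) -> bool {
    values.contains(&doc_value)
}
```

For long value lists the IN form is considerably easier to read and write than the expanded OR chain.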
We've also added support for phrase slop (matching distance).
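To make the notion of slop concrete, here is a simplified matcher over a tokenized document. It assumes that the phrase terms must appear in order and that slop bounds the total number of extra tokens between the first and last matched term; the function `matches_with_slop` is written for this post and is not Tantivy's implementation.

```rust
/// Returns true if `phrase` occurs in `doc` in order, with at most `slop`
/// extra tokens in total between the first and the last matched term.
fn matches_with_slop(doc: &[&str], phrase: &[&str], slop: usize) -> bool {
    if phrase.is_empty() {
        return false;
    }
    for (start, token) in doc.iter().enumerate() {
        if *token != phrase[0] {
            continue;
        }
        // Greedily take the nearest later occurrence of each following term;
        // for a fixed start this yields the tightest possible match.
        let mut prev = start;
        let mut complete = true;
        for term in &phrase[1..] {
            match doc[prev + 1..].iter().position(|t| t == term) {
                Some(offset) => prev = prev + 1 + offset,
                None => {
                    complete = false;
                    break;
                }
            }
        }
        // Matched span length minus phrase length = number of extra tokens.
        if complete && prev - start + 1 - phrase.len() <= slop {
            return true;
        }
    }
    false
}
```

With a slop of 0 this degenerates to an exact phrase match, while each additional unit of slop tolerates one more intervening token.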
Slop allows for more flexible matching of phrases within a query, e.g. "big wolf"~1 will return documents containing the phrase "big bad wolf".
Thanks to all contributors
First Time Contributors
- @ryanrussell made their first contribution in #1380
- @boraarslan made their first contribution in #1382
- @pier-oliviert made their first contribution in #1415
- @kianmeng made their first contribution in #1445
- @akr4 made their first contribution in #1473
- @waywardmonkeys made their first contribution in #1524
- @nigel-andrews made their first contribution in #1608
- @theduke made their first contribution in #1624