Tantivy 0.19 is out
Tantivy 0.19 is out! What’s new?
Tantivy 0.19 is an exciting release packed with new features! In this blog post, we will explore the most important updates and how they open up new possibilities for using Tantivy.
- New IP address field type
- Doc store improvements
- Extended aggregation support
- Extended query language
See the CHANGELOG for the full list.
Short reminder: what is tantivy?
Tantivy is a high-performance full-text search engine library written in Rust (benchmarks). The library is inspired by Apache Lucene and acts as a foundation for building a search engine. We use it to build our distributed search engine, Quickwit.
Major features in 0.19
New field type: IP Address
IP addresses are an important and widely used data type. They identify devices on a network and are required for any IP-based communication. With the new IP address field type, Tantivy users can now store and query IP addresses in their documents. This opens up many possibilities for applications such as network security and analytics.
The IP address field type handles both IPv4 and IPv6. It also supports range queries such as ip:[192.168.0.1 TO 192.168.0.199], which can be used to find documents within a certain IP range.
The field type also comes with fast field support (DocValues in Lucene).
We implemented a tailored compression algorithm for IP addresses on the fast field. IP addresses are internally u128 values, which can span large ranges (between min and max), so simple bitpacking is not enough.
Instead, we reduce the value space by removing unmapped value ranges, similar to dictionary compression, and then bitpack the values.
The compression algorithm works well for range queries, since it translates the range into the compact value space once and then allows us to operate directly on the bitpacked values.
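The idea can be illustrated with a minimal sketch. This is not Tantivy's actual implementation: the `CompactSpace` type, the `max_gap` threshold, and the linear scans over blocks are assumptions made for illustration. Mapping IPv4 addresses into the IPv6 space so that every address becomes a u128 is also an assumption of this sketch.

```rust
use std::net::IpAddr;

/// Convert any IP address to a u128, mapping IPv4 into the IPv6 space.
fn ip_to_u128(ip: IpAddr) -> u128 {
    match ip {
        IpAddr::V4(v4) => u128::from(v4.to_ipv6_mapped()),
        IpAddr::V6(v6) => u128::from(v6),
    }
}

/// A contiguous range of original values and the compact id of its first value.
struct Block {
    first: u128,
    last: u128,
    compact_start: u64,
}

/// Toy compact space: large unused gaps between values are dropped.
struct CompactSpace {
    blocks: Vec<Block>,
}

impl CompactSpace {
    /// Start a new block whenever the gap to the previous value exceeds `max_gap`.
    /// Sketch only: block sizes are assumed to fit in a u64.
    fn build(mut values: Vec<u128>, max_gap: u128) -> Self {
        values.sort_unstable();
        values.dedup();
        let mut blocks: Vec<Block> = Vec::new();
        for v in values {
            let extend = matches!(blocks.last(), Some(b) if v - b.last <= max_gap);
            if extend {
                blocks.last_mut().unwrap().last = v;
            } else {
                let compact_start = blocks
                    .last()
                    .map_or(0, |b| b.compact_start + (b.last - b.first + 1) as u64);
                blocks.push(Block { first: v, last: v, compact_start });
            }
        }
        CompactSpace { blocks }
    }

    /// Compact id of a value, if it falls inside a block.
    fn to_compact(&self, v: u128) -> Option<u64> {
        self.blocks
            .iter()
            .find(|b| b.first <= v && v <= b.last)
            .map(|b| b.compact_start + (v - b.first) as u64)
    }

    /// Translate an inclusive range of original values into an inclusive
    /// range of compact ids. Returns None if no block overlaps the range.
    fn range_to_compact(&self, lo: u128, hi: u128) -> Option<(u64, u64)> {
        let start = self
            .blocks
            .iter()
            .find(|b| b.last >= lo)
            .map(|b| b.compact_start + (lo.max(b.first) - b.first) as u64)?;
        let end = self
            .blocks
            .iter()
            .rev()
            .find(|b| b.first <= hi)
            .map(|b| b.compact_start + (hi.min(b.last) - b.first) as u64)?;
        if start <= end { Some((start, end)) } else { None }
    }

    /// Number of bits needed to bitpack any compact id.
    fn num_bits(&self) -> u32 {
        let max_compact = self
            .blocks
            .last()
            .map_or(0, |b| b.compact_start + (b.last - b.first) as u64);
        u64::BITS - max_compact.leading_zeros()
    }
}
```

With the addresses 10.0.0.1, 192.168.0.1, 192.168.0.5, 192.168.0.200 and 2001:db8::1 and a `max_gap` of 255, this sketch produces three blocks and needs 8 bits per value instead of 128. The range 192.168.0.1 TO 192.168.0.199 is translated once into the compact range 1 to 199, and each stored value can then be compared against that range without being decoded back to a u128.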
Range queries are enabled on the inverted index and on the fast field. When both exist, the fast field is chosen as the preferred method for range queries.
Doc store Improvements
The doc store has seen major improvements in this version.
First, the doc store now uses a separate thread to compress the block store.
In our experiments, we observed an increase of approximately 50% in overall indexing speed, which is a remarkable improvement.
Additionally, the doc store now has configurable compression levels and block sizes. Users can tune these settings to balance indexing performance against compression ratio. Because compression runs on a separate thread, users can raise the compression level without slowing down indexing, up to the point where doc store compression becomes slower than the rest of indexing.
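The hand-off between indexing and compression can be sketched with a bounded channel. This is a simplified illustration, not Tantivy's doc store code: `compress_block` is a hypothetical stand-in (run-length encoding) for a real compressor, and the function name, channel capacity, and block handling are assumptions.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a real block compressor: run-length encoding
/// as (run length, byte) pairs.
fn compress_block(block: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut iter = block.iter().copied().peekable();
    while let Some(byte) = iter.next() {
        let mut run = 1u8;
        while run < u8::MAX && iter.peek() == Some(&byte) {
            iter.next();
            run += 1;
        }
        out.push(run);
        out.push(byte);
    }
    out
}

/// The calling (indexing) thread fills blocks and hands them off;
/// a dedicated thread compresses them in the order they arrive.
fn index_with_background_compression(docs: Vec<Vec<u8>>, block_size: usize) -> Vec<Vec<u8>> {
    // Bounded channel: if compression falls behind by more than 4 blocks,
    // `send` blocks and indexing slows down to the compressor's pace.
    let (tx, rx) = mpsc::sync_channel::<Vec<u8>>(4);
    let compressor = thread::spawn(move || {
        rx.into_iter()
            .map(|block| compress_block(&block))
            .collect::<Vec<_>>()
    });

    let mut current = Vec::with_capacity(block_size);
    for doc in docs {
        current.extend_from_slice(&doc);
        if current.len() >= block_size {
            tx.send(std::mem::take(&mut current))
                .expect("compressor thread is alive");
        }
    }
    if !current.is_empty() {
        tx.send(current).expect("compressor thread is alive");
    }
    // Closing the channel lets the compressor thread drain and finish.
    drop(tx);
    compressor.join().expect("compressor thread panicked")
}
```

The bounded channel mirrors the trade-off described above: a higher compression level costs nothing on the indexing thread until the compressor cannot keep up, at which point the full channel applies backpressure.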
At Quickwit, we mostly ingest vast amounts of unstructured logs. Compression levels are crucial here to reduce storage costs on S3.
Aggregation
This update includes support for the date field type in aggregations, which is essential for the histogram aggregation.
Range and histogram aggregations now support the keyed parameter, and it is possible to set a limit on the number of buckets in order to protect the server from being overloaded by invalid queries.
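To show why a bucket limit matters, here is a minimal sketch of a fixed-interval histogram over integer values such as timestamps. It is not Tantivy's aggregation code: the `histogram` and `keyed` functions are hypothetical, and the sketch assumes that empty buckets between the smallest and largest value are materialized and that keyed output means a map from bucket key to count instead of a list, following the Elasticsearch convention.

```rust
use std::collections::BTreeMap;

/// Count values into fixed-width buckets, including empty buckets between
/// the smallest and largest value. The bucket limit is checked before
/// anything is allocated.
fn histogram(
    values: &[i64],
    interval: i64,
    max_buckets: usize,
) -> Result<Vec<(i64, u64)>, String> {
    if interval <= 0 {
        return Err("interval must be positive".to_string());
    }
    // The bucket key is the lower bound of the bucket containing `v`.
    let key_of = |v: i64| v.div_euclid(interval) * interval;
    let (min, max) = match (values.iter().min(), values.iter().max()) {
        (Some(&min), Some(&max)) => (min, max),
        _ => return Ok(Vec::new()),
    };
    let (first_key, last_key) = (key_of(min), key_of(max));
    let num_buckets = (last_key as i128 - first_key as i128) / interval as i128 + 1;
    if num_buckets > max_buckets as i128 {
        return Err(format!(
            "histogram needs {num_buckets} buckets, limit is {max_buckets}"
        ));
    }
    let mut buckets: Vec<(i64, u64)> = (0..num_buckets as i64)
        .map(|i| (first_key + i * interval, 0))
        .collect();
    for &v in values {
        let idx = ((key_of(v) - first_key) / interval) as usize;
        buckets[idx].1 += 1;
    }
    Ok(buckets)
}

/// Keyed form: buckets addressed by their key instead of by list position.
fn keyed(buckets: &[(i64, u64)]) -> BTreeMap<String, u64> {
    buckets.iter().map(|&(k, c)| (k.to_string(), c)).collect()
}
```

A single outlier combined with a small interval is enough to request millions of buckets, for example the values 0 and 1_000_000 with an interval of 1. In this sketch the limit rejects such a request before any memory is allocated.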
Query Language
One of the new features is the ability to use the IN operator to search for specific values within a field. This can be done by specifying the field name and a list of values, for example field: IN [val1 val2 val3].
We've also added support for phrase slop (matching distance). This allows for more flexible matching of phrases within a query, e.g. "big wolf"~1 will return documents containing the phrase "big bad wolf".
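The matching rule can be sketched on a plain token list. This is a simplified model, not Tantivy's phrase scorer, which works on term positions stored in the index. The function name is hypothetical, and the sketch assumes that phrase terms must appear in order and that slop is the total number of extra tokens allowed between them.

```rust
/// Returns true if the phrase terms occur in order in `doc` with at most
/// `slop` extra tokens between the first and last matched term.
fn matches_with_slop(doc: &[&str], phrase: &[&str], slop: usize) -> bool {
    let (first, rest) = match phrase.split_first() {
        Some(parts) => parts,
        None => return false,
    };
    doc.iter()
        .enumerate()
        .filter(|(_, token)| *token == first)
        .any(|(start, _)| {
            // From this occurrence of the first term, greedily take the
            // earliest occurrence of each following term.
            let mut pos = start;
            for term in rest {
                match doc[pos + 1..].iter().position(|token| token == term) {
                    Some(offset) => pos += 1 + offset,
                    None => return false,
                }
            }
            // An exact phrase spans `rest.len()` steps; anything beyond
            // that is extra tokens, which must fit within the slop.
            pos - start - rest.len() <= slop
        })
}
```

With the tokens of "the big bad wolf", the phrase ["big", "wolf"] matches with a slop of 1 because one extra token ("bad") sits between the two terms. With a slop of 0 it does not match.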
Thanks to all contributors
First Time Contributors
- @ryanrussell made their first contribution in #1380
- @boraarslan made their first contribution in #1382
- @pier-oliviert made their first contribution in #1415
- @kianmeng made their first contribution in #1445
- @akr4 made their first contribution in #1473
- @waywardmonkeys made their first contribution in #1524
- @nigel-andrews made their first contribution in #1608
- @theduke made their first contribution in #1624