Tantivy 0.17 is out
Tantivy 0.17 is out! What’s new?
Tantivy 0.17 is a great update that packs in a lot of exciting features. In this blog post, we describe the main ones and the new possibilities they open up for Tantivy users:
- Aggregation collector compatible with Elasticsearch
- JSON field type
- LogMergePolicy with deletes
- Searcher Warmer API
See the CHANGELOG for more details.
Short reminder: what is tantivy?
Tantivy is a high-performance full-text search engine library written in Rust (benchmarks will be updated soon with the new version). The library is inspired by Apache Lucene and acts as a foundation for building a search engine. We use it to build our distributed search engine, Quickwit.
Major features in 0.17
Aggregation Collector compatible with Elasticsearch
An aggregation summarizes your data as statistics. It can answer questions like:
- What is the average price of all sold articles?
- How many errors with status code 500 do we have per day?
- What is the average listing price of cars grouped by color?
Tantivy 0.17 supports several aggregations: range buckets, average, and stats metrics. More will come soon. Elasticsearch compatibility is provided through serialization to and deserialization from JSON.
This is the fundamental piece of code used for making nice charts & dashboards and it’s now in tantivy!
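To make the idea concrete, here is a minimal sketch of what a range bucket aggregation with an average sub-metric computes, answering a question like "what is the average price per price range?". It is plain Rust over an in-memory slice and does not use tantivy's aggregation API; `RangeBucket`, `range_avg`, and the half-open `[from, to)` convention are assumptions made for this illustration.

```rust
/// One bucket of a range aggregation (hypothetical type for this sketch).
/// It counts the values falling in `[from, to)` and keeps their sum so that
/// an average sub-metric can be derived.
#[derive(Debug, Clone, PartialEq)]
struct RangeBucket {
    from: f64,
    to: f64,
    doc_count: u64,
    sum: f64,
}

impl RangeBucket {
    fn new(from: f64, to: f64) -> Self {
        RangeBucket { from, to, doc_count: 0, sum: 0.0 }
    }

    /// Average of the collected values, or `None` for an empty bucket.
    fn avg(&self) -> Option<f64> {
        if self.doc_count == 0 {
            None
        } else {
            Some(self.sum / self.doc_count as f64)
        }
    }
}

/// Assigns every value to the first range that contains it.
/// Values outside all ranges are ignored.
fn range_avg(values: &[f64], bounds: &[(f64, f64)]) -> Vec<RangeBucket> {
    let mut buckets: Vec<RangeBucket> = bounds
        .iter()
        .map(|&(from, to)| RangeBucket::new(from, to))
        .collect();
    for &value in values {
        if let Some(bucket) = buckets
            .iter_mut()
            .find(|b| b.from <= value && value < b.to)
        {
            bucket.doc_count += 1;
            bucket.sum += value;
        }
    }
    buckets
}
```

Calling `range_avg(&prices, &[(0.0, 100.0), (100.0, 200.0)])` returns one bucket per range, each with a document count and an optional average. In tantivy the same computation runs inside a collector over the index rather than over a slice.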
New field type: JSON Field
At Quickwit, we mostly ingest vast amounts of unstructured logs. Being able to ingest an actual JSON object without knowing the schema in advance is therefore a key feature.
That’s the purpose of this new field type, JSON Field. Internally, when indexing a JSON object, we "flatten" it and index each key-value pair. There are a number of pitfalls, though, such as unsupported range queries or unexpected search results on arrays. Read the docs to dig into this feature.
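The flattening step, and why arrays can give surprising results, can be sketched as follows. This is an illustration only: the `Json` enum, the dotted path format, and the `flatten` and `has_pair` helpers are assumptions for this example and not tantivy's internal representation.

```rust
/// A minimal JSON value model, just enough for this sketch (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Str(String),
    Num(i64),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

/// Flattens a JSON value into `(path, leaf value as text)` pairs.
/// Array elements reuse the path of the array itself, so the information
/// about which element a pair came from is lost.
fn flatten(path: &str, value: &Json, out: &mut Vec<(String, String)>) {
    match value {
        Json::Str(s) => out.push((path.to_string(), s.clone())),
        Json::Num(n) => out.push((path.to_string(), n.to_string())),
        Json::Array(items) => {
            for item in items {
                flatten(path, item, out);
            }
        }
        Json::Object(fields) => {
            for (key, child) in fields {
                let child_path = if path.is_empty() {
                    key.clone()
                } else {
                    format!("{}.{}", path, key)
                };
                flatten(&child_path, child, out);
            }
        }
    }
}

/// Returns true if the flattened document contains the given key-value pair.
fn has_pair(pairs: &[(String, String)], path: &str, value: &str) -> bool {
    pairs
        .iter()
        .any(|(p, v)| p.as_str() == path && v.as_str() == value)
}
```

With a document such as `{"cart": [{"product": "shoes", "color": "red"}, {"product": "hat", "color": "blue"}]}`, this sketch produces the pairs `cart.product=shoes`, `cart.color=red`, `cart.product=hat`, and `cart.color=blue`. A query for `cart.product:hat AND cart.color:red` would then match, even though no single cart item is a red hat.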
LogMergePolicy with deletes
This feature closes an issue opened in 2017, and we are very thankful to shikhar, who took the lead on it. Before 0.17, the merge policy did not take deleted documents into account, which could leave segments with almost all of their docs deleted. This can now be controlled by setting a deleted document ratio threshold: tantivy will then merge segments whose ratio of deleted documents is above that threshold.
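The selection rule can be sketched in a few lines. This is a simplified illustration, not tantivy's `LogMergePolicy` implementation: `SegmentStats`, its fields, and `segments_to_merge_for_deletes` are hypothetical names, and the real policy also weighs segment sizes.

```rust
/// Hypothetical per-segment statistics used only in this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
struct SegmentStats {
    id: u32,
    /// Number of documents written to the segment, deleted ones included.
    max_doc: u32,
    /// Number of documents marked as deleted.
    num_deleted: u32,
}

impl SegmentStats {
    /// Fraction of the segment's documents that are deleted.
    fn deleted_ratio(&self) -> f32 {
        if self.max_doc == 0 {
            0.0
        } else {
            self.num_deleted as f32 / self.max_doc as f32
        }
    }
}

/// Returns the ids of the segments whose deleted-document ratio is
/// strictly above `threshold`, i.e. the ones worth rewriting.
fn segments_to_merge_for_deletes(segments: &[SegmentStats], threshold: f32) -> Vec<u32> {
    segments
        .iter()
        .filter(|segment| segment.deleted_ratio() > threshold)
        .map(|segment| segment.id)
        .collect()
}
```

With a threshold of `0.5`, a segment holding 1000 docs of which 950 are deleted is selected, while a segment with only 10 deleted docs out of 1000 is left alone.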
Searcher Warmer API
Tantivy 0.17 now provides a Searcher Warmer API, which opens up the possibility of maintaining segment-level state such as caches. For example, it can be used to load values from external sources. To dig in, check out this code example.
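As a rough mental model of segment-level state, the sketch below keeps one cache entry per live segment: it fills missing entries when a new set of segments becomes visible and drops entries for segments that are gone. Everything here is hypothetical and stands in for the real API: the `SegmentWarmer` trait, `SegmentId`, `PriceCache`, and the stub loader are assumptions for this illustration, and tantivy's actual trait works with searchers rather than raw segment ids.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical segment identifier; tantivy has its own segment id type.
type SegmentId = u32;

/// Hypothetical warmer interface for this sketch.
trait SegmentWarmer {
    /// Called with the live segments before they are used for queries.
    fn warm(&self, live_segments: &[SegmentId]);
    /// Called so that state for segments that are no longer live can be dropped.
    fn garbage_collect(&self, live_segments: &[SegmentId]);
}

/// Caches values loaded from an external source, one entry per segment.
struct PriceCache {
    per_segment: RwLock<HashMap<SegmentId, Vec<u64>>>,
    load: fn(SegmentId) -> Vec<u64>,
}

impl PriceCache {
    fn new(load: fn(SegmentId) -> Vec<u64>) -> Self {
        PriceCache { per_segment: RwLock::new(HashMap::new()), load }
    }

    /// Sorted ids of the segments that currently have a cache entry.
    fn cached_segments(&self) -> Vec<SegmentId> {
        let mut ids: Vec<SegmentId> =
            self.per_segment.read().unwrap().keys().copied().collect();
        ids.sort_unstable();
        ids
    }

    /// Cached values for one segment, if it has been warmed.
    fn get(&self, segment: SegmentId) -> Option<Vec<u64>> {
        let cache = self.per_segment.read().unwrap();
        cache.get(&segment).cloned()
    }
}

impl SegmentWarmer for PriceCache {
    fn warm(&self, live_segments: &[SegmentId]) {
        let mut cache = self.per_segment.write().unwrap();
        for &segment in live_segments {
            // Only load segments that are not cached yet.
            cache.entry(segment).or_insert_with(|| (self.load)(segment));
        }
    }

    fn garbage_collect(&self, live_segments: &[SegmentId]) {
        let mut cache = self.per_segment.write().unwrap();
        cache.retain(|segment, _| live_segments.contains(segment));
    }
}

/// Stub standing in for a call to an external source.
fn load_prices_stub(segment: SegmentId) -> Vec<u64> {
    vec![u64::from(segment) * 10]
}
```

Segments are immutable once written, which is what makes per-segment caching attractive: an entry never needs to be refreshed, only created when a segment appears and dropped when it disappears after a merge.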
Cheesy tantivy badge
For projects that use tantivy, we have made a sleek horse badge that you can add to your README page!
Thanks to all contributors of the 0.17 Release
Thanks to all the contributors who helped us bring 0.17 to life: