Tantivy 0.21

Oh my, what's that on the horizon? It's a new tantivy release!
Short reminder: what is tantivy?
Tantivy is a high-performance full-text search engine library written in Rust (benchmarks). The library is inspired by Apache Lucene and acts as a foundation for building a search engine; we use it to build our distributed search engine, Quickwit.
Changes
Version 0.21 is a smaller release than 0.20, but it comes with a couple of nice features and improvements.
Lenient Query Parser
The default query parser in Tantivy is strict and requires users to follow a specific syntax when searching. This is not ideal when the query parser is user-facing, since users are often unfamiliar with the syntax and can easily make mistakes. When a query does not follow the syntax, the query parser returns an error and the search fails.
In this release, we've implemented a lenient query parser that allows invalid syntax with a fallback to handle unparseable queries.
The lenient query parser will try to fix the query, e.g. the input "www-form-encoded (with an unclosed quote) is completed to "www-form-encoded". It is also less strict on the grammar and will accept input like a OR b aaa. #2129
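To make the quote-completion idea concrete, here is a minimal sketch in plain Rust. It is not Tantivy's implementation: `complete_quotes` is a hypothetical helper, and it assumes that an odd number of double quotes means the last phrase was left open.

```rust
/// Hypothetical helper, not part of Tantivy: closes an unterminated
/// double-quoted phrase by appending the missing quote.
fn complete_quotes(query: &str) -> String {
    // An odd number of quote characters means one phrase was never closed.
    let quote_count = query.chars().filter(|&c| c == '"').count();
    let mut repaired = query.to_string();
    if quote_count % 2 == 1 {
        repaired.push('"');
    }
    repaired
}
```

A real lenient parser has to do more than this (for example, recover from dangling operators), but the principle is the same: repair what can be repaired instead of failing the whole search.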
Sort By Fast Fields
Previously only descending order was supported for fast fields. Tantivy now supports sorting results by fast fields in both ascending and descending orders. #2111
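As an illustration of what sorting in both directions means for a top-N collector, here is a small standalone sketch. The `Order` enum and `top_docs_by_field` function are stand-ins written for this post, not Tantivy's own types, and the `(doc_id, value)` pairs are assumed to come from a numeric fast field.

```rust
use std::cmp::Reverse;

/// Sort direction; a stand-in for illustration, not Tantivy's own type.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Order {
    Asc,
    Desc,
}

/// Returns the ids of the first `limit` documents ordered by field value.
/// `docs` holds hypothetical (doc_id, fast_field_value) pairs.
fn top_docs_by_field(docs: &[(u32, u64)], order: Order, limit: usize) -> Vec<u32> {
    let mut sorted = docs.to_vec();
    match order {
        Order::Asc => sorted.sort_by_key(|&(_, value)| value),
        Order::Desc => sorted.sort_by_key(|&(_, value)| Reverse(value)),
    }
    sorted.into_iter().take(limit).map(|(doc_id, _)| doc_id).collect()
}
```

This sketch sorts the whole input for simplicity; a collector in a search engine would typically keep only the best `limit` entries while scanning.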
Dynamic Token Filters
This release extends the capabilities of Tantivy's text analyzer builder by allowing dynamic filters. Users can now create token filters at runtime; previously they had to be specified at compile time. #2110
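The difference between a compile-time and a runtime filter chain can be sketched with boxed closures. This is an illustration of the general idea only: `DynFilter`, `build_filters`, `analyze`, and the filter names are all hypothetical and do not reflect Tantivy's builder API.

```rust
/// A token filter chosen at runtime: maps a token to a new token, or drops it.
type DynFilter = Box<dyn Fn(String) -> Option<String>>;

/// Builds a filter chain from names that are only known at runtime
/// (for example, read from a config file). The names are hypothetical.
fn build_filters(names: &[&str]) -> Vec<DynFilter> {
    let mut filters: Vec<DynFilter> = Vec::new();
    for name in names {
        match *name {
            "lowercase" => filters.push(Box::new(|t: String| Some(t.to_lowercase()))),
            "remove_long" => filters.push(Box::new(|t: String| {
                // Drop tokens longer than 10 bytes.
                if t.len() <= 10 { Some(t) } else { None }
            })),
            _ => {} // unknown names are ignored in this sketch
        }
    }
    filters
}

/// Splits on whitespace and runs every token through the chain in order.
fn analyze(text: &str, filters: &[DynFilter]) -> Vec<String> {
    text.split_whitespace()
        .filter_map(|token| {
            let mut current = Some(token.to_string());
            for filter in filters {
                current = current.and_then(|t| filter(t));
            }
            current
        })
        .collect()
}
```

With a statically typed chain, every added filter changes the type of the analyzer, so the set of filters must be fixed in the source code. Boxing the filters trades a little dynamic dispatch for the ability to assemble the chain from user configuration.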
Aggregations
Missing Parameter
Since tantivy 0.20, fast fields support optional values. This enables us to support the missing parameter, which supplies a default value for documents that have no value in the field. The following aggregations now support this parameter: #2149 #2151 #2157
- Term Aggregation
- Stats
- Percentiles
- Min
- Max
- Count
- Sum
- Avg
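The effect of the missing parameter on a metric can be shown with a small standalone sketch. `avg_with_missing` is a hypothetical function written for this post, assuming each document contributes either its own value or the `missing` default; it is not Tantivy's aggregation code.

```rust
/// Average over an optional field. `missing` mimics the aggregation parameter
/// of the same name by substituting a default for documents without a value.
fn avg_with_missing(values: &[Option<f64>], missing: Option<f64>) -> Option<f64> {
    let mut sum = 0.0;
    let mut count = 0u64;
    for &value in values {
        // Use the document's value, or fall back to `missing` if one was given.
        if let Some(v) = value.or(missing) {
            sum += v;
            count += 1;
        }
    }
    if count == 0 {
        None
    } else {
        Some(sum / count as f64)
    }
}
```

Without a default, documents lacking the field are skipped; with one, they are counted as if they held that value, which changes both the sum and the document count.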
Improved Error Messages
Aggregations can be deserialized from Elasticsearch-compatible JSON. Unfortunately, the deserialization error messages produced when using serde with untagged or flatten are not very helpful for telling the user what caused the deserialization to fail. If the aggregation type did not match any of the variants, the error message was "no variant of enum AggregationVariants found in flattened data".
In this release, we've implemented a custom deserialization on Aggregation to add context when the aggregation type is not found. Now we get "unknown variant doesnotmatchanyagg, expected one of ...". #2150
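The gist of the improvement, naming the offending value and the accepted alternatives, can be sketched without serde. `KNOWN_AGGS` and `check_agg_type` are hypothetical and cover only a subset of aggregation types; the real change lives in a custom deserializer.

```rust
/// Aggregation kinds known to this sketch (a hypothetical subset).
const KNOWN_AGGS: [&str; 4] = ["terms", "stats", "percentiles", "avg"];

/// Validates an aggregation type name. On failure the error names the
/// rejected value and lists the accepted ones.
fn check_agg_type(name: &str) -> Result<&str, String> {
    if KNOWN_AGGS.iter().any(|&known| known == name) {
        Ok(name)
    } else {
        Err(format!(
            "unknown variant {}, expected one of {}",
            name,
            KNOWN_AGGS.join(", ")
        ))
    }
}
```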
Bugfixes
This release comes with a couple of bug fixes:
- Fix the tracking of fast field memory consumption, which led to higher memory consumption than the budget allowed during indexing #2148 #2147
- Fix a regression from 0.20 where sorting an index by date no longer worked #2124
- Fix getting the root facet on the FacetCollector. #2086
- Align numerical type priority order of columnar and query. #2088
Remove support for Brotli and Snappy
LZ4 provides fast and simple compression, whereas Zstd is exceptionally flexible. Supporting Brotli and Snappy as well does not add any distinct functionality on top of those two algorithms for compressing the doc store, so we removed them. #2123
Thanks to all contributors
New Contributors
Thanks and welcome to all new contributors!
- @naveenann made their first contribution in #2093
- @cjrh made their first contribution in #2144
- @GodTamIt made their first contribution in #2145
- @ethever made their first contribution in #2165
See the CHANGELOG for the full list.