Oh my, what's that on the horizon? It's a new tantivy release!

Short reminder: what is tantivy?

Tantivy is a high-performance full-text search engine library written in Rust (benchmarks). Inspired by Apache Lucene, it serves as a foundation for building search engines; we use it to build Quickwit, our distributed search engine.

Changes

Version 0.21 is a smaller release than 0.20, but it comes with a couple of nice features and improvements.

Lenient Query Parser

The default query parser in Tantivy is strict and requires users to follow a specific syntax when performing searches. This is not ideal when the query parser is user-facing, because users are often unfamiliar with the syntax and can easily make mistakes. When a query does not follow the syntax, the parser returns an error and the search fails.

In this release, we've implemented a lenient query parser that accepts invalid syntax and falls back gracefully on unparseable queries. The lenient query parser tries to fix the query: for example, "www-form-encoded is completed to "www-form-encoded" by adding the missing closing quote. It is also less strict on the grammar and accepts input like a OR b aaa. #2129
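To make the quote-completion idea concrete, here is a minimal sketch using only the standard library. It is a toy pre-pass and not tantivy's parser: `repair_query` is a hypothetical helper, and reporting the repairs alongside the fixed query is our assumption about what a caller would want.

```rust
/// Toy lenient pre-pass (hypothetical, not part of tantivy).
/// Returns the repaired query together with a description of each repair.
fn repair_query(input: &str) -> (String, Vec<String>) {
    let mut repaired = input.trim().to_string();
    let mut problems = Vec::new();

    // An odd number of double quotes means one phrase was never closed.
    let quote_count = repaired.chars().filter(|c| *c == '"').count();
    if quote_count % 2 == 1 {
        repaired.push('"');
        problems.push("unclosed quote: appended a closing quote".to_string());
    }

    (repaired, problems)
}
```

Calling `repair_query("\"www-form-encoded")` yields the query with the closing quote added plus one recorded problem, while a well-formed query such as `a OR b` passes through unchanged with no problems reported.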

Sort By Fast Fields

Previously only descending order was supported for fast fields. Tantivy now supports sorting results by fast fields in both ascending and descending orders. #2111
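As a rough illustration of what sorting by a per-document column in either direction means, here is a standard-library sketch. The `Order` enum and `top_docs_by_column` function are defined locally for this example and are assumptions, not tantivy's collector API; the slice stands in for a fast field with one value per document.

```rust
/// Sort direction, defined locally for this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Order {
    Asc,
    Desc,
}

/// Returns the ids of the top `limit` documents ordered by their column value.
/// `column[doc_id]` plays the role of a fast field: one value per document.
fn top_docs_by_column(column: &[u64], limit: usize, order: Order) -> Vec<usize> {
    let mut doc_ids: Vec<usize> = (0..column.len()).collect();
    doc_ids.sort_by(|&a, &b| {
        let by_value = match order {
            Order::Asc => column[a].cmp(&column[b]),
            Order::Desc => column[b].cmp(&column[a]),
        };
        // Break ties by doc id so the result is deterministic.
        by_value.then(a.cmp(&b))
    });
    doc_ids.truncate(limit);
    doc_ids
}
```

A real collector would keep only the best `limit` entries in a heap instead of sorting every document; the full sort here just keeps the example short.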

Dynamic Token Filters

This release extends Tantivy's text analyzer builder with support for dynamic filters. Users can now assemble token filters at runtime; previously they had to be specified at compile time. #2110
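The difference between compile-time and runtime filter chains can be sketched with boxed closures. This is a simplified model under our own assumptions, not tantivy's analyzer: the `DynFilter` type, the filter names `lowercase` and `remove_long`, and the 10-character limit are all hypothetical.

```rust
/// A token filter chosen at runtime: maps a token to `Some(token)` or drops it with `None`.
type DynFilter = Box<dyn Fn(String) -> Option<String>>;

/// Builds a filter chain from names known only at runtime (e.g. read from a config file).
/// The filter names are hypothetical; unknown names are reported as an error.
fn build_filters(names: &[&str]) -> Result<Vec<DynFilter>, String> {
    let mut filters: Vec<DynFilter> = Vec::new();
    for name in names {
        let filter: DynFilter = match *name {
            "lowercase" => Box::new(|t: String| Some(t.to_lowercase())),
            "remove_long" => {
                Box::new(|t: String| if t.chars().count() > 10 { None } else { Some(t) })
            }
            other => return Err(format!("unknown filter: {}", other)),
        };
        filters.push(filter);
    }
    Ok(filters)
}

/// Splits on whitespace and runs each token through the chain in order.
/// A token is dropped as soon as any filter returns `None`.
fn analyze(text: &str, filters: &[DynFilter]) -> Vec<String> {
    text.split_whitespace()
        .filter_map(|raw| {
            filters
                .iter()
                .try_fold(raw.to_string(), |token, filter| filter(token))
        })
        .collect()
}
```

Because every filter is stored behind the same boxed type, the chain's length and contents can depend on user configuration. With statically typed filters, each added filter changes the type of the analyzer, which is why the chain had to be fixed at compile time.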

Aggregations

Missing Parameter

Since tantivy 0.20, fast fields support optional values. This lets us support the missing parameter, which supplies a default value for documents that have no value in the field. The following aggregations now support this parameter: #2149 #2151 #2157

Term Aggregation
Stats
Percentiles
Min
Max
Count
Sum
Avg
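To show how a missing default changes a result, here is a small standard-library sketch of an average over optional values. The function `avg_with_missing` is hypothetical and only models the semantics described above; it is not tantivy's aggregation code.

```rust
/// Averages a column of optional values.
/// With `missing = Some(v)`, documents without a value count as `v`;
/// with `missing = None`, they are skipped entirely.
fn avg_with_missing(values: &[Option<f64>], missing: Option<f64>) -> Option<f64> {
    let mut sum = 0.0;
    let mut count = 0u64;
    for value in values {
        // Fall back to the default only when the document has no value.
        if let Some(v) = value.or(missing) {
            sum += v;
            count += 1;
        }
    }
    if count == 0 {
        None
    } else {
        Some(sum / count as f64)
    }
}
```

For the values 10, (none), 20, the average is 15 when documents without a value are skipped and 10 when the missing default is 0, because the third document then contributes to both the sum and the count.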

Improved Error Messages

Aggregations can be deserialized from Elasticsearch-compatible JSON. Unfortunately, the error messages serde produces for untagged or flatten deserialization do little to tell the user what caused the deserialization to fail. If the aggregation type did not match any of the variants, the error message was no variant of enum AggregationVariants found in flattened data.

In this release, we've implemented a custom deserializer for Aggregation that adds context when the aggregation type is not found. The error now reads unknown variant doesnotmatchanyagg, expected one of ... #2150
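The general idea of adding context to a lookup failure can be sketched without serde. The snippet below is an illustration under our own assumptions: `KNOWN_AGGS` is a hypothetical subset of type names chosen for the example, and `resolve_agg` is not tantivy's deserializer.

```rust
/// Hypothetical subset of aggregation type names, used only for this illustration.
const KNOWN_AGGS: &[&str] = &["avg", "max", "min", "sum"];

/// Resolves an aggregation type name.
/// On failure, the error names the offending input and lists the valid options.
fn resolve_agg(name: &str) -> Result<&'static str, String> {
    KNOWN_AGGS
        .iter()
        .copied()
        .find(|known| *known == name)
        .ok_or_else(|| {
            format!(
                "unknown variant `{}`, expected one of {}",
                name,
                KNOWN_AGGS.join(", ")
            )
        })
}
```

The point is that the error carries both the name the user typed and the accepted names, so a typo can be spotted without reading the source.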

Bugfixes

This release comes with a couple of bug fixes:

Fix tracking of fast field memory consumption; the bug led to higher memory consumption than the budget allowed during indexing. #2148 #2147
Fix a regression from 0.20 where sorting an index by date no longer worked. #2124
Fix getting the root facet on the FacetCollector. #2086
Align numerical type priority order of columnar and query. #2088

Remove support for Brotli and Snappy

LZ4 provides fast and simple compression, whereas Zstd is exceptionally flexible. Supporting Brotli and Snappy on top of these two algorithms does not add any distinct functionality for compressing the doc store, so we removed them. #2123

Thanks to all contributors

New Contributors

Thanks and welcome to all new contributors!

@naveenann made their first contribution in #2093
@cjrh made their first contribution in #2144
@GodTamIt made their first contribution in #2145
@ethever made their first contribution in #2165

See the CHANGELOG for the full list.
