
Tantivy 0.21

Tantivy 0.21 is out.

Oh my, what's that on the horizon? It's a new tantivy release!

Short reminder: what is tantivy?

Tantivy is a high-performance full-text search engine library written in Rust (benchmarks). The library is inspired by Apache Lucene and acts as a foundation for building a search engine; we use it to build our distributed search engine, Quickwit.

Changes

Version 0.21 is a smaller release than 0.20, but it comes with a couple of nice features and improvements.

Lenient Query Parser

The default query parser in Tantivy is strict and requires users to follow a specific syntax when performing searches. This is not ideal when the query parser is user-facing: users are often unfamiliar with the syntax and can easily make mistakes. When a user enters a query that does not follow the syntax, the query parser returns an error and the search fails.

In this release, we've implemented a lenient query parser that accepts invalid syntax and falls back gracefully on queries it cannot parse. The lenient query parser tries to fix the query; for example, it completes the quotes in "www-form-encoded, turning it into "www-form-encoded". It is also less strict about the grammar and accepts queries like a OR b aaa. #2129
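To make the strict/lenient distinction concrete, here is a toy sketch in plain Rust. It is not tantivy's parser or API: parse_strict and parse_lenient are hypothetical functions that only handle the unbalanced-quote case mentioned above. The strict variant fails, while the lenient one repairs the query and returns the problems it found alongside the result instead of aborting the search.

```rust
/// Hypothetical strict parser: rejects a query with an unbalanced double quote.
fn parse_strict(query: &str) -> Result<String, String> {
    let quote_count = query.chars().filter(|&c| c == '"').count();
    if quote_count % 2 == 1 {
        return Err(format!("unbalanced quote in query: {query}"));
    }
    Ok(query.to_string())
}

/// Hypothetical lenient parser: instead of failing, it appends the missing
/// closing quote and reports what it had to fix.
fn parse_lenient(query: &str) -> (String, Vec<String>) {
    match parse_strict(query) {
        Ok(q) => (q, Vec::new()),
        Err(err) => (format!("{query}\""), vec![err]),
    }
}

fn main() {
    let input = "\"www-form-encoded";
    println!("strict:  {:?}", parse_strict(input));
    println!("lenient: {:?}", parse_lenient(input));
}
```

Returning the repaired query together with the collected errors lets an application run the search anyway and still log or display what was wrong with the input.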

Sort By Fast Fields

Previously, only descending order was supported for fast fields. Tantivy now supports sorting results by fast fields in both ascending and descending order. #2111
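Conceptually, sorting by a fast field means reading a per-document value from a columnar store and ordering hits by it. The sketch below models that with a plain slice of (doc id, value) pairs and a hypothetical Order enum; it is not tantivy's collector (a real collector would typically keep only the top k hits in a heap rather than sorting everything), but it shows how the direction simply flips the comparison.

```rust
/// Hypothetical sort direction for this sketch.
#[derive(Clone, Copy, Debug)]
enum Order {
    Asc,
    Desc,
}

/// Returns the ids of the first `k` documents ordered by a per-document
/// column value (a stand-in for a fast field). Ties are broken by doc id.
fn top_k_by_column(column: &[(u32, u64)], k: usize, order: Order) -> Vec<u32> {
    let mut docs: Vec<(u32, u64)> = column.to_vec();
    docs.sort_by(|a, b| {
        let by_value = match order {
            Order::Asc => a.1.cmp(&b.1),
            Order::Desc => b.1.cmp(&a.1),
        };
        by_value.then(a.0.cmp(&b.0))
    });
    docs.into_iter().take(k).map(|(doc, _)| doc).collect()
}

fn main() {
    // (doc id, fast field value), e.g. a price or timestamp column.
    let column = [(0, 30), (1, 10), (2, 20), (3, 10)];
    println!("asc:  {:?}", top_k_by_column(&column, 2, Order::Asc));
    println!("desc: {:?}", top_k_by_column(&column, 2, Order::Desc));
}
```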

Dynamic Token Filters

This release extends the capabilities of Tantivy's text analyzer builder by allowing dynamic filters. Users can now create token filters at runtime; previously, they had to be specified at compile time. #2110
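The difference between compile-time and runtime filter chains is essentially generics versus trait objects. In the sketch below, which uses hypothetical types rather than tantivy's own TokenFilter trait, the chain is a Vec<Box<dyn TokenFilter>> assembled from runtime flags, so the set of filters could come from a configuration file instead of being fixed in the type.

```rust
/// Hypothetical token-filter trait for this sketch (not tantivy's trait).
trait TokenFilter {
    fn apply(&self, tokens: Vec<String>) -> Vec<String>;
}

/// Lowercases every token.
struct LowercaseFilter;

impl TokenFilter for LowercaseFilter {
    fn apply(&self, tokens: Vec<String>) -> Vec<String> {
        tokens.into_iter().map(|t| t.to_lowercase()).collect()
    }
}

/// Drops tokens longer than `max_len` bytes.
struct MaxLengthFilter {
    max_len: usize,
}

impl TokenFilter for MaxLengthFilter {
    fn apply(&self, tokens: Vec<String>) -> Vec<String> {
        tokens.into_iter().filter(|t| t.len() <= self.max_len).collect()
    }
}

/// Builds the filter chain from values only known at runtime.
fn build_filters(lowercase: bool, max_len: Option<usize>) -> Vec<Box<dyn TokenFilter>> {
    let mut filters: Vec<Box<dyn TokenFilter>> = Vec::new();
    if lowercase {
        filters.push(Box::new(LowercaseFilter));
    }
    if let Some(max_len) = max_len {
        filters.push(Box::new(MaxLengthFilter { max_len }));
    }
    filters
}

/// Splits on whitespace, then runs each filter in order.
fn analyze(text: &str, filters: &[Box<dyn TokenFilter>]) -> Vec<String> {
    let tokens: Vec<String> = text.split_whitespace().map(str::to_string).collect();
    filters.iter().fold(tokens, |acc, filter| filter.apply(acc))
}

fn main() {
    let filters = build_filters(true, Some(7));
    println!("{:?}", analyze("Hello Tantivy Superlongword", &filters));
}
```

The trade-off is one dynamic dispatch per filter, in exchange for not having to know the filter combination when compiling.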

Aggregations

Missing Parameter

Since tantivy 0.20, fast fields support optional values. This enables us to support the missing parameter, which specifies a default value to use when a document has no value for the field. The following aggregations now support this parameter: #2149 #2151 #2157

  • Term Aggregation
  • Stats
  • Percentiles
  • Min
  • Max
  • Count
  • Sum
  • Avg
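The effect of missing can be illustrated with an average over a column with optional values. In this sketch (plain Rust, not tantivy's aggregation code; avg_with_missing is a hypothetical helper), documents without a value are skipped when no missing value is given and counted as that value otherwise. In an Elasticsearch-style request, this corresponds to setting missing on the aggregation.

```rust
/// Averages a column where some documents have no value (`None`).
/// With `missing = Some(x)`, documents without a value count as `x`;
/// with `missing = None`, they are ignored.
fn avg_with_missing(values: &[Option<f64>], missing: Option<f64>) -> Option<f64> {
    let filled: Vec<f64> = values.iter().filter_map(|&v| v.or(missing)).collect();
    if filled.is_empty() {
        None
    } else {
        Some(filled.iter().sum::<f64>() / filled.len() as f64)
    }
}

fn main() {
    // The second document has no value for the field.
    let prices = [Some(10.0), None, Some(20.0)];
    println!("without missing: {:?}", avg_with_missing(&prices, None));
    println!("missing = 0.0:   {:?}", avg_with_missing(&prices, Some(0.0)));
}
```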

Improved Error Messages

Aggregations can be deserialized from Elasticsearch-compatible JSON. Unfortunately, when serde is used with untagged or flatten, deserialization error messages tell the user little about what caused the deserialization to fail. If the aggregation type did not match any of the variants, the error message was "no variant of enum AggregationVariants found in flattened data".

In this release, we've implemented a custom deserializer for Aggregation that adds context when the aggregation type is not found. The error message now reads "unknown variant doesnotmatchanyagg, expected one of ..." #2150
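The idea behind the better message is to inspect the aggregation type key first and report the valid names, rather than trying every variant and reporting a generic failure. The sketch below shows that idea without serde and only mimics the shape of the new message; KNOWN_AGGREGATIONS is a hypothetical, incomplete list and check_aggregation_type is not part of tantivy.

```rust
/// Hypothetical, incomplete list of aggregation type names for this sketch.
const KNOWN_AGGREGATIONS: &[&str] = &["avg", "count", "max", "min", "percentiles", "stats", "sum", "terms"];

/// Looks up the aggregation type key and, if it is unknown, returns an error
/// that names the offending key and lists the accepted ones.
fn check_aggregation_type(name: &str) -> Result<&'static str, String> {
    KNOWN_AGGREGATIONS
        .iter()
        .copied()
        .find(|&known| known == name)
        .ok_or_else(|| {
            let expected: Vec<String> = KNOWN_AGGREGATIONS.iter().map(|k| format!("`{k}`")).collect();
            format!("unknown variant `{name}`, expected one of {}", expected.join(", "))
        })
}

fn main() {
    println!("{:?}", check_aggregation_type("avg"));
    println!("{:?}", check_aggregation_type("doesnotmatchanyagg"));
}
```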

Bugfixes

This release comes with a couple of bug fixes:

  • Fix tracking of fast field memory consumption, which led to higher memory consumption during indexing than the budget allowed #2148 #2147
  • Fix a regression from 0.20 where sorting the index by a date field no longer worked #2124
  • Fix getting the root facet on the FacetCollector. #2086
  • Align numerical type priority order of columnar and query. #2088

Remove support for Brotli and Snappy

LZ4 provides fast and simple compression, while Zstd is exceptionally flexible. Supporting Brotli and Snappy on top of these two algorithms does not add any distinct functionality for compressing the doc store, so we removed them. #2123

Thanks to all contributors

New Contributors

Thanks and welcome to all new contributors!

  • @naveenann made their first contribution in #2093
  • @cjrh made their first contribution in #21444
  • @GodTamIt made their first contribution in #2145
  • @ethever made their first contribution in #2165

See the CHANGELOG for the full list.