Version: main branch

Updating the doc mapping of an index

Quickwit allows updating the mapping it uses to add more fields to an existing index or change how they are indexed. In doing so, it does not reindex existing data but still lets you search through older documents where possible.

Indexing

When you update the doc mapping of an index, Quickwit restarts the indexing pipelines to take the changes into account. Because both this operation and document ingestion are asynchronous, there is no strict happens-before relationship between ingestion and the update. A document ingested just before the update may therefore be indexed with the newer doc mapper, and a document ingested just after the update may be indexed with the older doc mapper.

Danger

If you use the ingest or ES bulk API (V2), the old doc mapping will still be used to validate new documents that end up being persisted on existing shards (see #5738).

Querying

Quickwit always validates queries against the most recent mapping. If a query was valid under a previous mapping but is not compatible with the newer one, it is rejected. For instance, if a field that used to be indexed no longer is, any query that uses it becomes invalid. Conversely, if a query was invalid under a previous doc mapping but is valid under the new one, Quickwit processes it. Newer splits are queried normally; for older splits, Quickwit tries to execute the query as correctly as possible. If older splits cause a valid request to return an error, please open a bug report. See examples 1 and 2 below for clarification.
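The rule that only the latest mapping matters for validation can be illustrated with a minimal Python sketch. This is not Quickwit's code: it assumes a hypothetical model where a doc mapping is reduced to a {field_name: indexed} dict and a query to the list of fields it references. Real validation also considers types, tokenizers and more.

```python
# Hypothetical model, not Quickwit's implementation:
# a doc mapping is reduced to {field_name: is_indexed}.


def invalid_query_fields(query_fields, latest_mapping):
    """Return the fields that make a query invalid under the latest mapping.

    A field is rejected if it is unknown or not indexed in the *latest*
    mapping, regardless of how older mappings declared it.
    """
    return [field for field in query_fields if not latest_mapping.get(field, False)]


mapping_before = {"field1": True}   # field1 indexed (as in Example 1, before)
mapping_after = {"field1": False}   # field1 no longer indexed (Example 1, after)

# field1:my_value was valid under the old mapping...
print(invalid_query_fields(["field1"], mapping_before))  # []
# ...but is rejected once the latest mapping stops indexing field1.
print(invalid_query_fields(["field1"], mapping_after))   # ['field1']
```

Swapping the two mappings models Example 2: the same check then accepts a query that the older mapping would have rejected.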

Tokenizer changes affect only newer splits; older splits keep using the tokenizers they were created with.

Retrieved documents are mapped from Quickwit's internal format to JSON based on the latest doc mapping. This means that if fields are deleted, they stop appearing (see also Reversibility below) unless the mapper mode is dynamic. If the type of a field changed, its value is converted on a best-effort basis: integers are turned into text, text is parsed into the new type when possible, and otherwise the field is omitted. See example 3 for clarification.

Reversibility

Quickwit does not modify existing data when receiving a new doc mapping. If you realize that you updated the mapping in a wrong way, you can re-update your index using the previous mapping. Documents indexed while the mapping was wrong will be impacted, but any document that was committed before the change will be queryable as if nothing happened.

Type update reference

The table below lists the supported type changes. Conversions from a type to itself are not listed, nor are conversions that never succeed and therefore always omit the field.

type before	type after
u64/i64/f64	text
datetime	text
ip	text
bool	text
u64/i64/f64	bool
text	bool
text	ip
text	f64
u64/i64	f64
bool	f64
text	u64
i64	u64
f64	u64
text	i64
u64	i64
f64	i64
bool	i64
text	datetime
u64	datetime
i64	datetime
f64	datetime
array<T>	array<U>
T	array<U>
array<T>	U
json	object
object	json
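To check a planned mapping change against this table before applying it, the table can be encoded as data. The following Python sketch is a hypothetical helper, not part of Quickwit. It assumes that the three array rows mean "behaves like T to U on the element type", which the table implies but does not spell out.

```python
# Hypothetical helper: the type update reference table encoded as data.
NUMERIC = ("u64", "i64", "f64")

SUPPORTED_CONVERSIONS = {
    *((t, "text") for t in NUMERIC),
    ("datetime", "text"),
    ("ip", "text"),
    ("bool", "text"),
    *((t, "bool") for t in NUMERIC),
    ("text", "bool"),
    ("text", "ip"),
    ("text", "f64"),
    ("u64", "f64"),
    ("i64", "f64"),
    ("bool", "f64"),
    ("text", "u64"),
    ("i64", "u64"),
    ("f64", "u64"),
    ("text", "i64"),
    ("u64", "i64"),
    ("f64", "i64"),
    ("bool", "i64"),
    ("text", "datetime"),
    ("u64", "datetime"),
    ("i64", "datetime"),
    ("f64", "datetime"),
    ("json", "object"),
    ("object", "json"),
}


def element_type(field_type):
    """Strip an array<...> wrapper, e.g. 'array<text>' -> 'text'."""
    if field_type.startswith("array<") and field_type.endswith(">"):
        return field_type[len("array<"):-1]
    return field_type


def conversion_status(before, after):
    """Classify a field type change according to the table above.

    Assumption: array<T> -> array<U>, T -> array<U> and array<T> -> U
    follow the T -> U rule on the element type.
    """
    if before == after:
        return "unchanged"
    b, a = element_type(before), element_type(after)
    if b == a or (b, a) in SUPPORTED_CONVERSIONS:
        return "best effort"
    return "always omitted"


for before, after in [("text", "u64"), ("u64", "text"), ("array<text>", "text"), ("ip", "u64")]:
    print(f"{before} -> {after}: {conversion_status(before, after)}")
```

A result of "always omitted" flags a change after which the field will disappear from every document indexed under the old type.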

Examples

In the examples below, irrelevant fields are omitted for conciseness, so you cannot use these index configs as is.

Example 1

before:

doc_mapping:
  field_mappings:
    - name: field1
      type: text
      tokenizer: raw

after:

doc_mapping:
  field_mappings:
    - name: field1
      type: text
      indexed: false

A field changed from being indexed to not being indexed. A query such as field1:my_value was valid, but is now rejected.

Example 2

before:

doc_mapping:
  field_mappings:
    - name: field1
      type: text
      indexed: false
    - name: field2
      type: text
      tokenizer: raw

after:

doc_mapping:
  field_mappings:
    - name: field1
      type: text
      tokenizer: raw
    - name: field2
      type: text
      tokenizer: raw

A field changed from not indexed to indexed. A query such as field1:my_value was invalid before and is now valid. When querying older splits, it returns no match, but no error either. A query such as field1:my_value OR field2:my_value is now valid too: on old splits it returns the same results as field2:my_value, since field1 wasn't indexed there, while on newer splits it returns the expected results. Similarly, a query such as NOT field1:my_value returns all documents on old splits, and only the documents where field1 is not my_value on newer splits.
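The per-split behavior described in this example can be reproduced with a small Python model. It is a sketch under stated assumptions, not Quickwit's query engine: each hypothetical split records the set of fields it indexed, a term query on a field the split did not index matches nothing (rather than failing), and both splits hold the same documents only so the results can be compared side by side.

```python
# Hypothetical model of Example 2: each split remembers which fields it indexed.


def term(field, value):
    """Term query: doc ids in a split whose field equals value."""
    def matches(split):
        if field not in split["indexed"]:
            # Field not indexed in this split: no match, but no error either.
            return set()
        return {i for i, doc in enumerate(split["docs"]) if doc.get(field) == value}
    return matches


def or_(*clauses):
    """Union of the clauses' matches within one split."""
    return lambda split: set().union(*(clause(split) for clause in clauses))


def not_(clause):
    """Complement of the clause's matches within one split."""
    return lambda split: set(range(len(split["docs"]))) - clause(split)


docs = [
    {"field1": "my_value", "field2": "other"},
    {"field1": "other", "field2": "my_value"},
    {"field1": "other", "field2": "other"},
]
old_split = {"indexed": {"field2"}, "docs": docs}            # field1 not indexed
new_split = {"indexed": {"field1", "field2"}, "docs": docs}  # field1 indexed

queries = {
    "field1:my_value": term("field1", "my_value"),
    "field1:my_value OR field2:my_value": or_(term("field1", "my_value"), term("field2", "my_value")),
    "NOT field1:my_value": not_(term("field1", "my_value")),
}
for text, query in queries.items():
    print(f"{text}: old={sorted(query(old_split))} new={sorted(query(new_split))}")
```

On the old split, the OR query reduces to field2:my_value and the NOT query matches every document, which is the behavior described above.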

Example 3

This example shows casts (trivial, valid, and invalid) and the conversion of an array to a single value.

before:

doc_mapping:
  field_mappings:
    - name: field1
      type: text
    - name: field2
      type: u64
    - name: field3
      type: array<text>

Documents present before the update:

{
  "field1": "123",
  "field2": 456,
  "field3": ["abc", "def"]
}
{
  "field1": "message",
  "field2": 987,
  "field3": ["ghi"]
}

after:

doc_mapping:
  field_mappings:
    - name: field1
      type: u64
    - name: field2
      type: text
    - name: field3
      type: text

When querying this index, the documents returned would become:

{
  "field1": 123,
  "field2": "456",
  "field3": "abc"
}
{
  // field1 is missing because "message" can't be converted to u64
  "field2": "987",
  "field3": "ghi"
}
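The retrieval-time conversion in this example can be sketched in Python. This is a hypothetical model, not Quickwit's code: it covers only the text, u64, and array-to-single-value cases used here and raises NotImplementedError for anything else. It assumes that an array converted to a single value keeps its first element, which matches the output shown above. Dynamic mode is not modeled, so fields absent from the latest mapping are dropped.

```python
OMITTED = object()  # sentinel: the field is dropped from the returned document


def convert(value, new_type):
    """Best-effort conversion of a stored value to the latest field type."""
    if isinstance(value, list):
        # Assumption: array -> single value keeps the first element.
        return convert(value[0], new_type) if value else OMITTED
    if new_type == "text":
        if isinstance(value, str):
            return value
        if isinstance(value, int) and not isinstance(value, bool):
            return str(value)  # integers are turned into text
        raise NotImplementedError(f"{type(value).__name__} -> text is not modeled here")
    if new_type == "u64":
        if isinstance(value, str):
            try:
                value = int(value)  # "123" -> 123
            except ValueError:
                return OMITTED  # e.g. "message" cannot be parsed
        if isinstance(value, int) and not isinstance(value, bool):
            return value if value >= 0 else OMITTED
        raise NotImplementedError(f"{type(value).__name__} -> u64 is not modeled here")
    raise NotImplementedError(f"conversion to {new_type} is not modeled here")


def apply_latest_mapping(doc, mapping):
    """Render a stored document using the latest {field: type} mapping."""
    rendered = {}
    for field, new_type in mapping.items():
        if field in doc:
            value = convert(doc[field], new_type)
            if value is not OMITTED:
                rendered[field] = value
    return rendered


mapping_after = {"field1": "u64", "field2": "text", "field3": "text"}
docs_before = [
    {"field1": "123", "field2": 456, "field3": ["abc", "def"]},
    {"field1": "message", "field2": 987, "field3": ["ghi"]},
]
for doc in docs_before:
    print(apply_latest_mapping(doc, mapping_after))
# {'field1': 123, 'field2': '456', 'field3': 'abc'}
# {'field2': '987', 'field3': 'ghi'}
```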
