
Benchmarking Quickwit vs. Loki

In 2019, Grafana launched Loki, a new log aggregation system, to tackle the challenges commonly faced by teams operating and scaling Elasticsearch:

  • High total cost of ownership (TCO)
  • Slow indexing
  • Difficult to scale to terabytes
  • Complex and hard to manage

The consensus was that Elasticsearch wasn't the right solution for efficient log management. Loki aimed to provide a cost-efficient alternative through a minimalistic indexing approach that indexes only metadata (labels), inspired by Prometheus’s labeling system, and leverages cheap object storage. Humio1 had a similar approach with its unique index-free architecture.

In 2021, we launched Quickwit with the same goal but a different approach. While we also targeted cost-efficiency and used object storage, we kept the inverted index and combined it with columnar storage. Our ultimate goal was to provide a search engine on object storage, retaining the strengths of traditional search engines while addressing the legitimate complaints coming from Elasticsearch users.

Today, as Quickwit is gaining traction and being adopted by companies at petabyte scale2, we are often asked about the trade-offs of using Quickwit vs. Loki for log management.

This blog post explores these differences, evaluates their trade-offs, and benchmarks them against a reasonably large log dataset to provide relevant insights. As benchmarking another tool against one's own is very challenging, I strove to present a balanced view with humility, welcoming input from both the Loki and Quickwit communities. I'm convinced that the benchmark results are worth sharing, and I hope you will gain a better understanding of the two engines!

Establishing a Fair Comparison

Quickwit and Loki target the same use case: log search, with a shared goal of cost-effectiveness at terabyte scale and beyond. They achieve this by decoupling computing and storage, optimizing data compression, and maintaining stateless architectures to simplify cluster management.

However, their querying capabilities are substantially different:

  • Quickwit offers a simplified query language: it supports keyword queries, prefix queries, and phrase queries, but lacks support for regex queries.
  • Loki supports a broader set of queries, including regex capabilities and log content parsing for field extraction.
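
To make that difference concrete, below is a hedged sketch of queries each engine can express. The label and field names follow the dataset described later in this post, and the exact syntax may vary slightly between versions; it is an illustration, not the benchmark's query set.

# LogQL (Loki): label selection plus line filters, regex, and query-time parsing
{region="us-east-2"} |= "queen"                      # substring filter
{region="us-east-2"} |~ "queen|king"                 # regex filter, no Quickwit equivalent
{region="us-east-2"} | json | metrics_size > 100000  # parse JSON fields at query time

# Quickwit query language: keyword, boolean, and phrase queries on indexed fields
cloud.region:us-east-2 AND message:queen
message:"fancier hero"                               # phrase query (needs positions recorded on the field)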

The benchmark will thus evaluate a subset of the Loki query language, since Quickwit is less expressive there. Given the architectures of both engines, we can already anticipate the performance characteristics we should observe:

  • Quickwit builds an inverted index and a columnar store, so it should offer faster search and analytics at the expense of higher ingestion costs.
  • Loki favors fast ingestion but should exhibit slower query responses, as it is mostly index-free: only labels are used to group data into streams.

The main goal of the benchmark is to materialize the trade-offs between Quickwit and Loki on a log dataset.

Benchmark setup

For this benchmark, we considered a typical scenario of ingesting hundreds of GBs of logs per day. We used the Elastic Integration Corpus Generator Tool to create our dataset, which is publicly available in our Google Storage bucket. The dataset contains the first 200 files (generated-logs-v1-*), totaling approximately 212.40 GB across 243,527,673 logs.

The JSON structure of our logs is as follows:

{
  "agent": {
    "ephemeral_id": "9d0fd4b2-0cf1-4b9b-9ad1-61e46657134d",
    "id": "9d0fd4b2-0cf1-4b9b-9ad1-61e46657134d",
    "name": "coldraccoon",
    "type": "filebeat",
    "version": "8.8.0"
  },
  "aws.cloudwatch": {
    "ingestion_time": "2023-09-17T13:31:04.741Z",
    "log_stream": "novachopper"
  },
  "cloud": {
    "region": "us-east-1"
  },
  "event": {
    "dataset": "generic",
    "id": "peachmare",
    "ingested": "2023-09-17T12:48:00.741424Z"
  },
  "host": {
    "name": "coldraccoon"
  },
  "input": {
    "type": "aws-cloudwatch"
  },
  "level": "INFO",
  "log.file.path": "/var/log/messages/novachopper",
  "message": "2023-09-17T13:31:04.741Z Sep 17 13:31:04 ip-187-57-167-52 systemd: jackal fancier hero griffin finger scale fireroar",
  "metrics": {
    "size": 390145,
    "tmin": 68811
  },
  "process": {
    "name": "systemd"
  },
  "tags": [
    "preserve_original_event"
  ],
  "timestamp": 1673247599,
  "trace_id": "5161051656584663225"
}

Evaluation Metrics

We assessed several metrics during data ingestion and querying:

  • Ingestion time, CPU utilization, index size, peak RAM usage, and object storage file count.
  • Query performance measured by latency, CPU time, and the number of GET requests on object storage.

To stick to the log search use case, we focused on the two types of queries commonly used in the Grafana Explore view:

  • Fetching the last 100 logs.
  • Getting the log volume by log level.

The following table illustrates the benchmarked queries:

| Query | Last 100 logs | Log volume per log level |
|---|---|---|
| Match all logs | [ ] | [x] |
| Logs with `queen` | [x] | [x] |
| Logs labeled `region: us-east-2` | [x] | [x] |
| Logs with label `region: us-east-2` and `queen` | [x] | [x] |

Note that the word queen is present in 3% of the logs.
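
For context, here is a hedged sketch of what these two query shapes look like in LogQL for the label-plus-queen case. The [1h] range and exact parameters are illustrative; the actual queries driven by the benchmark scripts are in the repository linked at the end of this post.

# Last 100 logs: fetch the most recent matching lines (Grafana applies the limit)
{region="us-east-2"} |= "queen"

# Log volume per level: metric query over the same selector
sum by (level) (count_over_time({region="us-east-2"} |= "queen" [1h]))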

Each query was run 10 times on each engine and we reported the average of each metric in the results section.

Engines Configuration Details

Both engines were configured to use Google Cloud Storage, with all internal caching disabled to ensure a fair comparison.

Loki Configuration

We used Loki 2.9, configured for performance with settings such as chunk_encoding: snappy and with ingestion limits removed.
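
For illustration, the relevant part of such a Loki configuration might look like the sketch below; the values are placeholders chosen to lift the limits, not our exact benchmark settings (those are in the benchmark repository).

# loki.yaml (sketch): snappy chunk encoding and relaxed ingestion limits
ingester:
  chunk_encoding: snappy
limits_config:
  ingestion_rate_mb: 1024            # effectively unlimited for this benchmark
  ingestion_burst_size_mb: 1024
  per_stream_rate_limit: 512MB
  per_stream_rate_limit_burst: 1024MB
  max_global_streams_per_user: 0     # 0 disables the limit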

We used Vector to ingest logs into Loki and configured it with labels for region and log level only, creating up to 100 distinct streams.
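
A minimal sketch of the corresponding Vector sink is shown below; the source name and endpoint are hypothetical, and the actual Vector configuration used for the benchmark is in the repository.

# vector.yaml (sketch)
sinks:
  loki:
    type: loki
    inputs: ["generated_logs"]         # hypothetical source name
    endpoint: "http://127.0.0.1:3100"  # assumed local Loki endpoint
    encoding:
      codec: json
    labels:
      region: "{{ cloud.region }}"     # low-cardinality labels only
      level: "{{ level }}"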

We tried to configure different sets of labels, but it was hard to control the cardinality of this dataset. Each time we declared fields such as host.name as labels, the cardinality exploded! We even created buckets of hostnames to limit the cardinality to 1k or 25k label values. However, we found this process impractical and decided to leave the results of these experiments out of this benchmark. Feel free to reach out on Discord if you want to know more about the results; we will certainly publish them on the GitHub repository too.

For reference, we found a few community blog posts quite valuable for learning how to configure Loki properly.

Quickwit Configuration

We used the latest Quickwit build, which contains some performance optimizations that will be available in the next release in June. We used Quickwit's default node configuration and the following index config for logs:

version: 0.8
index_id: generated-logs-for-loki
doc_mapping:
  mode: dynamic
  field_mappings:
    - name: timestamp
      type: datetime
      fast: true
      input_formats:
        - rfc3339
    - name: message
      type: text
  timestamp_field: timestamp
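
With that config saved locally, creating the index and pushing a batch of the dataset boils down to the Quickwit CLI; the file names below are illustrative, and the benchmark itself drives ingestion through the scripts in the repository.

# Create the index from the config above (illustrative file name)
quickwit index create --index-config generated-logs-for-loki.yaml

# Ingest one of the dataset files (illustrative file name)
quickwit index ingest --index generated-logs-for-loki --input-path generated-logs-v1-0.ndjson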

Hardware

We used a single n2-standard-16 instance: an Intel Cascade Lake (x86/64) CPU with 16 vCPUs and 64 GB of memory.

Benchmark results

Ingestion

The table below details the ingestion metrics, where Quickwit exhibits a slower indexing speed but generates fewer files stored on object storage.

| Engine | Quickwit | Loki |
|---|---|---|
| Ingestion time (min) | 123 (+123%) | 55 |
| Mean vCPU | 2.2 | 2.75 |
| Total CPU time (min) | ~270 (+80%) | ~151 |
| Number of files on GCS | 25 | 145,756 (x5,829) |
| Bucket size (GiB) | 53 | 55 |

As expected, Quickwit is slower at indexing and consumes about 80% more CPU resources.

However, Quickwit creates significantly fewer files on object storage: 25 files against 145k for Loki. Note that querying 145k files once costs about $0.06 on AWS S3 (GET requests cost $0.0004 per 1,000 requests). The low number of files is a result of Quickwit's merge strategy, which combines smaller files into fewer, larger ones, optimizing both storage efficiency and retrieval performance. This merging is also part of why Quickwit consumes more CPU.

Finally, Quickwit's index size is comparable to Loki's, so the storage cost of both engines will be equivalent.

Log queries

The following table showcases the query performance metrics for both engines:

| Query | Metric | Quickwit | Loki |
|---|---|---|---|
| `queen` | Latency (s) | 0.6s | 9.3s (+1,425%) |
| | CPU Time (s) | 2.7s | 146s (+5,270%) |
| | # of GET Requests | 206 | 14,821* (+7,095%) |
| `us-east-2` (label) | Latency (s) | 0.6s | 1.0s (+74%) |
| | CPU Time (s) | 2.7s (+35%) | 2s |
| | # of GET Requests | 211* (+348%) | 47 |
| `us-east-2` (label) and `queen` | Latency (s) | 0.6s | 0.98s (+59%) |
| | CPU Time (s) | 2.8s | 11s (+279%) |
| | # of GET Requests | 255 | 561* (+120%) |

As expected, searching for a keyword across the whole dataset is very costly with Loki, both in CPU and in the number of GET requests. Looking at CPU time, searching for the keyword within a given label still takes Loki more CPU than Quickwit, but far less than a full scan: +279% CPU time versus +5,270% when querying the whole dataset. We can thus see the benefit of choosing relevant labels, which becomes the key parameter to configure in Loki.

When measuring Loki GET requests, we noticed significant variations (up to 2x) for the same queries; we suspect some caching mechanisms were still active, but we did not manage to disable them.

Note that Loki 3.0 recently introduced bloom filters to improve performance on pure “needle in a haystack” queries. However, that feature would not help with the queen queries: the keyword is present in 3% of the logs and will, with extremely high probability, appear in every chunk, preventing bloom filters from reducing the volume of data to scan3.

Log volume by level queries

The following table showcases the log volume query performance metrics for both engines:

| Query | Metric | Quickwit | Loki |
|---|---|---|---|
| All dataset | Latency (s) | 2.1s | 90s (x42) |
| | CPU Time (s) | 22s | 1,160s (x51) |
| | # of GET Requests | 88 | 203,665 (x2,313) |
| `queen` | Latency (s) | 0.4s | 565s (x1,423) |
| | CPU Time (s) | 3.2s | 8,713s (x2,688) |
| | # of GET Requests | 132 | 204,622 (x1,549) |
| `us-east-2` (label) | Latency (s) | 0.6s | 4.8s (+685%) |
| | CPU Time (s) | 2.8s | 40s (x13) |
| | # of GET Requests | 211 | 6,163 (x28) |
| `us-east-2` (label) and `queen` | Latency (s) | 0.4s | 28s (x70) |
| | CPU Time (s) | 2.9s | 337s (x115) |
| | # of GET Requests | 176 | 5,596 (x31) |

Quickwit consistently outperforms Loki on the log volume queries: this showcases the benefit of the inverted index combined with columnar storage. More generally, Loki uses a lot of CPU to derive metrics from logs even when labels are correctly used.
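
For the Quickwit side, the same per-level breakdown can be expressed through its Elasticsearch-compatible search API. The request below is a hedged sketch: it assumes the index config shown earlier and that level is usable as a fast field under the dynamic mapping, and it is not the exact request issued by the benchmark scripts.

POST /api/v1/_elastic/generated-logs-for-loki/_search
{
  "size": 0,
  "query": { "query_string": { "query": "message:queen AND cloud.region:us-east-2" } },
  "aggs": {
    "volume_by_level": { "terms": { "field": "level" } }
  }
}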

Reproducing the benchmark

To ensure transparency and reproducibility, all resources used in this benchmark are available in our benchmarks repository. We detailed how to reproduce the benchmark on the dedicated Loki page. We're happy to receive PRs as we will continue to improve the benchmarks and progressively add other observability engines: Loki 3.0 and OpenSearch are next on the roadmap.

Conclusion

The benchmark reveals that Quickwit, while slower at data ingestion, performs very well in search, especially for analytics-driven queries. This analysis highlights the trade-off between Quickwit's intensive indexing phase and its efficient querying capabilities. Loki's broader query language enables more diverse use cases, which is a substantial advantage despite its slower query performance on large datasets.

As Grafana continues to enhance Loki with new features such as bloom filters, and as we expand Quickwit's query capabilities, the landscape of log search engines will see exciting developments!

Happy testing!


1 Humio was founded in 2016 and is famous for its index-free logging platform. It was acquired by CrowdStrike in 2021.

2 See the release blog post on scaling indexing and search at petabyte scale.

3 You can do the math yourself: 243,527,673 logs / 145,756 chunks = 1,670 logs per chunk on average; the probability of a chunk containing the word queen is 1 - (1 - 0.03)^1,670 = 99.99...%.