
Observe your Rust application with Quickwit, Jaeger and Grafana


You may have already come across Lucas Palmieri's blog post Are we observable yet? An introduction to Rust telemetry. If you haven't, we recommend giving it a read, as it provides a comprehensive introduction to handling logs in Rust code.

However, logs alone may not be sufficient, particularly in a distributed architecture. This is where distributed tracing comes into play. At Quickwit, we use traces a lot to understand performance bottlenecks and enhance speed. When faced with slow search responses from Quickwit, we often ask ourselves: What caused the slowdown? Was it network-related, disk I/O, or excessive CPU usage?

In this blog post, we will show how to instrument your application and generate traces from a Rust application, exploiting them from a DevOps perspective. Our objectives will be twofold:

  • Analyze the traces using the widely recognized Jaeger UI to gain insights into the application's behavior.
  • Derive RED (Rate, Errors, and Duration) metrics from these traces and monitor them in Grafana. We recommend the following resources if you want to dig further: The RED Method from Weaveworks and the Google SRE book on Monitoring Distributed Systems.

Now, let's delve into the walkthrough, where we will cover the following key aspects:

  • Instrumenting a simple web API built with Actix.
  • Pushing your traces and metrics data into Quickwit.
  • Detecting, diagnosing, and resolving issues using Jaeger UI.
  • Monitoring the RED metrics (Rate, Errors, Duration) on your application in Grafana.
info

You can find the source code for this walkthrough here.

Requirements:

Before we dive into the details, please ensure that you have the following software installed and running correctly on your system:

  • Rust 1.68+
  • Docker
  • docker-compose, if you're still running an old version of Docker.

Building and Instrumenting the Rust Application

We will create a basic Rust application using the Actix web framework. This application is a web API with a single endpoint: it fetches posts and their comments from the popular JSONPlaceholder public web API and returns them as JSON. To better understand our application's life cycle and possibly optimize it, we will make sure to instrument the endpoint handler as well as the functions that fetch the posts and their comments.

Project initialization and core dependencies

Make a new directory called rust-app-tracing. Navigate to it in your terminal and run the following command to initialize a new Rust project:

cargo new web-api

Let's also make sure we have the dependencies we need in our web-api/Cargo.toml file; a sketch of the corresponding [dependencies] section follows the list below.

  • actix-web: A fast web framework for building web applications in Rust.
  • actix-web-opentelemetry: The OpenTelemetry extension for actix-web.
  • opentelemetry: The core OpenTelemetry SDK for Rust, including tracing and metrics.
  • opentelemetry-otlp: A crate providing the various OpenTelemetry exporters (we will use the OTLP gRPC exporter).
  • reqwest: A crate providing an intuitive API for making HTTP requests.
  • tokio: The async runtime for our application.
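For reference, here is what the [dependencies] section could look like. The versions below are indicative only (check crates.io for the latest compatible releases), and the list also includes the tracing, serde, rand, anyhow, and futures crates that the snippets further down rely on.

# web-api/Cargo.toml (versions are indicative, not prescriptive)
[dependencies]
actix-web = "4"
actix-web-opentelemetry = "0.13"
opentelemetry = { version = "0.18", features = ["rt-tokio"] }
opentelemetry-otlp = "0.11"
opentelemetry-semantic-conventions = "0.10"
reqwest = { version = "0.11", features = ["json"] }
tokio = { version = "1", features = ["full"] }
tracing = "0.1"
tracing-opentelemetry = "0.18"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
tracing-bunyan-formatter = "0.3"
serde = { version = "1", features = ["derive"] }
rand = "0.8"
anyhow = "1"
futures = "0.3"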

The web API application code

First, let's configure the application's tracing by creating a file called telemetry.rs in which we will handle all the tracing configuration.

// telemetry.rs

...

const SERVICE_NAME: &'static str = "quickwit-jaeger-demo";

pub fn init_telemetry(exporter_endpoint: &str) {
    // Create a gRPC exporter.
    let exporter = opentelemetry_otlp::new_exporter()
        .tonic()
        .with_endpoint(exporter_endpoint);

    // Define a tracer.
    let tracer = opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(exporter)
        .with_trace_config(
            trace::config().with_resource(Resource::new(vec![KeyValue::new(
                opentelemetry_semantic_conventions::resource::SERVICE_NAME,
                SERVICE_NAME.to_string(),
            )])),
        )
        .install_batch(opentelemetry::runtime::Tokio)
        .expect("Error: Failed to initialize the tracer.");

    // Define a subscriber.
    let subscriber = Registry::default();
    // Level filter layer to filter traces based on level (trace, debug, info, warn, error).
    let level_filter_layer = EnvFilter::try_from_default_env().unwrap_or(EnvFilter::new("INFO"));
    // Layer for adding our configured tracer.
    let tracing_layer = tracing_opentelemetry::layer().with_tracer(tracer);
    // Layer for printing spans to stdout.
    let formatting_layer = BunyanFormattingLayer::new(SERVICE_NAME.to_string(), std::io::stdout);

    global::set_text_map_propagator(TraceContextPropagator::new());

    subscriber
        .with(level_filter_layer)
        .with(tracing_layer)
        .with(JsonStorageLayer)
        .with(formatting_layer)
        .init()
}

Next, let's implement our API endpoint while adding some instrumentation to the handler functions. It is important to note that our focus is not on what this application does, but on generating meaningful and exploitable tracing data from it.

We first add a models file with the types we need to serialize and deserialize posts and comments.

// models.rs
...

#[derive(Debug, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct Post {
    pub user_id: i64,
    pub id: i64,
    pub title: String,
    pub body: String,
    #[serde(default)]
    pub comments: Vec<Comment>,
}

#[derive(Debug, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct Comment {
    pub post_id: i64,
    pub id: i64,
    pub name: String,
    pub email: String,
    pub body: String,
}

Next, let's work on the API endpoint handlers. Take note of the instrument attribute decorating some functions: that is how we enable tracing on a handler function and on the subsequent functions it calls to perform its task.

// lib.rs

...

const BASE_API_URL: &'static str = "https://jsonplaceholder.typicode.com";

// The get_posts handler.
#[instrument(level = "info", name = "get_posts", skip_all)]
#[get("")]
async fn get_posts() -> Result<HttpResponse, Error> {
    // Randomly simulate errors in request handling.
    let choices = [200, 400, 401, 200, 500, 501, 200, 500];
    let mut rng = rand::thread_rng();
    let choice = choices.choose(&mut rng)
        .unwrap()
        .clone();
    match choice {
        400..=401 => Ok(HttpResponse::new(StatusCode::from_u16(choice).unwrap())),
        500..=501 => Ok(HttpResponse::new(StatusCode::from_u16(choice).unwrap())),
        _ => {
            let posts = fetch_posts(20)
                .await
                .map_err(actix_web::error::ErrorInternalServerError)?;
            Ok(HttpResponse::Ok().json(posts))
        }
    }
}

// Fetching posts with a limit.
#[instrument(level = "info", name = "fetch_posts")]
async fn fetch_posts(limit: usize) -> anyhow::Result<Vec<Post>> {
    let client = Client::new();
    let url = format!("{}/posts", BASE_API_URL);
    let mut posts: Vec<Post> = request_url(&client, &url).await?;
    posts.truncate(limit);
    let post_idx_to_ids: Vec<(usize, i64)> = posts
        .iter()
        .enumerate()
        .map(|(idx, post)| (idx, post.id))
        .collect();

    // Fetch post comments one after another.
    for (index, post_id) in post_idx_to_ids {
        let comments = fetch_comments(&client, post_id).await?;
        posts[index].comments = comments;
    }

    Ok(posts)
}

...
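The request_url and fetch_comments helpers are elided above. Their signatures can be inferred from the call sites; the bodies below are only a minimal sketch of what they could look like, assuming Comment and BASE_API_URL are in scope as shown earlier. They are not the tutorial's exact implementation.

// lib.rs (continued) — in addition to the imports already used above:
use reqwest::Client;
use serde::de::DeserializeOwned;

// Generic helper: GET a URL and deserialize the JSON response body.
#[instrument(level = "info", name = "request_url", skip(client))]
async fn request_url<T: DeserializeOwned>(client: &Client, url: &str) -> anyhow::Result<T> {
    let response = client.get(url).send().await?.error_for_status()?;
    Ok(response.json::<T>().await?)
}

// Fetch the comments attached to a single post.
#[instrument(level = "info", name = "fetch_comments", skip(client))]
async fn fetch_comments(client: &Client, post_id: i64) -> anyhow::Result<Vec<Comment>> {
    let url = format!("{}/posts/{}/comments", BASE_API_URL, post_id);
    request_url(client, &url).await
}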

Note that in this application we are only sending traces. It is also possible to use a log collector to gather logs and ship them to Quickwit or another backend.
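For completeness, here is a minimal sketch of how the pieces could be wired together in main.rs. The module paths (web_api::telemetry, web_api::get_posts, assumed to be public), the RequestTracing middleware from actix-web-opentelemetry, and the port numbers are assumptions based on the snippets in this post, not the tutorial's exact code.

// main.rs (a sketch; exact module paths and wiring are assumptions)
use actix_web::{web, App, HttpServer};
use actix_web_opentelemetry::RequestTracing;

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    // Point the OTLP exporter at Quickwit's gRPC endpoint (port 7281 by default).
    web_api::telemetry::init_telemetry("http://localhost:7281");

    HttpServer::new(|| {
        App::new()
            // Record a span for every incoming HTTP request.
            .wrap(RequestTracing::new())
            // Mount the get_posts handler under /post.
            .service(web::scope("/post").service(web_api::get_posts))
    })
    .bind(("127.0.0.1", 9000))?
    .run()
    .await
}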

Using Quickwit to ingest tracing data

Now that we are done building our application, let's run it alongside Quickwit and make sure the generated traces get indexed by Quickwit.

Unlike what we did in a previous blog post, we will create a docker-compose file to streamline the setup of Quickwit, Jaeger, and Grafana. The following docker-compose file contains all the necessary configuration; the important environment variables are:

  • QW_ENABLE_OTLP_ENDPOINT: Allow Quickwit to accept and ingest tracing and log data.
  • SPAN_STORAGE_TYPE, GRPC_STORAGE_SERVER, QW_ENABLE_JAEGER_ENDPOINT: Allow Jaeger to pull traces and logs from Quickwit for analytic purposes.
  • GF_PLUGINS_ALLOW_LOADING_UNSIGNED_PLUGINS: Allows us to load specific plugins in Grafana.
# docker-compose.yaml
version: '3'
services:
  quickwit:
    image: quickwit/quickwit:latest
    command: run
    restart: always
    environment:
      QW_ENABLE_OTLP_ENDPOINT: true
      QW_ENABLE_JAEGER_ENDPOINT: true
    ports:
      - '7280:7280'
      - '7281:7281'
    volumes:
      - ./qwdata:/quickwit/qwdata

  jaeger:
    image: jaegertracing/jaeger-query:latest
    restart: always
    depends_on:
      - quickwit
    environment:
      SPAN_STORAGE_TYPE: 'grpc-plugin'
      GRPC_STORAGE_SERVER: 'quickwit:7281'
    ports:
      - '16686:16686'

  grafana:
    image: grafana/grafana-enterprise:latest
    restart: always
    user: root
    depends_on:
      - quickwit
    environment:
      GF_PLUGINS_ALLOW_LOADING_UNSIGNED_PLUGINS: 'quickwit-quickwit-datasource'
    ports:
      - '3000:3000'
    volumes:
      - ./grafana-storage:/var/lib/grafana

With this docker-compose file in our project directory, let's create the directories needed for the services to run correctly. Create the qwdata directory for storing Quickwit data.

mkdir qwdata

Then, download and place the Quickwit Grafana data source plugin in the expected location.

wget https://github.com/quickwit-oss/quickwit-datasource/releases/download/v0.2.0/quickwit-quickwit-datasource-0.2.0.zip \
&& mkdir -p grafana-storage/plugins \
&& unzip quickwit-quickwit-datasource-0.2.0.zip -d grafana-storage/plugins

Now let's start all the services (Quickwit, Jaeger, Grafana) by running:

docker compose up -d

Assuming everything started without issue, let's now run our web application and hit the http://localhost:9000/post endpoint a few times with cURL to generate some traces.

 cargo run 
curl -X GET http://localhost:9000/post

Wait around 10 seconds for the new traces to be indexed and become available for search.

You can now check that Quickwit has indexed the tracing data by searching the otel-traces-v0_6 index via cURL.

curl -X POST http://localhost:7280/api/v1/otel-traces-v0_6/search -H 'Content-Type: application/json' -d '{ "query": "service_name:quickwit-jaeger-demo" }'

You can also view the data using the Quickwit UI at http://localhost:7280/ui/search.

Quickwit UI

Analyzing traces with Jaeger UI

With the Jaeger container already running, navigate to http://localhost:16686 to visualize our application traces.

Jaeger UI

As you can see from the above screenshot, we are sequentially fetching comments for each post. That is, we make twenty requests one after the other. That makes the overall request handling time longer (4.39s above).

But can we do better?

With all the great tools at Rust developers' disposal, the answer is an obvious "Yes!". Let's fetch the comments concurrently by taking advantage of Tokio and the async stream utilities of the futures crate.

Let's update our fetch_posts function to run the requests concurrently in batches of ten requests at a time. That should speed things up a bit more.

// Fetching posts with a limit.
#[instrument(level = "info", name = "fetch_posts")]
async fn fetch_posts(limit: usize) -> anyhow::Result<Vec<Post>> {
    ...

    // Fetch post comments concurrently.
    let tasks: Vec<_> = post_idx_to_ids
        .into_iter()
        .map(|(index, post_id)| {
            let moved_client = client.clone();
            async move {
                let comments_fetch_result = fetch_comments(&moved_client, post_id).await;
                (index, comments_fetch_result)
            }
        })
        .collect();
    let mut stream = futures::stream::iter(tasks)
        .buffer_unordered(10); // batches of 10 requests at a time
    while let Some((index, comments_fetch_result)) = stream.next().await {
        let comments = comments_fetch_result?;
        posts[index].comments = comments;
    }

    ...
}

With this change, you will notice we are now handling the requests in about 2.46s, and you can also visually see that we always have at most ten fetch_comments requests running concurrently during the span of our request handler.

Jaeger UI

Monitoring application metrics with Grafana

Jaeger is great for inspecting a single trace in depth. But what if we want to monitor the latencies of our service? What if we want to count the number of errors or requests matching given trace metadata?

This is where Grafana dashboards come in. We want to build the RED metrics from our traces and visualize them in Grafana.

Navigate to http://localhost:3000/login and use admin as both the username and the password to log in.

Once logged in, we can connect to Quickwit and query our application traces by using the newly released Quickwit data source plugin.

Grafana Quickwit Data source definition

To make the process of monitoring RED metrics more convenient, we have prepared a pre-configured Grafana dashboard for you to download and import into your Grafana instance.

This dashboard serves as a powerful tool for visualizing and understanding your application's performance. It includes three panels:

  • The first panel displays the number of requests per minute.
  • The second panel shows the number of errors per minute.
  • The third panel presents the duration percentiles of requests per minute.

To observe the metrics in action, you can use an HTTP benchmark tool or the script provided in the tutorial to send many concurrent requests to your Rust application.
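If you don't have a benchmark tool at hand, a simple shell loop along these lines (a stand-in for the tutorial's script, not the script itself) is enough to generate some traffic:

# Send 1,000 requests in bursts of 10 concurrent requests.
for i in $(seq 1 100); do
  for j in $(seq 1 10); do
    curl -s -o /dev/null http://localhost:9000/post &
  done
  wait
done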

Now let's look at a screenshot of the Grafana dashboard, showcasing the metrics after running the script.

Grafana Application Metrics

Conclusion

That's it! In this blog post, we went beyond basic logging to dive into distributed tracing and how to use it to monitor application performance.

Our experience building Quickwit showed us how important distributed tracing is for understanding where we were losing time, whether on calls to S3 or on reading data from local disk. We hope it will help you too :)

Happy coding and happy observing!

Further reading

We have a bunch of tutorials on using Quickwit for observability. Here are a few links to get you started: