Spring Boot 3 Observability with Grafana Stack - Orders

In this blog post - Spring Boot 3 Observability with Grafana Stack - we will learn how to implement Observability in our Spring Boot applications using the Grafana Stack, which comprises Grafana, Loki, and Tempo.

What is Observability?

In a nutshell, Observability is the process of understanding the internal state of the application with the help of different indicators such as Logs, Metrics, and Tracing information.

For a more detailed explanation, have a look at this article.

We will see how to implement Observability for our application built with Spring Boot 3 using the Grafana Stack.

In our application, numerous requests pass through the API Gateway, interacting with services like OrderService and InventoryService. However, when a failure occurs—such as the InventoryService failing—it becomes challenging to pinpoint the root cause or identify which request triggered the failure.

To address this issue, we implement Distributed Tracing.

In the above image, a user sends a request to OrderService, which goes through the API Gateway. If InventoryService fails, we need to trace the request from start to finish to find the issue.

To solve this, we use:

  • Trace IDs: A unique ID shared across all services in the request flow, helping us track where the problem occurred.

  • Span IDs: Unique to each service, showing the specific work done by that service.

Using Trace IDs and Span IDs, we can easily identify where a request is failing and debug faster in a distributed system.
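
To make this concrete: once tracing is wired up (as we do later in this post), every log line carries the current trace and span IDs. With Spring Boot 3's default correlation format, a log line looks something like the following (the IDs and class name here are made up for illustration):

2024-01-15T10:20:30.123Z  INFO [order-service,64fd3b1c2ab4c0d1e8f90a1b2c3d4e5f,a1b2c3d4e5f60718] 1 --- [nio-8083-exec-1] c.l.m.order.service.OrderService : Placing order

The first value in the brackets is the application name, the second is the trace ID shared across all services in the request flow, and the third is the span ID for this service's unit of work.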

Grafana Stack

The Grafana Stack comprises three tools:

  • Grafana: This is the most widely used tool for monitoring and visualizing the metrics of our application. Users can build different dashboards with different kinds of charts to visualize the metrics, and we can configure alerts to be notified whenever a metric crosses a defined threshold.

    To collect metrics, we will be using Prometheus, a metrics aggregation tool.

  • Loki: A log aggregation tool that receives the logs from our application and indexes them so they can be visualized in Grafana.

  • Tempo: A distributed tracing tool that can track requests spanning multiple services.

Implementing Observability

Logging

Let's start with implementing logging in our application. To send our application logs to Loki, we have to add the below dependency to the pom.xml of the product service.

<dependency>
    <groupId>com.github.loki4j</groupId>
    <artifactId>loki-logback-appender</artifactId>
    <version>1.3.2</version>
</dependency>

The loki-logback-appender integrates our application with Loki through the Logback logging library.

Next, we have to define a logback-spring.xml file inside src/main/resources, which contains the necessary information about how to structure our logs and where to send them (in other words, it contains the Loki URL).

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <include resource="org/springframework/boot/logging/logback/base.xml"/>
    <springProperty scope="context" name="appName" source="spring.application.name"/>

    <appender name="LOKI" class="com.github.loki4j.logback.Loki4jAppender">
        <http>
            <url>http://localhost:3100/loki/api/v1/push</url>
        </http>
        <format>
            <label>
                <pattern>application=${appName},host=${HOSTNAME},level=%level</pattern>
            </label>
            <message>
                <pattern>${FILE_LOG_PATTERN}</pattern>
            </message>
            <sortByTime>true</sortByTime>
        </format>
    </appender>

    <root level="INFO">
        <appender-ref ref="LOKI"/>
    </root>
</configuration>

The <appender> defines the Loki4jAppender, which contains the reference to the Loki URL under the <url> tag. It also defines the log labels using the <pattern> tag as application=${appName},host=${HOSTNAME},level=%level: the application name comes from the <springProperty> tag, the host from the environment, and the level from the log event itself. The root log level is set to INFO under the <root> tag.
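
Note that no changes are needed in our Java code for this: any standard SLF4J log statement now also ships to Loki through the LOKI appender. A minimal sketch (a hypothetical controller, not taken from the actual project):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ProductController {

    private static final Logger log = LoggerFactory.getLogger(ProductController.class);

    @GetMapping("/product")
    public String getAllProducts() {
        // Printed to the console via base.xml AND pushed to Loki by the LOKI appender
        log.info("Fetching all products");
        return "products";
    }
}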

That's all we need to do to implement logging using Loki. You can download and run Loki on your machine using Docker. In the sample project I am using Docker Compose, so add the below Loki configuration to the docker-compose.yml file in the api-gateway project; it lives there because the Grafana stack serves all the services, not just one.

loki:
  image: grafana/loki:main
  command: ['-config.file=/etc/loki/local-config.yaml']
  ports:
    - '3100:3100'

Now let's see how to implement Metrics using Prometheus and Grafana.

Metrics

Metrics can be any kind of measurable information about our application like JVM statistics, Thread Count, Heap Memory information, etc. To collect metrics of our application, we need to first enable Spring Boot Actuator in our project by adding the below dependency:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

Next, we have to add another dependency to expose the metrics of our application. Spring Boot uses Micrometer to collect metrics, and by adding the below dependency we configure Micrometer to expose an endpoint that can be scraped by Prometheus.

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <scope>runtime</scope>
</dependency>

To see the different metrics exposed by Spring Boot, you can refer to this link from the Spring Boot documentation - docs.spring.io/spring-boot/docs/current/ref..

The next step is to add some properties to our application.properties file.

management.endpoints.web.exposure.include=health, info, metrics, prometheus
management.metrics.distribution.percentiles-histogram.http.server.requests=true
management.observations.key-values.application=product-service

The property - management.endpoints.web.exposure.include=health, info, metrics, prometheus exposes the endpoints health, info, metrics, and prometheus through the actuator.

Next, we define the property management.metrics.distribution.percentiles-histogram.http.server.requests=true, which tells Micrometer to gather the HTTP request metrics as a histogram and expose them to Prometheus. You can read more about this concept here - micrometer.io/docs/concepts#_histograms_and.. The management.observations.key-values.application property adds an application tag to every observation, so we can filter the metrics per service.

After adding the above properties, run the services and open the URL - localhost:8080/actuator/prometheus to see the different metrics exposed by Micrometer.
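
The endpoint serves the metrics in the Prometheus text exposition format. After a few requests, the http_server_requests metric should look something like this (the exact tags and values will differ):

# HELP http_server_requests_seconds
# TYPE http_server_requests_seconds histogram
http_server_requests_seconds_count{application="product-service",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/product"} 42.0
http_server_requests_seconds_sum{application="product-service",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/product"} 1.257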

You can run Prometheus by adding the below entry to the docker-compose.yml file in the api-gateway project:

prometheus:
  image: prom/prometheus:v2.46.0
  command:
    - --enable-feature=exemplar-storage
    - --config.file=/etc/prometheus/prometheus.yml
  volumes:
    - ./docker/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
  ports:
    - '9090:9090'

We need a configuration file to tell Prometheus where it can find the metrics to scrape. For that, we need to create a file called prometheus.yml under the docker folder in the api-gateway project with the following content.

global:
  scrape_interval: 2s
  evaluation_interval: 2s

scrape_configs:
  - job_name: 'api-gateway'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: [ 'host.docker.internal:9000' ] ## only for demo purposes don't use host.docker.internal in production
  - job_name: 'product-service'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['host.docker.internal:8080'] ## only for demo purposes don't use host.docker.internal in production
        labels:
          application: 'Product Service'
  - job_name: 'order-service'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['host.docker.internal:8083'] ## only for demo purposes don't use host.docker.internal in production
        labels:
          application: 'Order Service'
  - job_name: 'inventory-service'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: [ 'host.docker.internal:8084' ] ## only for demo purposes don't use host.docker.internal in production
        labels:
          application: 'Inventory Service'
  - job_name: 'notification-service'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: [ 'host.docker.internal:8082' ] ## only for demo purposes don't use host.docker.internal in production
        labels:
          application: 'Notification Service'

Under the global field, we defined the scrape and evaluation intervals as 2s. In the scrape_configs section, we have one job each for the api-gateway, product service, order service, inventory service, and notification service. Notice that to scrape all the services, we defined the URL of each service and the metrics path as /actuator/prometheus.
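
Once Prometheus is up and scraping, you can sanity-check the setup by opening localhost:9090 and running a PromQL query. For example, using the histogram we enabled earlier with the percentiles-histogram property, the following sample query computes the 95th-percentile request latency per application (adjust the labels to your setup):

histogram_quantile(0.95, sum(rate(http_server_requests_seconds_bucket[5m])) by (le, application))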

Tracing

Now let's go ahead and implement Distributed Tracing using Tempo. For that, we need to add some more dependencies.

Prior to Spring Boot 3, we used to add the Spring Cloud Sleuth dependency to add distributed tracing capabilities to our application, but from Spring Boot 3, Spring Cloud Sleuth is no longer needed and this is replaced by the Micrometer Tracing Project. To add the support, add the below dependencies in product service:

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-brave</artifactId>
</dependency>
<dependency>
    <groupId>io.zipkin.reporter2</groupId>
    <artifactId>zipkin-reporter-brave</artifactId>
</dependency>

micrometer-tracing-bridge-brave is the dependency that does all the magic and adds distributed tracing to our application, while zipkin-reporter-brave exports the tracing information to Tempo in Zipkin format.
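
One detail worth knowing: by default the Zipkin reporter publishes spans to http://localhost:9411/api/v2/spans, which matches the Zipkin port we expose on the Tempo container below. If your Tempo instance runs elsewhere, you can point the reporter at it explicitly with the standard Spring Boot property:

management.zipkin.tracing.endpoint=http://localhost:9411/api/v2/spans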

NOTE: You can also use another tracing implementation such as OpenTelemetry (the micrometer-tracing-bridge-otel dependency) instead of Brave (micrometer-tracing-bridge-brave).

If you want to trace the calls to the database as well (we are using Spring Data JDBC), we can add the datasource-micrometer-spring-boot dependency.

<dependency>
    <groupId>net.ttddyy.observation</groupId>
    <artifactId>datasource-micrometer-spring-boot</artifactId>
    <version>1.0.1</version>
</dependency>

Next, we need to define a bean of type ObservedAspect. We can do that by creating a class called ObservationConfig.java:

package com.lakshmiTech.product_service.config;

import io.micrometer.observation.ObservationRegistry;
import io.micrometer.observation.aop.ObservedAspect;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ObservationConfig {
    @Bean
    ObservedAspect observedAspect(ObservationRegistry registry) {
        return new ObservedAspect(registry);
    }
}

Finally, to enable the Aspect Oriented Programming, we need to add the spring-boot-starter-aop dependency.

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-aop</artifactId>
</dependency>
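
With the ObservedAspect bean and the AOP starter in place, we can also create custom observations, each of which produces both a timer metric and a trace span, by annotating methods with @Observed. A minimal sketch, assuming a hypothetical service class:

import io.micrometer.observation.annotation.Observed;
import org.springframework.stereotype.Service;

@Service
public class ProductService {

    // Creates an observation named "product.lookup": a timer metric for
    // Prometheus plus a span in the trace that is shipped to Tempo
    @Observed(name = "product.lookup", contextualName = "get-product-details")
    public String getProductDetails(String skuCode) {
        // business logic...
        return skuCode;
    }
}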

Micrometer Tracing will only send 10% of the traces it generates to Tempo, just to avoid overwhelming it with a lot of requests. We can set it to 100% by adding the below property to our application.properties file:

management.tracing.sampling.probability=1.0

Finally, you can run Tempo using docker, by adding the below piece of code inside the docker-compose.yml file in api-gateway:

tempo:
  image: grafana/tempo:2.2.2
  command: ['-config.file=/etc/tempo.yaml']
  volumes:
    - ./docker/tempo/tempo.yml:/etc/tempo.yaml:ro
    - ./docker/tempo/tempo-data:/tmp/tempo
  ports:
    - '3110:3100' # Tempo
    - '9411:9411' # zipkin

Finally, we need to create a file called tempo.yml to hold the necessary Tempo settings. I created this file under the docker/tempo folder in the api-gateway project.

server:
  http_listen_port: 3200

distributor:
  receivers:
    zipkin:

storage:
  trace:
    backend: local
    local:
      path: /tmp/tempo/blocks

You can observe that we refer to this file inside the docker-compose service and mount it into the /etc/ location of the container. The zipkin entry under receivers tells Tempo to accept spans in Zipkin format on the default Zipkin port 9411, which lines up with the 9411:9411 port mapping in docker-compose and the zipkin-reporter-brave dependency we added earlier.

Running Grafana

Before testing our implementation, let's also see how to run Grafana using Docker. After all, this is what brings all the services like Tempo, Loki, and Prometheus together and visualizes the information produced by our services.

grafana:
  image: grafana/grafana:10.1.0
  volumes:
    - ./docker/grafana:/etc/grafana/provisioning/datasources:ro
  environment:
    - GF_AUTH_ANONYMOUS_ENABLED=true
    - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
    - GF_AUTH_DISABLE_LOGIN_FORM=true
  ports:
    - '3000:3000'

The above configuration runs Grafana with login and authentication disabled; do not use this configuration in production.

Also for Grafana, we need to define the data sources from which it gathers the information to visualize. For that, let's create a file called datasources.yml under the docker/grafana folder that we mount in docker-compose above:

apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    editable: false
    jsonData:
      httpMethod: POST
      exemplarTraceIdDestinations:
        - name: trace_id
          datasourceUid: tempo
  - name: Tempo
    type: tempo
    access: proxy
    orgId: 1
    url: http://tempo:3200
    basicAuth: false
    isDefault: true
    version: 1
    editable: false
    apiVersion: 1
    uid: tempo
    jsonData:
      httpMethod: GET
      tracesToLogs:
        datasourceUid: 'loki'
      nodeGraph:
        enabled: true
  - name: Loki
    type: loki
    uid: loki
    access: proxy
    orgId: 1
    url: http://loki:3100
    basicAuth: false
    isDefault: false
    version: 1
    editable: false
    apiVersion: 1
    jsonData:
      derivedFields:
        - datasourceUid: tempo
          matcherRegex: \[.+,(.+?),
          name: TraceID
          url: $${__value.raw}

This file defines all the data sources (Prometheus, Loki, and Tempo) and references their respective URLs.
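
In particular, the derivedFields entry under the Loki data source is what links logs to traces: the matcherRegex \[.+,(.+?), extracts the trace ID from the [application,traceId,spanId] prefix we saw in the example log line earlier, and Grafana turns it into a TraceID link that jumps straight to the corresponding trace in Tempo. (The doubled $$ in $${__value.raw} just escapes the dollar sign so Grafana's provisioning does not expand it as an environment variable.)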

Testing

Okay, now it's Testing Time.

Start all the services by running the command:

docker compose up -d

Also, start all the Spring Boot services.

After you make some calls to GET /product and POST /product, let's first open Loki and check for logs.

  • Open the URL - localhost:3000

  • Click on the toggle menu and click on 'Explore'

  • Under the data source dropdown, select 'Loki' and run a query with your desired parameters, e.g. select the application label as product-service.

Now you should be able to see the logs for the selected service (order-service in the screenshot below).
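
Behind the scenes, the dropdowns build a LogQL query from the labels we configured in logback-spring.xml; you can also type one directly in the query editor, for example:

{application="product-service"}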

After this, we have to create a dashboard. For that, add a dashboard.json file under docker/grafana in the api-gateway project. The JSON data is available in my GitHub repository.

After adding the JSON file, we can import it into Grafana. Then we can see the Grafana dashboard for our product-service application.

Likewise, add observability to all the other services in the same way; you can see the screenshots below.

Inventory service:

Order service:

To implement observability in the order service, we have to create a config package and add the below code.

package com.lakshmiTech.microservices.order.config;

import io.micrometer.observation.ObservationRegistry;
import io.micrometer.observation.aop.ObservedAspect;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ObservationConfig {

    @Bean
    ObservedAspect observedAspect(ObservationRegistry registry) {
        return new ObservedAspect(registry);
    }
}

The complete pom.xml file of the order service:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
       <groupId>org.springframework.boot</groupId>
       <artifactId>spring-boot-starter-parent</artifactId>
       <version>3.4.0</version>
       <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.lakshmiTech.miceroservices.order</groupId>
    <artifactId>order_service</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>order_service</name>
    <description>Demo project for Spring Boot</description>
    <url/>
    <licenses>
       <license/>
    </licenses>
    <developers>
       <developer/>
    </developers>
    <scm>
       <connection/>
       <developerConnection/>
       <tag/>
       <url/>
    </scm>
    <properties>
       <java.version>21</java.version>
       <spring-cloud.version>2024.0.0</spring-cloud.version>
    </properties>
    <repositories>
       <repository>
          <id>confluent</id>
          <url>https://packages.confluent.io/maven/</url>
       </repository>
       <repository>
          <id>maven-central</id>
          <url>https://repo.maven.apache.org/maven2</url>
       </repository>
    </repositories>

    <dependencies>
       <dependency>
          <groupId>org.springframework.boot</groupId>
          <artifactId>spring-boot-starter-data-jpa</artifactId>
       </dependency>
       <dependency>
          <groupId>org.springframework.boot</groupId>
          <artifactId>spring-boot-starter-jdbc</artifactId>
          <version>3.4.0</version>
          <scope>compile</scope>
       </dependency>
       <dependency>
          <groupId>org.springframework.boot</groupId>
          <artifactId>spring-boot-starter-web</artifactId>
       </dependency>

       <dependency>
          <groupId>org.flywaydb</groupId>
          <artifactId>flyway-core</artifactId>
       </dependency>
       <dependency>
          <groupId>org.flywaydb</groupId>
          <artifactId>flyway-database-postgresql</artifactId>
       </dependency>

       <dependency>
          <groupId>org.postgresql</groupId>
          <artifactId>postgresql</artifactId>
          <scope>runtime</scope>
       </dependency>
       <dependency>
          <groupId>org.projectlombok</groupId>
          <artifactId>lombok</artifactId>
          <optional>true</optional>
       </dependency>
       <!-- https://mvnrepository.com/artifact/org.springframework.cloud/spring-cloud-starter-circuitbreaker-resilience4j -->
       <dependency>
          <groupId>org.springframework.cloud</groupId>
          <artifactId>spring-cloud-starter-circuitbreaker-resilience4j</artifactId>
          <version>3.1.2</version>
       </dependency>
       <!-- https://mvnrepository.com/artifact/org.apache.groovy/groovy -->
       <dependency>
          <groupId>org.apache.groovy</groupId>
          <artifactId>groovy</artifactId>
          <version>5.0.0-alpha-10</version>
       </dependency>

       <!-- https://mvnrepository.com/artifact/org.springdoc/springdoc-openapi-starter-webmvc-ui -->
       <dependency>
          <groupId>org.springdoc</groupId>
          <artifactId>springdoc-openapi-starter-webmvc-ui</artifactId>
          <version>2.6.0</version>
       </dependency>

       <dependency>
          <groupId>org.springdoc</groupId>
          <artifactId>springdoc-openapi-starter-webmvc-api</artifactId>
          <version>2.6.0</version>
       </dependency>


       <dependency>
          <groupId>org.springframework.boot</groupId>
          <artifactId>spring-boot-starter-actuator</artifactId>
       </dependency>
       <dependency>
          <groupId>com.github.loki4j</groupId>
          <artifactId>loki-logback-appender</artifactId>
          <version>1.3.2</version>
       </dependency>
       <dependency>
          <groupId>io.micrometer</groupId>
          <artifactId>micrometer-tracing-bridge-brave</artifactId>
       </dependency>
       <dependency>
          <groupId>io.zipkin.reporter2</groupId>
          <artifactId>zipkin-reporter-brave</artifactId>
       </dependency>
       <dependency>
          <groupId>org.springframework.boot</groupId>
          <artifactId>spring-boot-starter-aop</artifactId>
       </dependency>
       <dependency>
          <groupId>io.micrometer</groupId>
          <artifactId>micrometer-registry-prometheus</artifactId>
          <scope>runtime</scope>
       </dependency>
       <dependency>
          <groupId>org.apache.avro</groupId>
          <artifactId>avro</artifactId>
          <version>1.11.3</version>
       </dependency>
       <dependency>
          <groupId>io.confluent</groupId>
          <artifactId>kafka-schema-registry-client</artifactId>
          <version>7.6.0</version>
       </dependency>
       <dependency>
          <groupId>io.confluent</groupId>
          <artifactId>kafka-avro-serializer</artifactId>
          <version>7.6.0</version>
       </dependency>
       <dependency>
          <groupId>org.springframework</groupId>
          <artifactId>spring-web</artifactId>
       </dependency>


       <dependency>
          <groupId>org.springframework.kafka</groupId>
          <artifactId>spring-kafka</artifactId>
       </dependency>

       <dependency>
          <groupId>org.springframework.boot</groupId>
          <artifactId>spring-boot-starter-test</artifactId>
          <scope>test</scope>
       </dependency>
       <dependency>
          <groupId>org.springframework.cloud</groupId>
          <artifactId>spring-cloud-starter-contract-stub-runner</artifactId>
          <scope>test</scope>
       </dependency>
       <dependency>
          <groupId>org.springframework.kafka</groupId>
          <artifactId>spring-kafka-test</artifactId>
          <scope>test</scope>
       </dependency>


       <dependency>
          <groupId>org.springframework.boot</groupId>
          <artifactId>spring-boot-testcontainers</artifactId>
          <scope>test</scope>
       </dependency>
       <dependency>
          <groupId>org.testcontainers</groupId>
          <artifactId>junit-jupiter</artifactId>
          <scope>test</scope>
       </dependency>
    </dependencies>

    <dependencyManagement>
       <dependencies>
          <dependency>
             <groupId>org.springframework.cloud</groupId>
             <artifactId>spring-cloud-dependencies</artifactId>
             <version>${spring-cloud.version}</version>
             <type>pom</type>
             <scope>import</scope>
          </dependency>
       </dependencies>
    </dependencyManagement>

    <build>
       <plugins>
          <plugin>
             <groupId>org.apache.maven.plugins</groupId>
             <artifactId>maven-compiler-plugin</artifactId>
             <configuration>
                <annotationProcessorPaths>
                   <path>
                      <groupId>org.projectlombok</groupId>
                      <artifactId>lombok</artifactId>
                   </path>
                </annotationProcessorPaths>
             </configuration>
          </plugin>
          <plugin>
             <groupId>org.springframework.boot</groupId>
             <artifactId>spring-boot-maven-plugin</artifactId>
             <configuration>
                <excludes>
                   <exclude>
                      <groupId>org.projectlombok</groupId>
                      <artifactId>lombok</artifactId>
                   </exclude>
                </excludes>
             </configuration>
          </plugin>
          <plugin>
             <groupId>org.apache.avro</groupId>
             <artifactId>avro-maven-plugin</artifactId>
             <executions>
                <execution>
                   <id>schemas</id>
                   <phase>generate-sources</phase>
                   <goals>
                      <goal>schema</goal>
                   </goals>
                   <configuration>
                      <sourceDirectory>${project.basedir}/src/main/resources/avro</sourceDirectory>
                      <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
                   </configuration>
                </execution>
             </executions>
          </plugin>
       </plugins>
    </build>

</project>

We can also export all the client data from Keycloak; it downloads as a JSON file. We then add that file under the keycloak folder in the api-gateway project, so that if we ever need to restart Keycloak from scratch, we can restore those client details through this realm-export.json file.

To check a trace ID, we have to go to the Tempo data source in Grafana and search for the particular service.

Now let's open Prometheus and apply the same filter; you should see the results below:

After getting a trace ID, we can open it and see the end-to-end trace across the services.

Conclusion

Observability plays a vital role in ensuring that our applications are running as expected and gives us insight into the internal state of the application.

You can find the complete source code of this application on Github - https://github.com/Malalakshmi/Microservices_Application_Main.git