Business Metrics
Observability is not only about incidents. You can also define your own metrics.
The OpenTelemetry API lets you create your own instruments (a minimal sketch follows the list):
- Counters: a value which can only increment (such as the number of processed requests),
- Gauges: represent a current value (such as the speed gauge of a car),
- Histograms: record values with a wide distribution, such as latencies.
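As a quick illustration before we dive in, here is a minimal sketch of how these three kinds of instruments can be created with the OpenTelemetry API. The meter name, instrument names and values are invented for the example, not taken from the workshop code:

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.LongHistogram;
import io.opentelemetry.api.metrics.Meter;

public class InstrumentKindsSketch {
    public static void main(String[] args) {
        // A Meter is the entry point used to create instruments.
        Meter meter = GlobalOpenTelemetry.get().getMeter("demo.meter");

        // Counter: a value which can only increment.
        LongCounter requests = meter.counterBuilder("demo.requests")
                .setDescription("Number of processed requests")
                .build();
        requests.add(1);

        // Gauge: reports a current value through a callback, read at collection time.
        meter.gaugeBuilder("demo.speed")
                .setDescription("Current speed")
                .setUnit("km/h")
                .buildWithCallback(measurement -> measurement.record(42.0));

        // Histogram: records values with a wide distribution, such as latencies.
        LongHistogram latency = meter.histogramBuilder("demo.latency")
                .setUnit("ms")
                .ofLongs()
                .build();
        latency.record(12);
    }
}

Note that the gauge is registered with a callback: its value is read when metrics are collected, rather than recorded explicitly.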
Let’s go back to our code!
Objectives
We want to add new metrics to the easypay service to measure the payment processing and storing times.
To do so, we will use a metric of the Histogram type.
In order to achieve this goal, we will measure the time spent in the two methods process and store of the
com.worldline.easypay.payment.control.PaymentService class of the easypay-service module.
This class is the central component responsible for processing payments: it provides the public accept method, which delegates its work to two private ones:
- process: performs the actual processing of the payment (validation, calls to third parties, …),
- store: saves the processing result in the database.
We also want to count the number of payment requests processed by our system. For that, we will use a metric of the Counter type.
1. Add the opentelemetry-api dependency
We need to add the opentelemetry-api dependency to easypay-service in order to use the OpenTelemetry API to create
custom metrics.
📝 Add the following dependency to the easypay-service build.gradle.kts file:
dependencies {
    // ...
    implementation("io.opentelemetry:opentelemetry-api")
}

2. Declare the histogram
We need to declare two histograms in our code:
- processHistogram to record the easypay.payment.process metric: it represents the payment processing time by recording the time spent in the process method,
- storeHistogram to record the easypay.payment.store metric: it represents the time required to store a payment in the database by recording the time spent in the store method.
📝 Let’s modify the com.worldline.easypay.payment.control.PaymentService class to declare them:
// ...
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.metrics.LongHistogram;
import io.opentelemetry.api.GlobalOpenTelemetry;

@Service
public class PaymentService {
    // ...

    private LongHistogram processHistogram; // (1)
    private LongHistogram storeHistogram;

    public PaymentService(/* ... */) {
        // ...
        OpenTelemetry openTelemetry = GlobalOpenTelemetry.get(); // (2)
        processHistogram = openTelemetry.getMeter(EasypayServiceApplication.class.getName()) // (3)
                .histogramBuilder("easypay.payment.process") // (4)
                .setDescription("Payment processing time") // (5)
                .setUnit("ms") // (6)
                .ofLongs() // (7)
                .build();
        storeHistogram = openTelemetry.getMeter(EasypayServiceApplication.class.getName())
                .histogramBuilder("easypay.payment.store")
                .setDescription("Payment storing time")
                .setUnit("ms")
                .ofLongs()
                .build();
    }
}

- Declare the two histograms (1),
- Retrieve the global OpenTelemetry instance (2) and get the Meter object from it (3) to create the histograms,
- Initialize the two histograms by giving them a name (4), a description (5), a unit (6) and setting the type of the recorded values (7).
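As a small optional variation (not required by the workshop), since both histograms come from the same Meter, you could fetch the Meter once and reuse it for every instrument. The class name used to identify the meter below is only a placeholder:

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.metrics.LongHistogram;
import io.opentelemetry.api.metrics.Meter;

class PaymentMetricsSketch {
    private final LongHistogram processHistogram;
    private final LongHistogram storeHistogram;

    PaymentMetricsSketch() {
        // Fetch the Meter once and reuse it for every instrument it creates.
        Meter meter = GlobalOpenTelemetry.get().getMeter(PaymentMetricsSketch.class.getName());

        processHistogram = meter.histogramBuilder("easypay.payment.process")
                .setDescription("Payment processing time")
                .setUnit("ms")
                .ofLongs()
                .build();

        storeHistogram = meter.histogramBuilder("easypay.payment.store")
                .setDescription("Payment storing time")
                .setUnit("ms")
                .ofLongs()
                .build();
    }
}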
3. Record time spent in the methods
📝 Let’s modify our process and store methods to record their latencies with the new metrics.
We can simply wrap our original code in a try-finally construct such as:
// ...
private void process(PaymentProcessingContext context) {
    long startTime = System.currentTimeMillis(); // (1)
    try { // (2)
        if (!posValidator.isActive(context.posId)) {
            context.responseCode = PaymentResponseCode.INACTIVE_POS;
            return;
        }
        // ...
    } finally {
        long duration = System.currentTimeMillis() - startTime; // (3)
        processHistogram.record(duration); // (4)
    }
}

private void store(PaymentProcessingContext context) {
    long startTime = System.currentTimeMillis(); // (5)
    try {
        Payment payment = new Payment();
        // ...
    } finally {
        long duration = System.currentTimeMillis() - startTime;
        storeHistogram.record(duration);
    }
}

- Get the start time,
- Wrap the original code in a try-finally construct,
- Compute the duration,
- Record the duration in the processHistogram histogram,
- Do the same for the store method.
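A side note on the timing itself: System.currentTimeMillis() follows the wall clock, which can jump backwards or forwards (NTP adjustments, for example). If you prefer a strictly monotonic measurement, a possible variation is to use System.nanoTime() and convert to milliseconds before recording. This is only a sketch of the idea, with the histogram declared inline so the snippet stands alone:

import java.util.concurrent.TimeUnit;

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.metrics.LongHistogram;

class MonotonicTimingSketch {
    private final LongHistogram processHistogram = GlobalOpenTelemetry.get()
            .getMeter(MonotonicTimingSketch.class.getName())
            .histogramBuilder("easypay.payment.process")
            .setUnit("ms")
            .ofLongs()
            .build();

    void process() {
        long start = System.nanoTime(); // monotonic clock, immune to wall-clock adjustments
        try {
            // ... original processing logic ...
        } finally {
            long durationMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            processHistogram.record(durationMs);
        }
    }
}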
4. Add counter
📝 Let’s do the same for the counter:
// ...
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.LongHistogram;

@Service
public class PaymentService {
    // ...

    private LongCounter requestCounter; // (1)

    public PaymentService(/* ... */) {
        // ...
        requestCounter = openTelemetry.getMeter(EasypayServiceApplication.class.getName()) // (2)
                .counterBuilder("easypay.payment.requests")
                .setDescription("Payment requests counter")
                .build();
    }
}

- Declares the counter,
- Initializes the counter.
📝 The accept method of the PaymentService class is invoked for each payment request, so it is a good candidate for incrementing our counter:
@Transactional(Transactional.TxType.REQUIRED)
public void accept(PaymentProcessingContext paymentContext) {
    requestCounter.add(1); // < Add this (1)
    process(paymentContext);
    store(paymentContext);
    paymentTracker.track(paymentContext);
}

- Increment the counter each time the method is invoked.
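If you later want to break this counter down by dimension (per point of sale, per response code, …), the OpenTelemetry API also lets you attach attributes when recording; each attribute becomes a label on the exported Prometheus metric. This is not part of the workshop steps, just a sketch, and the attribute key is invented for the example:

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.LongCounter;

class CounterAttributesSketch {
    // Hypothetical attribute key used to tag each increment.
    private static final AttributeKey<String> POS_ID = AttributeKey.stringKey("easypay.pos.id");

    private final LongCounter requestCounter = GlobalOpenTelemetry.get()
            .getMeter(CounterAttributesSketch.class.getName())
            .counterBuilder("easypay.payment.requests")
            .setDescription("Payment requests counter")
            .build();

    void accept(String posId) {
        // Same counter as before, but each increment now carries an attribute.
        requestCounter.add(1, Attributes.of(POS_ID, posId));
    }
}

Keep attribute cardinality low: a bounded set of POS identifiers is fine, but unbounded values such as transaction ids would blow up the number of time series.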
5. Redeploy easypay
🛠️ Rebuild and redeploy easypay-service:
$ docker compose up -d --build easypay-service

🛠️ Once easypay is started (you can check logs with the docker compose logs -f easypay-service command and wait for an output like Started EasypayServiceApplication in 32.271 seconds):
- Execute some queries:
$ k6 run -u 1 -d 1m k6/01-payment-only.js

🛠️ Then go to Grafana and explore Metrics to find your newly created metrics:

- Search for metrics with the base name easypay_payment_process,
- 👀 You should get 3 new metrics: easypay_payment_process_milliseconds_bucket, easypay_payment_process_milliseconds_count, easypay_payment_process_milliseconds_sum.
👀 Explore them, especially the _bucket one.
When using a Histogram, you get several metrics by default, suffixed with:
- _bucket: contains the number of events whose duration is less than or equal to the value defined in the le tag,
- _count: the number of hits,
- _sum: the sum of the time spent in the method.
In particular:
- We can get the average time spent in the method by dividing the _sum by the _count (for example, a sum of 1,500 ms over 100 hits gives an average of 15 ms),
- We can compute latency percentiles thanks to the buckets.
Finally, our Counter becomes a metric suffixed with _total: easypay_payment_requests_total.
6. Compute percentiles
Let’s compute percentiles for the process and store methods.
As we have seen, the Histogram metric provides the necessary data to compute percentiles, so we can query Prometheus to display the percentiles of our application:
🛠️ Go back to Grafana and explore Metrics again.
🛠️ To compute the percentiles for the easypay_payment_process histogram we have created:
- Select the easypay_payment_process_milliseconds_bucket metric,
- Click on Operations and select Aggregations > Histogram quantile,
- Select a Quantile value,
- Click on Run query.
7. Visualization
🛠️ Go back to Grafana (port 3000), and go into the Dashboards section.
🛠️ We will import the dashboard defined in the docker/grafana/dashboards/easypay-monitoring.json file:
- Click on New (top right), and select Import,
- In the Import via dashboard JSON model field, paste the content of the easypay-monitoring.json file and click on Load,
- Select Prometheus as a data source.
You should be redirected to the Easypay Monitoring dashboard.
It provides some panels we have created from the new metrics you exposed in your application:
- Payment request count total (rated): represents the number of hits per second in our application, computed from our counter,
- Payment Duration distribution: represents the various percentiles of our application, computed from the easypay_payment_process histogram,
- Requests process performance and Requests store performance: visualize the buckets of the two histograms we created previously.
🛠️ You can generate some load to view your dashboards evolving live:
$ k6 run -u 2 -d 2m k6/01-payment-only.js

Note
Do not hesitate to explore the way the panels are created, and the queries we used!
Just hover over the panel you are interested in, click on the three dots and select Edit.