Alerting with Grafana – Implementing Traffic Management, Security, and Observability with Istio

To set up alerting, we first need clear criteria. Because our sample application receives very little traffic, simulating a realistic SLO breach is difficult. For simplicity, our alert will trigger when traffic volume exceeds one transaction per second.

The first step is to write the query that retrieves the necessary metrics. We will use the following query:

round(sum(irate(istio_requests_total{connection_security_policy="mutual_tls",destination_service=~"frontend.blog-app.svc.cluster.local",reporter=~"destination",source_workload=~"istio-ingress",source_workload_namespace=~"istio-ingress"}[5m])) by (source_workload, source_workload_namespace, response_code), 0.001)

The query computes the per-second request rate, broken down by response code, for all transactions passing through the Istio ingress gateway to the frontend microservice.
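To make the query's arithmetic concrete, here is a minimal Python sketch of what irate, sum ... by, and round do to raw counter samples. It is an illustration under stated assumptions, not how Prometheus is implemented: the function names, the sample data, and the simplification of grouping only by response code are all hypothetical.

```python
import math
from collections import defaultdict


def irate(samples):
    """Per-second rate from the last two (timestamp_seconds, counter_value) samples."""
    if len(samples) < 2:
        return None  # not enough data points in the window
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    delta = v_last - v_prev
    if delta < 0:
        # The counter was reset, so count only what accumulated since the reset.
        delta = v_last
    return delta / (t_last - t_prev)


def round_to_nearest(value, to_nearest=0.001):
    """Round to the nearest multiple of to_nearest, like round(v, 0.001) in the query."""
    inverse = 1.0 / to_nearest
    return math.floor(value * inverse + 0.5) / inverse


def rate_by_response_code(series):
    """Sum the per-series rates that share a response code, then round each total."""
    totals = defaultdict(float)
    for response_code, samples in series:
        value = irate(samples)
        if value is not None:
            totals[response_code] += value
    return {code: round_to_nearest(total) for code, total in totals.items()}


# Hypothetical samples: two series returning 200 and one returning 503,
# each scraped 15 seconds apart.
series = [
    ("200", [(0, 100), (15, 130)]),
    ("200", [(0, 10), (15, 25)]),
    ("503", [(0, 50), (15, 5)]),
]
print(rate_by_response_code(series))
```

Because irate uses only the last two samples in the window, it reacts quickly to short bursts, which is convenient for this demonstration where traffic is generated by refreshing a page by hand.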

The next step involves creating the alert rules with the query in place. To do this, navigate to Home > Alerting > Alert rules. Then, fill in the form, as illustrated in the following screenshot:

Figure 15.13 – Defining alert rules

The alert rule is evaluated every minute and fires only if the violation persists for 2 consecutive minutes. Once the rule is in place, you can trigger it by rapidly refreshing the Blog App home page about 15–20 times every 1 to 2 minutes. To observe this, navigate to Home > Alerting > Alert rules. During the first minute, the alert appears in a Pending state. This means the rule has detected a violation in one of its checks and is waiting to see whether the violation persists for the full 2-minute period before firing.
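The Pending-then-Firing behavior described above can be sketched as a small state machine. This is a simplified model for illustration only, assuming a fixed evaluation interval and a single pending period; Grafana's real state handling also covers cases such as missing data and evaluation errors. The function name, parameter names, and sample rates are hypothetical.

```python
def evaluate(rates, threshold=1.0, interval_s=60, pending_s=120):
    """Return the alert state after each evaluation of the rule.

    rates holds one observed request rate per evaluation, taken every interval_s seconds.
    """
    states = []
    breach_started = None  # time of the first evaluation in the current breach
    for i, rate in enumerate(rates):
        now = i * interval_s
        if rate > threshold:
            if breach_started is None:
                breach_started = now
            # Fire only once the breach has lasted for the whole pending period.
            if now - breach_started >= pending_s:
                states.append("Firing")
            else:
                states.append("Pending")
        else:
            # The condition cleared, so any pending breach is discarded.
            breach_started = None
            states.append("Normal")
    return states


# A sustained breach goes Pending first and fires after two more minutes.
print(evaluate([0.2, 1.5, 1.8, 2.0, 0.1]))

# A short spike that resolves itself never leaves the Pending state.
print(evaluate([1.5, 0.2, 1.5, 0.2]))
```

Changing interval_s and pending_s shows how the same traffic pattern produces fewer or later alerts when the rule is evaluated less often or must hold for longer.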

In a production environment, longer check intervals of around 5 minutes, with alerting intervals of 15 minutes, are typical. This approach helps avoid excessive alerting for self-resolving transient issues, so the SRE team is not inundated with false alerts. The goal is to keep alerts meaningful so that the team does not start treating every alert as a potential false alarm. The following screenshot shows a pending alert:

Figure 15.14 – Alert pending

After the 2-minute monitoring period, you should observe the alert being triggered, as depicted in the following screenshot. This indicates that the alert rule has successfully identified a sustained violation of the defined criteria and is now actively notifying relevant parties or systems:

Figure 15.15 – Alert firing

Since no alert channels have been configured here, fired alerts are visible only within Grafana. It is highly advisable to configure an alert destination, such as PagerDuty to page on-call engineers or Slack to notify your on-call team. Proper alert channels ensure that the right individuals or teams are promptly notified of critical issues, enabling rapid response and resolution.

Summary

As we conclude this chapter and wrap up this book, our journey has taken us through an array of diverse concepts and functionalities. While we’ve covered substantial ground in this chapter, it’s essential to recognize that Istio is a rich and multifaceted technology, making it a challenge to encompass all its intricacies within a single chapter.

This chapter marked our initiation into the world of service mesh, shedding light on its particular advantages in the context of microservices. Our exploration extended to various dimensions of Istio, beginning with installing Istio and extending our sample Blog App to utilize it using automatic sidecar injection. We then moved on to security, delving into the intricacies of securing ingress gateways with mTLS, enforcing strict mTLS among microservices, and harnessing authorization policies to manage traffic flows.

Our journey then led us to traffic management, where we introduced essential concepts such as destination rules and virtual services. These enabled us to carry out canary rollouts and traffic mirroring, demonstrating the power of controlled deployments and real-time traffic analysis. Our voyage culminated in observability, where we harnessed the Kiali dashboard to visualize service interactions and ventured deep into advanced monitoring and alerting capabilities using Grafana.

As we end this remarkable journey, I want to extend my heartfelt gratitude to you for choosing this book and accompanying me through its pages. I trust you’ve found every part of this book enjoyable and enlightening. I hope this book has equipped you with the skills necessary to excel in the ever-evolving realm of modern DevOps. I wish you the utmost success in all your present and future endeavors.
