Introduction

Prometheus may have its origins in Greek mythology, but it is certainly no stranger to any sysadmin who needs a quick overview of their servers' metrics.

I'd like to take a moment to appreciate its greatness and give you an overview of how Prometheus works and what it does.

As with any tool, no matter how mythical, the power lies with the person who wields it.

Prometheus is an excellent tool because it helps us scrape all the metrics we need from our servers. We simply write a configuration that tells Prometheus where our servers are (its scrape targets), after which Prometheus periodically pulls the metrics from those servers and stores them in one place.
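To make the pull model a little more concrete, here is a minimal Python sketch of what "pulling metrics into one place" means. It is not how Prometheus itself is implemented (Prometheus is written in Go and reads its targets from a configuration file); the helper names `fetch_http`, `parse_samples` and `scrape_all`, and the simplified parsing rules (one sample per line, comments skipped, any timestamp ignored), are assumptions made for illustration.

```python
import urllib.request
from typing import Callable


def fetch_http(target: str, timeout: float = 5.0) -> str:
    """Pull the raw metrics text from a target's /metrics endpoint."""
    with urllib.request.urlopen(f"http://{target}/metrics", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text: str) -> dict[str, float]:
    """Simplified parser: one sample per line, comments and blank lines skipped."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip '# HELP' / '# TYPE' comments and blank lines
        if "}" in line:
            # Series with labels: everything up to the last '}' is the series name.
            series, rest = line.rsplit("}", 1)
            series += "}"
        else:
            series, rest = line.split(None, 1)
        # The first field after the series is the value; an optional timestamp is ignored.
        samples[series] = float(rest.split()[0])
    return samples


def scrape_all(
    targets: list[str], fetch: Callable[[str], str] = fetch_http
) -> dict[str, dict[str, float]]:
    """Pull every target (hypothetical host:port strings) and keep all samples in one place."""
    store: dict[str, dict[str, float]] = {}
    for target in targets:
        try:
            store[target] = parse_samples(fetch(target))
            store[target]["up"] = 1.0  # mimics the per-target 'up' metric Prometheus records
        except (OSError, ValueError, IndexError):
            store[target] = {"up": 0.0}  # scrape or parse failed: mark the target as down
    return store
```

Pointing `scrape_all` at real targets (for instance a node exporter on a hypothetical `localhost:9100`) would issue real HTTP requests; passing a different `fetch` function makes it easy to try the logic offline.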

As we all know, the gods never came alone, and neither did Prometheus. On its own it is already a very strong tool, but combined with other tools such as Grafana it becomes part of a team that gives us a complete overview of our servers.

Dive into the fire with me, my friends, and let's explore this mythical tool: how it can help you and your organization grow at a sustainable rate and give you an overview of the services you have running. We want to do this independently of the technology being monitored, and we certainly want all our metrics in one place.

What is Prometheus?

Prometheus is a 100% open-source, community-driven tool that helps organizations gain better insight into the metrics of their servers and services. All components are available under the Apache 2 License on GitHub. These days, software architectures are often distributed and no longer run on a single server. Companies often run multiple services and servers per environment and need a way to monitor the metrics these servers and services produce.

Prometheus is much more than a collecting agent, though. For starters, one of its main advantages is that it lets us define rules on these metrics and even trigger alerts when needed, so it can notify us when our servers are behaving irregularly. It is also easy to deploy, because it does not rely on a distributed storage system: each Prometheus server stores its data locally and runs on its own.

We can set up a lot of rules, but nothing beats the human eye at spotting irregularities that are hard to describe in rules. This is why Prometheus also offers a built-in graphing interface that charts the metrics we are collecting, giving us live insight into the health of our servers and services.

How does it work?

You might be wondering how this fits into a software architecture. To answer that, we first need to understand exactly how Prometheus gathers its metrics. It does so by actively sending HTTP requests to endpoints that return data in a format Prometheus can interpret, which is usually done through ‘exporters’ and ‘integrations’. Many services have their own exporter (https://prometheus.io/docs/instrumenting/exporters/): every popular HTTP server has one, and there are exporters for databases, storage systems, APIs, logging systems, and much more. The real beauty is that users are encouraged to write their own exporter when none exists yet for their code: https://prometheus.io/docs/instrumenting/writing_exporters/
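To show what an exporter boils down to, here is a hedged sketch of a tiny exporter built only on the Python standard library. In practice you would normally use an official client library instead; the metric name `myapp_requests_total`, the port and the helpers `record_request`, `render_metrics` and `serve` are hypothetical. What matters is the shape: an HTTP endpoint, conventionally `/metrics`, that returns plain text in the Prometheus exposition format, with `# HELP` and `# TYPE` comments followed by `name value` lines.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical application state we want to expose to Prometheus.
REQUESTS_TOTAL = 0
LOCK = threading.Lock()


def record_request() -> None:
    """Called by the (hypothetical) application whenever it handles a request."""
    global REQUESTS_TOTAL
    with LOCK:
        REQUESTS_TOTAL += 1


def render_metrics() -> str:
    """Render the current state in the text exposition format."""
    with LOCK:
        value = REQUESTS_TOTAL
    return (
        "# HELP myapp_requests_total Total requests handled by myapp.\n"
        "# TYPE myapp_requests_total counter\n"
        f"myapp_requests_total {value}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)  # only the metrics endpoint is served
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, answering Prometheus' scrape requests on the given port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Running `serve(8000)` and listing `localhost:8000` as a scrape target would let Prometheus pull `myapp_requests_total`; the `_total` suffix follows the usual naming convention for counters.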

These exporters expose endpoints that Prometheus calls to gather the required data. With this data, Prometheus can then produce visualizations that we can easily configure to our liking. Commonly collected metrics include the number of errors over time, CPU usage over time, or active threads on a database. The built-in graphing capabilities are not always sufficient, however; in that case we can turn to tools such as Grafana, which has supported Prometheus as a data source for a long time.
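Metrics such as "errors over time" are usually stored as ever-increasing counters and graphed as a per-second rate. The sketch below approximates what PromQL's `rate()` function computes, under simplifying assumptions: samples arrive as `(timestamp_in_seconds, value)` pairs, any drop in value is treated as a counter reset, and the extrapolation to the window edges that Prometheus performs is left out. `simple_rate` is a hypothetical helper, not a Prometheus API.

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a counter over (timestamp_seconds, value) samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        if cur >= prev:
            increase += cur - prev
        else:
            # Counter reset (e.g. the process restarted): it started again from zero.
            increase += cur
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")
    return increase / elapsed
```

For example, an error counter scraped every 15 seconds with values 0, 30 and 60 yields a rate of 2 errors per second over the 30-second window.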

Another powerful tool at our disposal is alerting. It consists of two parts: the Prometheus server, which evaluates alerting rules and sends out alerts, and the Alertmanager, which receives and handles those alerts (for example, sending an email and an SMS when the error rate exceeds 20%). Together these two components let us build powerful rulesets that monitor our infrastructure at all hours of the day.
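The two-part split can be illustrated with a small Python sketch of the >20% error-rate example. In a real deployment the first part is an alerting rule written in PromQL and evaluated by the Prometheus server, and the second part is the Alertmanager, which also deduplicates, groups and routes alerts to receivers. Here both are reduced to plain functions; `evaluate_rule`, `dispatch`, the alert name `HighErrorRate` and the print-based notifiers are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Alert:
    name: str
    ratio: float  # observed error ratio, e.g. 0.3 for 30%


def evaluate_rule(errors_per_s: float, requests_per_s: float, threshold: float = 0.20) -> list[Alert]:
    """Part 1 (the Prometheus server's role): evaluate the rule, emit alerts."""
    if requests_per_s <= 0:
        return []  # no traffic, so no meaningful error ratio
    ratio = errors_per_s / requests_per_s
    if ratio > threshold:
        return [Alert("HighErrorRate", ratio)]
    return []


def dispatch(alerts: list[Alert], notifiers: list[Callable[[Alert], None]]) -> int:
    """Part 2 (the Alertmanager's role): hand each alert to every notifier."""
    sent = 0
    for alert in alerts:
        for notify in notifiers:
            notify(alert)
            sent += 1
    return sent


# Hypothetical notifiers standing in for real email and SMS integrations.
def email_notifier(alert: Alert) -> None:
    print(f"EMAIL: {alert.name} ({alert.ratio:.0%} errors)")


def sms_notifier(alert: Alert) -> None:
    print(f"SMS: {alert.name} ({alert.ratio:.0%} errors)")
```

Calling `dispatch(evaluate_rule(30.0, 100.0), [email_notifier, sms_notifier])` would send one email and one SMS, while a 5% error ratio would produce no alerts at all. Keeping evaluation and notification separate mirrors why the two components exist: rules can change without touching how people get paged, and vice versa.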

What can it be used for?

With those powerful capabilities, Prometheus has not stolen its name. The god of fire reigns supreme over the server kingdom. We can use this tool not only to scrape metrics from servers and gather them in one place, but also to visualize our infrastructure's metrics and set proper alerting rules. With all of this combined, it is not hard to imagine a world where all our servers and services are monitored, made visible, and alerted on in a timely fashion.

We can’t forget the auxiliary uses, though. Just as the blacksmith wields the fire, we should learn to wield many tools, because together they can form a sum greater than their parts. And just as the blacksmith combines tools, so must we if we want to use centralized time series to their fullest potential, for example by adding a tool like Grafana for better visualization.

Conclusion

Powerful as this tool is, it still relies on services reporting the correct metrics, on alerting rules that catch the right incidents, and on well-chosen visualizations. If any of these is missing, we could still end up with a false sense of security. That being said, when we apply the right monitoring and alerting tactics together with the powerful capabilities of centralized time series and advanced integrations, it becomes much easier to diagnose issues and even prevent them before they occur.