A simple guide to Grafana/Prometheus
Introduction
In this post I am going to give a simple guide on how to set up Prometheus monitoring (with AlertManager event notifications), and how you can use Grafana to view the metrics.
If you are anything like me, you may be hesitant to deploy a Grafana/Prometheus based solution. For many years I have relied on hosted solutions (in my case NodeQuery, nixstats.com and hetrixtools), but I have also used self-hosted solutions, including munin and zabbix.
Initially a Grafana based solution can seem overwhelming. You might find yourself asking many questions, for example: What is Grafana? Why do I need so many different tools? What is Prometheus? What is AlertManager? What is a TSDB? What is PromQL? I don’t have the time to learn all of these things! Such a solution sounds like overkill, I only need basic metrics!
The good news is that it’s actually fairly simple, and you don’t need to understand the intricacies of each of these tools before you can take advantage of a Grafana/Prometheus/AlertManager stack. In this post I will document deploying this stack, for a complete beginner, within 15 minutes. By the end of this tutorial you should have something like the screenshot below.
First steps…
There are four main things you will need to know and configure:
- Grafana
Grafana is an open source analytics system that lets you almost instantly generate graphs, meters, and other visualizations.
- Prometheus
Prometheus is the software that will pull your metrics from each server and organize them into its optimized time-series database. Grafana connects to Prometheus to display your data.
- node_exporter
An agent that will run on each machine you want to monitor. Prometheus simply does an HTTP GET on each machine’s node_exporter.
- AlertManager
_(AlertManager will not be covered in this article)_
Provides the ability to forward Prometheus alerts into services like Slack.
We will be using dockprom, a docker-compose solution available on github:
git clone https://github.com/stefanprodan/dockprom
cd dockprom
ADMIN_USER=admin ADMIN_PASSWORD=admin docker-compose up
The entire stack should now be running… pretty easy so far, huh?
Configuring Grafana
Let’s begin by logging in at http://your-domain:3000 with the ADMIN_USER and ADMIN_PASSWORD values passed to docker-compose above.
You should be prompted to change your password to something more secure… if not, the option can be found under Server Admin->Users.
The dockprom stack also provides some decent dashboards which can be found under Manage Dashboards.
You should now be able to monitor server metrics with these dashboards… but what if we want to monitor more than one server?
There are two methods: We can either duplicate these dashboards (one dashboard for each server), or we can modify them to support variables (one dashboard for multiple servers).
I have gone ahead and created a custom dashboard that supports variables and contains some other tweaks/improvements that you can download here. This can be imported into Grafana (via ‘Import’ on the Manage Dashboards page)
If you wish to do it manually, you can duplicate one of the existing dashboards and then append {instance="$instance"} to the metric name in each of your queries (Example). Then go back and edit your dashboard, and add an “instance” variable under edit->variables.
Setting up more node_exporters
node_exporter is the agent that runs on each machine / container to export metrics. Your Prometheus server scrapes these metrics, and they form the basis of the data used by your Grafana dashboard.
Ubuntu includes packages for node_exporter, however they are out of date. It’s better to either manually install it yourself, or optionally use something like ansible to deploy it to each of your machines / containers.
If you opt to go the ansible route, I have produced an ansible-playbook that installs node_exporter and correctly configures systemd, which you can use: https://github.com/rickyhewitt/prometheus_node_exporter
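Whichever route you take, it is worth confirming that the exporter actually answers before pointing Prometheus at it. The function below is a minimal sketch of such a check: the function name check_exporter is my own, the host is a placeholder, and it assumes node_exporter is listening on its default port 9100 with the load average collector enabled.

```shell
#!/bin/sh
# Check that node_exporter answers on a host by looking for one known metric.
# Assumptions: default port 9100, and the node_load15 metric is exposed.
check_exporter() {
  host="$1"
  # -f fails on HTTP errors, -sS hides progress but still shows errors
  if curl -fsS --max-time 5 "http://$host:9100/metrics" | grep -q '^node_load15 '; then
    echo "$host: node_exporter OK"
  else
    echo "$host: node_exporter not reachable" >&2
    return 1
  fi
}
```

For example, check_exporter example-instance.com prints a one-line status and returns non-zero if the metrics page could not be fetched.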
Updating prometheus.yml
Once you have set up node_exporter on the servers / containers you wish to monitor, you will need to add them as targets in your prometheus/prometheus.yml config.
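As a rough sketch of what that addition might look like, the shell function below prints a scrape job that can be pasted under the scrape_configs: key in prometheus/prometheus.yml. The job name remote-nodes, the 15s interval and the second host name are placeholders of mine rather than anything dockprom ships; port 9100 is node_exporter's default.

```shell
#!/bin/sh
# Print a Prometheus scrape job for the node_exporter hosts given as arguments.
# Assumptions: job name 'remote-nodes', 15s interval, exporters on port 9100.
scrape_job() {
  printf '  - job_name: %s\n' "'remote-nodes'"
  printf '    scrape_interval: 15s\n'
  printf '    static_configs:\n'
  printf '      - targets:\n'
  # One list entry per host, each pointing at the exporter port
  for host in "$@"; do
    printf "          - '%s:9100'\n" "$host"
  done
}

# Hypothetical host names; replace with your own machines
scrape_job example-instance.com second-host.example.com
```

Generating the block this way keeps the indentation consistent, which matters because YAML is whitespace-sensitive; writing the same lines by hand works just as well.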
You can then restart Prometheus with docker-compose restart prometheus and the new targets should appear in the Prometheus web panel at http://your-domain:9090/targets/
Within Grafana you should now be able to access these targets within your dashboard, or you might want to experiment with querying them from the “Explore” section of Grafana.
An example would be node_load15 (returns the 15m load average for ALL targets), or node_load15{instance="example-instance.com:9100"} (returns the 15m load average for example-instance.com:9100 only).
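The same queries can also be run from a terminal against the Prometheus HTTP API, which is handy for checking a query before building a panel around it. This is a sketch under a few assumptions: the function name prom_query and the PROM_URL, PROM_USER and PROM_PASSWORD variables are my own, and the credentials are only needed if your deployment puts basic auth in front of Prometheus (drop the -u line otherwise).

```shell
#!/bin/sh
# Run an instant PromQL query against the Prometheus HTTP API.
# Assumptions: PROM_URL, PROM_USER and PROM_PASSWORD match your deployment.
PROM_URL="${PROM_URL:-http://your-domain:9090}"

prom_query() {
  # -G sends the data as a GET query string; --data-urlencode escapes
  # the braces and quotes in the PromQL expression.
  curl -sS -G \
    -u "${PROM_USER:-admin}:${PROM_PASSWORD:-admin}" \
    --data-urlencode "query=$1" \
    "$PROM_URL/api/v1/query"
}
```

Calling prom_query 'node_load15{instance="example-instance.com:9100"}' should return a JSON document containing the current value for that instance.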
Through queries like this you can expand the example dashboards, and there are also plenty of pre-made dashboards available online for inspiration.
Conclusion
Hopefully this basic introduction can help you deploy a working Grafana/Prometheus stack, with the ability to monitor multiple instances.
Next you could configure AlertManager to forward your alerts to a service such as Slack. I have opted not to cover this in this article – however you can modify the alertmanager.yml file provided by dockprom, or alternatively read more about configuring alertmanager here.
Feel free to let me know how it works out for you :)