Tech Blog: Collecting Docker Metrics Using Python and Prometheus
- Published on Wednesday, 02 October 2019 16:00
By Julia Looney
Check out Julia's webinar: Collecting Docker metrics with Python and Prometheus.
What is Prometheus?
Prometheus allows you to create and use time-series metrics for monitoring, alerting, and graphing.
Features of Prometheus
- open source
- uses time-series data
- metrics are specified with a name and key/value pairs
- uses a query language, PromQL, to allow you to create more flexible monitoring
- data collection for the time series data occurs over HTTP
- contains a graphing dashboard and is also compatible with other open source graphing tools such as Grafana
Components
- Prometheus Server
- Jobs/Applications
- Alert manager
- Data visualization
Prometheus Server
The Prometheus application itself. It runs an HTTP server and scrapes metrics from the specified targets.
In this Example
We will cover setting up the Prometheus Server to scrape an application for metrics, which it will then save to its time-series database. We will also look a bit at the Prometheus Dashboard.
What are the benefits to science gateway developers? What problems does it address?
With Prometheus, you can create custom metrics for your gateway to track whatever attributes you'd like, from memory/CPU usage to uptime. Prometheus provides several metric types and tools for tracking and handling these metrics.
Jobs/Applications
Anything that generates metrics for Prometheus to scrape. Prometheus uses HTTP to scrape an endpoint for the job or application.
For short-lived jobs that may exit before a scrape occurs, a Pushgateway can be used; for long-running jobs, the standard pull-based scraping is used.
Alert Manager
Alerts can be defined using PromQL queries. When an alert fires, the Alertmanager can route notifications to various applications such as Slack or HipChat.
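As a sketch, an alerting rule is itself a PromQL expression, defined in a rules file that the Prometheus server loads. The metric name and threshold below are hypothetical, chosen only to illustrate the shape of a rule:
groups:
  - name: example-alerts
    rules:
      - alert: HighMemoryUsage
        # Fires when the hypothetical user_memory_gb gauge stays above 16 for 5 minutes
        expr: user_memory_gb > 16
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "A user is consuming too much memory"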
Data visualization
Prometheus comes equipped with a basic graphing dashboard that works with PromQL queries. Additionally, services such as Grafana can be easily integrated with Prometheus for more advanced data visualizations.
At TACC, we have used Prometheus for a number of applications. In Abaco, a functions-as-a-service API/gateway, we have used Prometheus metrics to implement autoscaling: custom metrics serve as the criteria for whether the system should scale up or down. It is also used for JupyterHub, which will be shown here. For JupyterHub, Prometheus collects metrics on memory and CPU usage per user, and alerts fire if a user consumes too many resources.
What would make someone choose this solution over another?
Prometheus is open-source, easily customizable, and quick to set up. It also pairs well with Docker and Kubernetes, making it versatile.
Steps of implementation
Creating Metrics with Python
The python Prometheus package
There is a Python package, prometheus_client, that can be installed with pip.
https://github.com/prometheus/client_python
pip install prometheus_client
Setting up basic metrics
Using prometheus_client, we can create metric objects that Prometheus can scrape. All four metric types can be created.
Counter:
from prometheus_client import Counter
c = Counter('num_requests', 'The number of requests.')
c.inc() # Increments the counter by 1
c.inc(10) # Increments the counter by 10
Gauge:
from prometheus_client import Gauge
g = Gauge('memory_in_gb', 'The amount of memory remaining on this server in GB.')
g.inc() # Increments the gauge by 1
g.dec() # Decrements the gauge by 1
g.set(6.3) # Sets the gauge to an exact value
Histogram:
from prometheus_client import Histogram
h = Histogram('request_latency_seconds', 'Description of histogram')
h.observe(2.5) # Observe the number of seconds
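Under the hood, a histogram counts observations into cumulative buckets: each bucket's count includes every observation at or below its upper bound. A minimal sketch of that bookkeeping, with bucket bounds chosen purely for illustration:

```python
import math

# Hypothetical upper bounds; Prometheus histograms always end with +Inf
buckets = [0.1, 0.5, 1.0, 2.5, 5.0, math.inf]
counts = [0] * len(buckets)

def observe(value):
    """Record one observation: increment every bucket whose bound >= value."""
    for i, le in enumerate(buckets):
        if value <= le:
            counts[i] += 1

for latency in (0.3, 2.5, 4.0):
    observe(latency)

print(counts)  # cumulative counts: [0, 1, 1, 2, 3, 3]
```

The real client also tracks a running sum and total count alongside the buckets, which is what lets PromQL estimate quantiles from histograms on the server side.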
Summary:
from prometheus_client import Summary
s = Summary('request_latency_seconds', 'The request latency in seconds.')
s.observe(3.7) # Observe the number of seconds
Note: the python Prometheus client cannot store quantile information yet.
Adding labels
Labels can also be added to metrics for easier querying. A label groups together all data points that share its value. The label is declared when the metric object is created:
from prometheus_client import Gauge

g = Gauge(
    'memory_in_gb',
    'The amount of memory remaining on this server in GB.',
    ['server_name']  # name of the label
)

# When we set a value of a labelled metric,
# we need to specify which label value receives it
g.labels('server1').set(6.3)
g.labels('server2').set(2.8)
Later when we query the 'memory_in_gb' metric, we will have one gauge listing for each server we specified.
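Once Prometheus is scraping these samples, the label can be used to filter or aggregate in PromQL. A few example queries against the gauge above (illustrative only; results depend on what has actually been scraped):

# All servers' gauges, one series per server_name value
memory_in_gb

# Only one server
memory_in_gb{server_name="server1"}

# Total remaining memory across all servers
sum(memory_in_gb)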
Generating metrics plaintext
In order to scrape and collect metrics, Prometheus needs the metrics to appear in a specific format. Example:
# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 1.693277e+06
python_gc_objects_collected_total{generation="1"} 4.99867e+06
python_gc_objects_collected_total{generation="2"} 467275.0
# HELP python_gc_objects_uncollectable_total Uncollectable object found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 285137.0
python_gc_collections_total{generation="1"} 25921.0
python_gc_collections_total{generation="2"} 1240.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="7",patchlevel="2",version="3.7.2"} 1.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.668186112e+09
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 4.4384256e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.55535539383e+09
The Prometheus Python library includes a function, generate_latest(), that turns all of the metric objects into the plaintext format that Prometheus scrapes.
For example, if you are returning all your metrics in a function, you could return this:
return Response(generate_latest(), mimetype=CONTENT_TYPE_LATEST)
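The exposition format above is plain text and simple to reason about. As an illustration only (generate_latest() handles all of this for you, including label escaping), here is a hand-rolled renderer for a single metric family:

```python
def render_metric(name, mtype, help_text, samples):
    """Render one metric family in the Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {mtype}"]
    for labels, value in samples:
        label_str = ""
        if labels:
            inner = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            label_str = "{" + inner + "}"
        lines.append(f"{name}{label_str} {value}")
    return "\n".join(lines) + "\n"

text = render_metric(
    "memory_in_gb", "gauge",
    "The amount of memory remaining on this server in GB.",
    [({"server_name": "server1"}, 6.3), ({"server_name": "server2"}, 2.8)],
)
print(text)
```

This produces lines like memory_in_gb{server_name="server1"} 6.3, matching the format in the sample output above.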
Configuring Prometheus to scrape metrics
Prometheus Config file
In order for Prometheus to scrape and collect metrics, it needs a configuration file specifying the targets to scrape. At the most basic level, Prometheus scrapes itself, since the Prometheus server generates metrics about its own operation.
This configuration file is written in YAML:
global:
  scrape_interval: 5s
  external_labels:
    monitor: 'my-monitor'
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'my-docker-metrics'
    scrape_interval: 5s
    static_configs:
      - targets: ['172.17.0.1:5000']
The first thing we specify is the interval of time between each scrape. Here it is set to 5 seconds; if omitted, Prometheus defaults to one minute.
Next, we set up the scrape configs. Each job under the configs represents a target that Prometheus will scrape. In this example we have two: the Prometheus service itself and a webapp that we will create later. All we need to do is give each job a name and provide the host and port where the metrics plaintext is exposed.
Deploying Prometheus
To deploy Prometheus, we will be using Docker. Prometheus has its own official Docker image, so all we will need to do is tell it to use our config file.
To make things easier, we will use docker-compose, which uses YAML:
prom:
  image: prom/prometheus:v2.1.0
  volumes:
    - ./prometheus.yml:/etc/prometheus/prometheus.yml
  command:
    - '--config.file=/etc/prometheus/prometheus.yml'
    - '--storage.tsdb.path=/prometheus'
  ports:
    - '9090:9090'
Full Basic Example
Now we will look at a full implementation of a basic Prometheus setup. In this example, we will be setting up a simple Flask API which will contain a /metrics endpoint for Prometheus to scrape. Then, we will set up some custom metrics that will show up on the Prometheus dashboard.
First, we will create a python file called app.py for our Flask app. In this file, we will also use the python Prometheus library to create a simple counter metric.
######### app.py ########
from flask import Flask, Response
from prometheus_client import Counter, Gauge, generate_latest
import logging

logger = logging.getLogger(__name__)
app = Flask(__name__)

CONTENT_TYPE_LATEST = 'text/plain; version=0.0.4; charset=utf-8'

users_per_worker = Gauge(
    'number_of_users_on_this_worker',
    'The number of users with notebook servers on this worker.'
)

my_basic_counter = Counter(
    'my_basic_counter',
    'A basic counter.'
)

@app.route('/metrics', methods=['GET'])
def get_data():
    """Return all metrics as plaintext."""
    my_basic_counter.inc()  # count each scrape of this endpoint
    return Response(generate_latest(), mimetype=CONTENT_TYPE_LATEST)

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0')
Once we have our basic API set up, we will need a Dockerfile so we can run it in a container later. Create the Dockerfile in the same directory as app.py
###### Dockerfile #######
FROM python:3.7
RUN apt-get update && apt-get install -y python3-tk
COPY ./requirements.txt /requirements.txt
RUN pip install -r /requirements.txt
COPY . /
ENTRYPOINT ["python"]
CMD ["/app.py"]
For our Dockerfile, we will just be running our app.py script. Since we will need to include the Prometheus python package, as well as a few other python packages, we will include a requirements.txt file and pip install all of the required packages.
requirements.txt
Flask
prometheus_client
docker
Save your requirements.txt file in the same directory as your other files.
Now you can build your Dockerfile:
docker build -t basic-example .
Finally, we will create a docker-compose.yml file that we will use to run our Dockerfile.
########## docker-compose.yml ##########
version: "2"
services:
  api:
    image: basic-example
    ports:
      - "5000:5000"
For our docker-compose.yml file, we will only have one service, which is our metrics api.
Next, we will need to set up our Prometheus files. Create a new directory called 'prometheus'. At this point, your directory structure should look like this:
.
├── app.py
├── Dockerfile
├── requirements.txt
├── docker-compose.yml
└── prometheus/
In the prometheus directory, create two files: docker-compose.yml and prometheus.yml. These are the files we will use to deploy the Prometheus service.
In prometheus.yml, we will specify how we want Prometheus to collect metrics and for which services.
prometheus.yml
global:
  scrape_interval: 5s
  external_labels:
    monitor: 'my-monitor'
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'my-metrics'
    scrape_interval: 5s
    static_configs:
      - targets: ['172.17.0.1:5000']
In our config file, we have two services. The first is the default service for Prometheus itself. Out of the box, Prometheus will generate and save metrics about itself for things like uptime. The second service will be for the metrics api we just created. For the most basic implementation, we will just provide it with a target IP where it can expect the plaintext metrics to be located. By default, it will assume the metrics are at [ip]/metrics.
Next, we’ll need to set up a docker-compose.yml file for our Prometheus service. Since we will be using the dockerhub official Prometheus image, we won’t need to create our own.
version: '3'
services:
  prom:
    image: prom/prometheus:v2.1.0
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
    ports:
      - '9090:9090'
The main thing to note here is that we’ll need to add our prometheus.yml file as a volume. It needs to be mounted in a particular location (as noted in the yml file) where the official Prometheus image expects it.
Now we can run both our app and Prometheus! Go ahead and run docker-compose up -d in both directories where the docker-compose.yml files are. This will set up both the metrics API at localhost:5000 and the Prometheus dashboard at localhost:9090.
Once you navigate to localhost:9090 in your browser, you should see the Prometheus dashboard where you can query your custom metrics!
For a slightly more complex example, check out my github repo.
This example collects stats, including memory and CPU usage, from all running Docker containers. This can be very useful for any project that runs on Docker.
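As a sketch of what such a collector computes: the Docker stats API reports cumulative CPU counters, and a usage percentage is derived from the delta between two readings. The key names below follow the stats JSON returned by the docker SDK's container.stats(stream=False), but treat the exact layout as an assumption:

```python
def cpu_percent(stats):
    """Derive a CPU-usage percentage from one Docker stats reading.

    cpu_stats holds the current counters; precpu_stats holds the previous
    reading's counters, so the difference covers one sampling interval.
    """
    cpu_delta = (stats["cpu_stats"]["cpu_usage"]["total_usage"]
                 - stats["precpu_stats"]["cpu_usage"]["total_usage"])
    system_delta = (stats["cpu_stats"]["system_cpu_usage"]
                    - stats["precpu_stats"]["system_cpu_usage"])
    online_cpus = stats["cpu_stats"].get("online_cpus", 1)
    if system_delta <= 0:
        return 0.0
    return cpu_delta / system_delta * online_cpus * 100.0

# Fabricated sample reading for illustration
sample = {
    "cpu_stats": {"cpu_usage": {"total_usage": 200},
                  "system_cpu_usage": 1000, "online_cpus": 2},
    "precpu_stats": {"cpu_usage": {"total_usage": 100},
                     "system_cpu_usage": 500},
}
print(cpu_percent(sample))  # (200-100)/(1000-500) * 2 * 100 = 40.0
```

A value like this would typically be fed into a Gauge labelled by container name, so each container shows up as its own time series in Prometheus.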