sciencegateways - Tech Blog: Collecting Docker Metrics Using Python and Prometheus

Tech Blog: Collecting Docker Metrics Using Python and Prometheus

Details: Published on Wednesday, 02 October 2019 16:00

By Julia Looney

Check out Julia's webinar: Collecting Docker metrics with Python and Prometheus.

What is Prometheus?

Prometheus allows you to create and use time-series metrics for monitoring, alerting, and graphing.

Features of Prometheus

open source
uses time-series data
metrics are specified with a name and key/value pairs
uses a query language, PromQL, to allow you to create more flexible monitoring
data collection for the time series data occurs over HTTP
contains a graphing dashboard and is also compatible with other open source graphing tools such as Grafana

Components

Prometheus Server
Jobs/Applications
Alert manager
Data visualization

Prometheus Server

The Prometheus server is the Prometheus application itself. It runs as an HTTP server and scrapes metrics from the targets you specify.

In this Example

We will cover setting up the Prometheus Server to scrape an application for metrics, which it will then save to its time-series database. We will also look a bit at the Prometheus Dashboard.

What are the benefits to science gateway developers? What problems does it address?

With Prometheus, you can create custom metrics for your gateway in order to track whatever attributes you’d like. This can be anything from memory/CPU usage to uptime. Prometheus provides several features for tracking and handling metrics.

Jobs/Applications

A job or application is anything that generates metrics for Prometheus to scrape. Prometheus uses HTTP to scrape an endpoint that the job or application exposes.

For short-lived jobs that may finish before they can be scraped, the Pushgateway can be used. For long-running services, standard scraping is used.

Alert Manager

Alerts are defined using PromQL queries. When one of the alerts fires, the Alertmanager can be set up to send notifications to various applications such as Slack or HipChat.

Data visualization

Prometheus comes equipped with a basic graphing dashboard that works with PromQL queries. Additionally, services such as Grafana can be easily integrated with Prometheus for more advanced data visualizations.

How can it/has it been implemented by SGCI staff?

At TACC, we have used Prometheus for a number of applications. In Abaco, a functions-as-a-service API/gateway, we have used Prometheus metrics to implement autoscaling: custom metrics serve as the criteria for whether the system should scale up or down. Prometheus is also used for JupyterHub, which will be shown here. For JupyterHub, it collects metrics on memory and CPU usage for users, and alerts fire if a user consumes too many resources.

What would make someone choose this solution over another?

Prometheus is open-source, easily customizable, and quick to set up. It also pairs well with Docker and Kubernetes, making it versatile.

Steps of implementation

Creating Metrics with Python

The Python Prometheus package

There is a Python package called prometheus_client that can be installed with pip.

https://github.com/prometheus/client_python

pip install prometheus_client

Setting up basic metrics

Using prometheus_client, we can create metric objects that Prometheus can scrape. All four Prometheus metric types (counter, gauge, histogram, and summary) can be created.

Counter:

from prometheus_client import Counter

c = Counter('num_requests', 'The number of requests.')
c.inc()    # Increments the counter by 1
c.inc(10)  # Increments the counter by 10

Gauge:

from prometheus_client import Gauge

g = Gauge('memory_in_gb', 'The amount of memory remaining on this server in GB.')
g.inc()      # Increments the gauge by 1
g.dec()      # Decrements the gauge by 1
g.set(6.3)   # Sets the gauge to an exact value

Histogram:

from prometheus_client import Histogram

h = Histogram('request_latency_seconds', 'The request latency in seconds.')
h.observe(2.5)  # Records an observation of 2.5 seconds

Summary:

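from prometheus_client import Summary

# Named differently from the histogram above: two metrics in the same
# process cannot share a name
s = Summary('request_processing_seconds', 'The request processing time in seconds.')
s.observe(3.7)  # Records an observation of 3.7 seconds

Note: the Python Prometheus client cannot store quantile information yet.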
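To see what a histogram and a summary actually record, here is a minimal sketch that observes the same three values with both and reads the results back. The metric names, the bucket boundaries, and the use of a private CollectorRegistry are assumptions made for illustration; they are not part of the example above.

```python
from prometheus_client import CollectorRegistry, Histogram, Summary

# A private registry keeps this sketch isolated from the global default one.
registry = CollectorRegistry()

# Hypothetical metric names and bucket boundaries, chosen for illustration.
latency_histogram = Histogram(
    'demo_request_latency_seconds',
    'Request latency in seconds.',
    buckets=(1.0, 5.0),
    registry=registry,
)
latency_summary = Summary(
    'demo_request_processing_seconds',
    'Request processing time in seconds.',
    registry=registry,
)

for seconds in (0.5, 2.5, 7.0):
    latency_histogram.observe(seconds)
    latency_summary.observe(seconds)

# Histogram buckets are cumulative: each one counts observations <= its bound.
le_1 = registry.get_sample_value('demo_request_latency_seconds_bucket', {'le': '1.0'})
le_5 = registry.get_sample_value('demo_request_latency_seconds_bucket', {'le': '5.0'})
le_inf = registry.get_sample_value('demo_request_latency_seconds_bucket', {'le': '+Inf'})

# A summary only exposes a running count and sum (no quantiles).
summary_count = registry.get_sample_value('demo_request_processing_seconds_count')
summary_sum = registry.get_sample_value('demo_request_processing_seconds_sum')

print(le_1, le_5, le_inf, summary_count, summary_sum)
```

Because the histogram keeps per-bucket counts, quantiles can be estimated later in PromQL, which is one reason to prefer it when the Python summary's count and sum are not enough.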

Adding labels

Labels can also be added to metrics for easier querying. Each distinct label value gets its own time series, so data points can be grouped and filtered by label. Label names are specified when the metric object is created:

from prometheus_client import Gauge

g = Gauge(
    'memory_in_gb',
    'The amount of memory remaining on this server in GB.',
    ['server_name']  # name of the label
)

# When we set the value of a labelled metric,
# we need to specify which label value is getting that value
g.labels('server1').set(6.3)
g.labels('server2').set(2.8)

Later when we query the 'memory_in_gb' metric, we will have one gauge listing for each server we specified.
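As a quick check of that behavior, the following sketch sets two label values and reads each one back. It assumes a private CollectorRegistry and a hypothetical metric name (demo_memory_in_gb) so that it does not clash with the gauge defined above.

```python
from prometheus_client import CollectorRegistry, Gauge

# A private registry, used here only to keep the sketch self-contained.
registry = CollectorRegistry()

memory_gauge = Gauge(
    'demo_memory_in_gb',  # hypothetical name for this sketch
    'The amount of memory remaining on this server in GB.',
    ['server_name'],
    registry=registry,
)

memory_gauge.labels('server1').set(6.3)
memory_gauge.labels(server_name='server2').set(2.8)  # keyword form is equivalent

# Each label value is stored as its own sample.
server1_value = registry.get_sample_value('demo_memory_in_gb', {'server_name': 'server1'})
server2_value = registry.get_sample_value('demo_memory_in_gb', {'server_name': 'server2'})

print(server1_value, server2_value)
```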

Generating metrics plaintext

In order to scrape and collect metrics, Prometheus needs the metrics to appear in a specific format. Example:

# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 1.693277e+06
python_gc_objects_collected_total{generation="1"} 4.99867e+06
python_gc_objects_collected_total{generation="2"} 467275.0
# HELP python_gc_objects_uncollectable_total Uncollectable object found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 285137.0
python_gc_collections_total{generation="1"} 25921.0
python_gc_collections_total{generation="2"} 1240.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="7",patchlevel="2",version="3.7.2"} 1.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.668186112e+09
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 4.4384256e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.55535539383e+09
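To make the structure of these lines concrete, here is a rough sketch of a reader for them, using only the Python standard library. It is a simplification for illustration, not how Prometheus itself parses the format: parse_samples is a hypothetical helper, and it ignores optional timestamps and escaped quotes inside label values.

```python
import re

# Matches lines such as: metric_name{label="value"} 1.0
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_PAIR = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<value>[^"]*)"')


def parse_samples(text):
    """Return (name, labels, value) tuples, skipping # HELP / # TYPE lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # ignore anything this simplified sketch cannot read
        labels = {
            pair.group('key'): pair.group('value')
            for pair in LABEL_PAIR.finditer(match.group('labels') or '')
        }
        samples.append((match.group('name'), labels, float(match.group('value'))))
    return samples


example = '''
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 285137.0
python_gc_collections_total{generation="1"} 25921.0
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 4.4384256e+07
'''

samples = parse_samples(example)
for name, labels, value in samples:
    print(name, labels, value)
```

Each sample line is a metric name, an optional set of key/value labels in braces, and a numeric value; the # HELP and # TYPE lines carry the description and metric type.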

The Prometheus Python library includes a function, generate_latest(), that turns all of the metric objects into the plaintext format that Prometheus needs to scrape.

For example, in a Flask view function that serves your metrics, you could return this (Response comes from Flask; CONTENT_TYPE_LATEST can be imported from prometheus_client):

return Response(generate_latest(), mimetype=CONTENT_TYPE_LATEST)
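Outside of a web framework, generate_latest() can be called directly to inspect what Prometheus would see. The sketch below assumes a private CollectorRegistry and a hypothetical counter name (demo_requests); recent versions of the client expose counters with a _total suffix, so the exact sample name may vary with the installed version.

```python
from prometheus_client import CollectorRegistry, Counter, generate_latest

# A private registry so that only the metric defined here is rendered.
registry = CollectorRegistry()

requests_counter = Counter(
    'demo_requests',  # hypothetical name for this sketch
    'The number of requests.',
    registry=registry,
)
requests_counter.inc(3)

# generate_latest() returns bytes in the plaintext format shown earlier.
payload = generate_latest(registry)
text = payload.decode('utf-8')

print(text)
```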

Configuring Prometheus to scrape metrics

Prometheus Config file

In order for Prometheus to scrape and collect metrics, it needs a configuration file. This file specifies the locations it will scrape for metrics. At the most basic level, Prometheus will scrape itself, because the Prometheus service generates metrics about its own operation.

This configuration file is written in YAML:

global:
  scrape_interval: 5s
  external_labels:
    monitor: 'my-monitor'
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'my-docker-metrics'
    scrape_interval: 5s
    static_configs:
      - targets: ['172.17.0.1:5000']

The first thing we can specify is the interval of time between each scrape. Here it is set to 5 seconds; if scrape_interval is omitted, Prometheus defaults to 1 minute.

Next, we set up the scrape configs. Each job under scrape_configs represents a set of targets that Prometheus will scrape. In this example, we have two jobs: the Prometheus service itself and a webapp that we will create later. All we need to do is give each job a name and then provide the address (host and port) where the plaintext metrics are served.

Deploying Prometheus

To deploy Prometheus, we will be using Docker. Prometheus has its own official Docker image, so all we will need to do is tell it to use our config file.

To make things easier, we will use docker-compose, which uses YAML:

prom:
  image: prom/prometheus:v2.1.0
  volumes:
    - ./prometheus.yml:/etc/prometheus/prometheus.yml
  command:
    - '--config.file=/etc/prometheus/prometheus.yml'
    - '--storage.tsdb.path=/prometheus'
  ports:
    - '9090:9090'

Full Basic Example

Now we will look at a full implementation of a basic Prometheus setup. In this example, we will be setting up a simple Flask API which will contain a /metrics endpoint for Prometheus to scrape. Then, we will set up some custom metrics that will show up on the Prometheus dashboard.

First, we will create a Python file called app.py for our Flask app. In this file, we will also use the Python Prometheus library to create a simple counter metric.

######### app.py ########
# Note: send_file, request, start_http_server, and docker are imported
# but not used in this basic example.
from flask import Flask, send_file, request, Response
from prometheus_client import start_http_server, Counter, generate_latest, Gauge
import docker
import logging

logger = logging.getLogger(__name__)

app = Flask(__name__)

# Content type of the Prometheus plaintext format
CONTENT_TYPE_LATEST = 'text/plain; version=0.0.4; charset=utf-8'

users_per_worker = Gauge(
    'number_of_users_on_this_worker',
    'The number of users with notebook servers on this worker.'
)

my_basic_counter = Counter(
    'my_basic_counter',
    'A basic counter.'
)


@app.route('/metrics', methods=['GET'])
def get_data():
    """Returns all data as plaintext."""
    # Incremented on every request, so it counts scrapes of this endpoint
    my_basic_counter.inc()
    return Response(generate_latest(), mimetype=CONTENT_TYPE_LATEST)


if __name__ == '__main__':
    # Flask listens on port 5000 by default
    app.run(debug=True, host='0.0.0.0')
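Before putting this in a container, it can help to confirm that the endpoint responds as expected. Here is a minimal sketch using Flask's built-in test client, which calls the view without opening a network port. It rebuilds a trimmed copy of the app with a private registry and a hypothetical counter name (demo_basic_counter), so it is an illustration rather than a test of app.py itself; the _total suffix on the sample name assumes a recent version of prometheus_client.

```python
from flask import Flask, Response
from prometheus_client import (
    CONTENT_TYPE_LATEST,
    CollectorRegistry,
    Counter,
    generate_latest,
)

# A separate registry and app so this check does not touch the real ones.
registry = CollectorRegistry()
scrape_counter = Counter(
    'demo_basic_counter',  # hypothetical name for this sketch
    'A basic counter.',
    registry=registry,
)

app = Flask(__name__)


@app.route('/metrics', methods=['GET'])
def get_data():
    """Return all metrics in this registry as plaintext."""
    scrape_counter.inc()
    return Response(generate_latest(registry), content_type=CONTENT_TYPE_LATEST)


# Flask's test client calls the view directly, with no server running.
client = app.test_client()
first = client.get('/metrics')
second = client.get('/metrics')

# The counter is incremented once per request to /metrics.
counter_value = registry.get_sample_value('demo_basic_counter_total')

print(first.status_code, counter_value)
```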

Once we have our basic API set up, we will need a Dockerfile so we can run it in a container later. Create the Dockerfile in the same directory as app.py.

###### Dockerfile #######
FROM python:3.7

RUN apt-get update && apt-get install -y python3-tk

COPY ./requirements.txt /requirements.txt
RUN pip install -r /requirements.txt

COPY . /

ENTRYPOINT ["python"]
CMD ["/app.py"]

For our Dockerfile, we will just be running our app.py script. Since we will need the Prometheus Python package, as well as a few other Python packages, we will include a requirements.txt file and pip install all of the required packages.

requirements.txt

Flask
prometheus_client
docker

Save your requirements.txt file in the same directory as your other files.

Now you can build an image from your Dockerfile:

docker build -t basic-example .

Finally, we will create a docker-compose.yml file that we will use to run our image.

########## docker-compose.yml ##########
version: "2"
services:
  api:
    image: jlooney/basic-prom-example
    ports:
      - "5000:5000"

For our docker-compose.yml file, we will only have one service, which is our metrics API. The image name must match an image that Docker can find: to run the image you built in the previous step, set image to basic-example.

Next, we will need to set up our Prometheus files. Create a new directory called ‘prometheus’. At this point, your project directory should contain app.py, Dockerfile, requirements.txt, docker-compose.yml, and the new prometheus directory.

In the prometheus directory, create two files: docker-compose.yml and prometheus.yml. These are the files we will use to deploy the Prometheus service.

In prometheus.yml, we will specify how we want Prometheus to collect metrics and for which services.

prometheus.yml

global:
  scrape_interval: 5s
  external_labels:
    monitor: 'my-monitor'
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'my-metrics'
    scrape_interval: 5s
    static_configs:
      - targets: ['172.17.0.1:5000']

In our config file, we have two services. The first is the default service for Prometheus itself. Out of the box, Prometheus will generate and save metrics about itself for things like uptime. The second service will be for the metrics api we just created. For the most basic implementation, we will just provide it with a target IP where it can expect the plaintext metrics to be located. By default, it will assume the metrics are at [ip]/metrics.
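To spell out where Prometheus will look for each target, here is a small sketch that builds the scrape URL from a target string, assuming the Prometheus defaults of the http scheme and the /metrics path. The helper names scrape_url and fetch_metrics are hypothetical, and fetch_metrics only works once the target service is actually running.

```python
from urllib.request import urlopen


def scrape_url(target, scheme='http', metrics_path='/metrics'):
    """Build the URL Prometheus requests for a static target such as 'host:port'."""
    return f'{scheme}://{target}{metrics_path}'


def fetch_metrics(target, timeout=5):
    """Fetch the plaintext metrics from a target; needs the service to be running."""
    with urlopen(scrape_url(target), timeout=timeout) as response:
        return response.read().decode('utf-8')


# The two targets from the config file above.
print(scrape_url('localhost:9090'))
print(scrape_url('172.17.0.1:5000'))
```

Calling fetch_metrics('localhost:5000') from the host after the API container is up is a quick way to confirm the endpoint is reachable before checking the Prometheus dashboard.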

Next, we’ll need to set up a docker-compose.yml file for our Prometheus service. Since we will be using the official Prometheus image from Docker Hub, we won’t need to build our own.

version: '3'
services:
  prom:
    image: prom/prometheus:v2.1.0
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
    ports:
      - '9090:9090'

The main thing to note here is that we’ll need to add our prometheus.yml file as a volume. It needs to be mounted in a particular location (as noted in the yml file) where the official Prometheus image expects it.

Now we can run both our app and Prometheus! Go ahead and run docker-compose up -d in each of the two directories that contain a docker-compose.yml file. This will set up both the metrics API at localhost:5000 and the Prometheus dashboard at localhost:9090.

Once you navigate to localhost:9090 in your browser, you should see the Prometheus dashboard where you can query your custom metrics!

For a slightly more complex example, check out my GitHub repo.

This example will collect stats from all running docker containers including memory and cpu. This can be very useful for any project that runs using Docker.
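As a rough idea of what collecting container stats involves, here is a sketch of turning one Docker stats reading into memory and CPU figures. This is not the code from the repo: the helper names are hypothetical, the sample values are invented, and the dictionary is hand-written in the shape that the Docker stats API reports (the Docker SDK for Python returns a dictionary like this from container.stats(stream=False)). Exact fields can vary by platform.

```python
def memory_usage_bytes(stats):
    """Memory currently used by the container, in bytes."""
    return stats['memory_stats']['usage']


def cpu_percent(stats):
    """CPU usage as a percentage, from the change between two readings."""
    cpu = stats['cpu_stats']
    previous = stats['precpu_stats']
    cpu_delta = cpu['cpu_usage']['total_usage'] - previous['cpu_usage']['total_usage']
    system_delta = cpu['system_cpu_usage'] - previous['system_cpu_usage']
    if system_delta <= 0 or cpu_delta < 0:
        return 0.0  # not enough information to compute a rate
    return (cpu_delta / system_delta) * cpu.get('online_cpus', 1) * 100.0


# Hand-written sample in the shape of one Docker stats reading (values invented).
sample_stats = {
    'memory_stats': {'usage': 52428800, 'limit': 2147483648},
    'cpu_stats': {
        'cpu_usage': {'total_usage': 400000000},
        'system_cpu_usage': 20000000000,
        'online_cpus': 2,
    },
    'precpu_stats': {
        'cpu_usage': {'total_usage': 300000000},
        'system_cpu_usage': 10000000000,
    },
}

memory_bytes = memory_usage_bytes(sample_stats)
cpu_usage = cpu_percent(sample_stats)

print(memory_bytes, cpu_usage)
```

Values like these could then be written to a labelled Gauge, with one label value per container, so that each container shows up as its own series in Prometheus.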

Tech Blogs are written by SGCI experts and offer a variety of resources for gateway builders and developers. It's likely that, from reading this post, you'll have what you need to incorporate these technologies on your own, but we're here to help if you ever need it. Submit a Consulting Services Request Form to get help with adding this or any other tool, or email us at help@sciencegateways.org with questions.
