Starbeamrainbowlabs

Stardust
Blog

Monitoring latency / ping with Collectd and Bash

I use Collectd as the monitoring system for the devices I manage. As part of this, I use the Ping plugin to monitor latency to a number of different hosts, such as GitHub, the raspberry pi apt repo, 1.0.0.1, and this website.

I've noticed for a while that the ping plugin doesn't always work: Even when I check to ensure that a host is pingable before I add it to the monitoring list, Collectd doesn't always manage to ping it - showing it as NaN instead.

Yesterday I finally decided that enough was enough, and that I was going to do something about it. I've blogged about using the exec plugin with a bash script before in a previous post, in which I monitor HTTP response times with curl.

Using that script as a base, I adapted it to instead parse the output of the ping command and translate it into something that Collectd understands. If you haven't already, you'll want to go and read that post before continuing with this one.

The first order of business is to sort out the identifier we're going to use for the data in question. Collectd assigns an identifier to all the the data coming in, and it uses this to determine where it is stored on disk - and subsequently which graph it will appear in on screen in the front-end.

Such identifiers follow this pattern:

host/plugin-instance/type-instance

This can be broken down into the following parts:

Part Meaning
host The hostname of the machine from which the data was collected
plugin The name of the plugin that collected the data (e.g. memory, disk, thermal, etc)
instance The instance name of the plugin, if the plugin is enabled multiple times
type The type of reading that was collected
instance If multiple readings for a given type are collected, this instance differentiates between them.

Of note specifically here are the type, which must be one of a number of pre-defined values, which can be found in a text file located at /usr/share/collectd/types.db. In my case, my types.db file contains the following definitions for ping:

To this end, I've decided on the following identifier strings:

HOSTNAME_HERE/ping-exec/ping-TARGET_NAME
HOSTNAME_HERE/ping-exec/ping_droprate-TARGET_NAME
HOSTNAME_HERE/ping-exec/ping_stddev-TARGET_NAME

I'm using exec for the first instance here to cause it to store my ping results separately from the internal ping plugin. The 2nd instance is the name of the target that is being pinged, resulting in multiple lines on the same graph.

To parse the output of the ping command, I've found it easiest if I push the output of the ping command to disk first, and then read it back afterwards. To do that, a temporary directory is needed:

temp_dir="$(mktemp --tmpdir="/dev/shm" -d "collectd-exec-ping-XXXXXXX")";

on_exit() {
    rm -rf "${temp_dir}";
}
trap on_exit EXIT;

This creates a new temporary directory in /dev/shm (shared memory in RAM), and automatically deletes it when the script terminates by scheduling an exit trap.

Then, we can create a temporary file inside the new temporary directory and call the ping command:

tmpfile="$(mktemp --tmpdir="${temp_dir}" "ping-target-XXXXXXX")";
ping -O -c "${ping_count}" "${target}" >"${tmpfile}";

A number of variables in that second command. Let me break that down:

For reference, the output of the ping command looks something like this:

PING starbeamrainbowlabs.com (5.196.73.75) 56(84) bytes of data.
64 bytes from starbeamrainbowlabs.com (5.196.73.75): icmp_seq=1 ttl=55 time=28.6 ms
64 bytes from starbeamrainbowlabs.com (5.196.73.75): icmp_seq=2 ttl=55 time=15.1 ms
64 bytes from starbeamrainbowlabs.com (5.196.73.75): icmp_seq=3 ttl=55 time=18.9 ms

--- starbeamrainbowlabs.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 15.145/20.886/28.574/5.652 ms

We're only interested in the last 2 lines of output. Since this is a long-running script that is going to be executing every 5 minutes, to minimise load (it will be running on a rather overburdened Raspberry Pi 3B+ :P), we will be minimising the number of subprocesses we spawn. To read in the last 2 lines of the file into an array, we can do this:

mapfile -s "$((ping_count+3))" -t file_data <"${tmpfile}"

The -s here tells mapfile (a bash built-in) to skip a given number of lines before reading from the file. Since we know the number of ping requests we sent and that there are 3 additional lines that don't contain the ping request output before the last 2 lines that we're interested in, we can calculate the number of lines we need to skip here.

Next, we can now parse the last 2 lines of the file. The read command (which is also a bash built-in, so it doesn't spawn a subprocess) is great for this purpose. Let's take it 1 line at a time:

read -r _ _ _ _ _ loss _ < <(echo "${file_data[0]}")
loss="${loss/\%}";

Here the read command splits the input on whitespace into multiple different variables. We are only interested in the packet loss here. While the other values might be interesting, Collectd (at least by default) doesn't have a definition in types.db for them and I don't see any huge benefits from adding them anyway, so I use an underscore _ to indicate, by common convention, that I'm not interested in those fields.

We then strip the percent sign % from the end of the packet loss value here too.

Next, let's extract the statistics from the very last line:

read -r _ _ _ _ _ _ min avg max stdev _ < <(echo "${file_data[1]//\// }");

Here we replace all forward slashes in the input with a space to allow read to split it properly. Then, we extract the 4 interesting values (although we can't actually log min and max).

With the values extracted, we can output the statistics we've collected in a format that Collectd understands:

echo "PUTVAL \"${COLLECTD_HOSTNAME}/ping-exec/ping_droprate-${target}\" interval=${COLLECTD_INTERVAL} N:${loss}";
echo "PUTVAL \"${COLLECTD_HOSTNAME}/ping-exec/ping-${target}\" interval=${COLLECTD_INTERVAL} N:${avg}";
echo "PUTVAL \"${COLLECTD_HOSTNAME}/ping-exec/ping_stddev-${target}\" interval=${COLLECTD_INTERVAL} N:${stdev}";

Finally, we mustn't forget to delete the temporary file:

rm "${tmpfile}";

Those are the major changes I made from the earlier HTTP response time monitor. The full script can be found at the bottom of this post. The settings that control the operation of the script are the top, which allow you to change the list of hosts to ping, and the number of ping requests to make.

Save it to something like /etc/collectd/collectd-exec-ping.sh (don't forget to sudo chmod +x /etc/collectd/collectd-exec-ping.sh it), and then append this to your /etc/collectd/collectd.conf:

<Plugin exec>
        Exec    "nobody:nogroup"        "/etc/collectd/collectd-exec-ping.sh"
</Plugin>

Final script

#!/usr/bin/env bash
set -o pipefail;

# Variables:
#   COLLECTD_INTERVAL   Interval at which to collect data
#   COLLECTD_HOSTNAME   The hostname of the local machine

declare targets=(
    "starbeamrainbowlabs.com"
    "github.com"
    "reddit.com"
    "raspbian.raspberrypi.org"
    "1.0.0.1"
)
ping_count="3";

###############################################################################

# Pure-bash alternative to sleep.
# Source: https://blog.dhampir.no/content/sleeping-without-a-subprocess-in-bash-and-how-to-sleep-forever
snore() {
    local IFS;
    [[ -n "${_snore_fd:-}" ]] || exec {_snore_fd}<> <(:);
    read ${1:+-t "$1"} -u $_snore_fd || :;
}

# Source: https://github.com/dylanaraps/pure-bash-bible#split-a-string-on-a-delimiter
split() {
    # Usage: split "string" "delimiter"
    IFS=$'\n' read -d "" -ra arr <<< "${1//$2/$'\n'}"
    printf '%s\n' "${arr[@]}"
}

# Source: https://github.com/dylanaraps/pure-bash-bible#use-regex-on-a-string
regex() {
    # Usage: regex "string" "regex"
    [[ $1 =~ $2 ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

# Source: https://github.com/dylanaraps/pure-bash-bible#get-the-number-of-lines-in-a-file
# Altered to operate on the standard input.
count_lines() {
    # Usage: count_lines <"file"
    mapfile -tn 0 lines
    printf '%s\n' "${#lines[@]}"
}

# Source https://github.com/dylanaraps/pure-bash-bible#get-the-last-n-lines-of-a-file
tail() {
    # Usage: tail "n" "file"
    mapfile -tn 0 line < "$2"
    printf '%s\n' "${line[@]: -$1}"
}

###############################################################################

temp_dir="$(mktemp --tmpdir="/dev/shm" -d "collectd-exec-ping-XXXXXXX")";

on_exit() {
    rm -rf "${temp_dir}";
}
trap on_exit EXIT;

# $1 - target name
# $2 - url
check_target() {
    local target="${1}";

    tmpfile="$(mktemp --tmpdir="${temp_dir}" "ping-target-XXXXXXX")";

    ping -O -c "${ping_count}" "${target}" >"${tmpfile}";

    # readarray -t result < <(curl -sS --user-agent "${user_agent}" -o /dev/null --max-time 5 -w "%{http_code}\n%{time_total}\n" "${url}"; echo "${PIPESTATUS[*]}");
    mapfile -s "$((ping_count+3))" -t file_data <"${tmpfile}"

    read -r _ _ _ _ _ loss _ < <(echo "${file_data[0]}")
    loss="${loss/\%}";
    read -r _ _ _ _ _ _ min avg max stdev _ < <(echo "${file_data[1]//\// }");


    echo "PUTVAL \"${COLLECTD_HOSTNAME}/ping-exec/ping_droprate-${target}\" interval=${COLLECTD_INTERVAL} N:${loss}";
    echo "PUTVAL \"${COLLECTD_HOSTNAME}/ping-exec/ping-${target}\" interval=${COLLECTD_INTERVAL} N:${avg}";
    echo "PUTVAL \"${COLLECTD_HOSTNAME}/ping-exec/ping_stddev-${target}\" interval=${COLLECTD_INTERVAL} N:${stdev}";

    rm "${tmpfile}";
}

while :; do
    for target in "${targets[@]}"; do
        # NOTE: We don't use concurrency here because that spawns additional subprocesses, which we want to try & avoid. Even though it looks slower, it's actually more efficient (and we don't potentially skew the results by measuring multiple things at once)
        check_target "${target}"
    done

    snore "${COLLECTD_INTERVAL}";
done

Tag Cloud

3d 3d printing account algorithms android announcement architecture archives arduino artificial intelligence artix assembly async audio automation backups bash batch blender blog bookmarklet booting bug hunting c sharp c++ challenge chrome os cluster code codepen coding conundrums coding conundrums evolved command line compilers compiling compression containerisation css dailyprogrammer data analysis debugging demystification distributed computing dns docker documentation downtime electronics email embedded systems encryption es6 features ethics event experiment external first impressions freeside future game github github gist gitlab graphics hardware hardware meetup holiday holidays html html5 html5 canvas infrastructure interfaces internet interoperability io.js jabber jam javascript js bin labs learning library linux lora low level lua maintenance manjaro minetest network networking nibriboard node.js open source operating systems optimisation own your code pepperminty wiki performance phd photos php pixelbot portable privacy problem solving programming problems project projects prolog protocol protocols pseudo 3d python reddit redis reference releases rendering resource review rust searching secrets security series list server software sorting source code control statistics storage svg systemquery talks technical terminal textures thoughts three thing game three.js tool tutorial tutorials twitter ubuntu university update updates upgrade version control virtual reality virtualisation visual web website windows windows 10 worldeditadditions xmpp xslt

Archive

Art by Mythdael