Ensuring a Linux machine's network connection stays up with Bash

Recently, I had the unpleasant experience of my Lab machine at University dropping offline. It has a tendency to do this randomly - and normally I'd just reboot it myself, but since I'm working from home at the moment it meant that I couldn't go in to fix it. This unfortunately meant that I was stuck waiting for a generous technician to go in and reboot it for me.

With access now restored I decided that I really didn't want this to happen again, so I've written a simple Bash script to resolve the issue.

It works by checking for an Internet connection every hour by pinging - and if it doesn't manage to do so successfully, then it will reboot. A simple concept, but I discovered a number of things that needed considering while writing it:

  1. To avoid detecting transient network issues, we should make multiple attempts before giving up and rebooting
  2. Those multiple attempts need to be delayed to be effective
  3. We mustn't reboot more than once an hour to avoid getting into a 'reboot loop'
  4. If we're running an experiment, we need a way to temporarily delay it from doing it's checks that will resume automatically
  5. We could try and diagnose the network error or turn the networking of and on again, but if it gets stuck halfway through then we're locked out (very undesirable) - so it's easier / safer to just reboot

With these considerations in mind, I came up with this: (link to part of a GitHub Gist, as it's quite long)

This script requires Bash version 4+ and has a number of environment variables that can configure its behaviour:

Environment Variable Description
CHECK_EXTERNAL_HOST The domain name or IP address to ping to check the connection
CHECK_INTERVAL The interval to check the connection in seconds
CHECK_TIMEOUT Wait at most this long for a reply to our ping
CHECK_RETRIES Retry this many times before giving up and rebooting
CHECK_RETRY_DELAY Delay this many seconds in between retries
CHECK_DRY_RUN If true, then don't actually reboot (useful for testing)
CHECK_REBOOT_DELAY Leave at least this many minutes in between reboots
CHECK_POSTPONE_FILE If this file exists and has a recent last-modified time (mtime), don't actually reboot
CHECK_POSTPONE_MAXAGE The maximum age in minutes of the CHECK_POSTPONE_FILE to consider it fresh and avoid rebooting

With these environment variables, it covers all 4 points in the above list. To expand on CHECK_POSTPONE_FILE, if I'm running an experiment for example and I don't want it to reboot in the middle of said experiment, then I can simply run touch /path/to/postpone_file to delay network connection-related reboots for 7 days (by default). After this time, it will automatically start rebooting again if it drops off the network. This ensures that it will always restart monitoring eventually - as if I had a more manual system I'd forget to re-enable it and then loose access.

Another consideration is that the /var/cache directory must exist. This is because an empty tracking file is created there to keep track of when the last network connection-related reboot occurred.

With the script written, then next step is to have it run automatically on boot. For systemd-based systems such as my lab machine, a systemd service is the order of the day. This is relatively simple:

Description=Reboot if the network connection is down

# Because it needs to be able to reboot
ExecStartPre=/bin/sleep 60
ExecStart=/bin/bash "/usr/local/lib/ensure-network/"


(View the latest version in the GitHub Gist)

This assumes that the script is located at /usr/local/lib/ensure-network/ It also allows for an environment file to optionally be created at /etc/default/ensure-network, so that you can customise the parameters. Here's an example environment file:

The above example environment file checks against every minute instead of the default every hour. You can, of course, specify any (or all) of the environment variables detailed above in the environment file if you wish.

That completes my setup - so hopefully I don't encounter any more network-related issues that lock me out of accessing my lab machine remotely! To install it yourself, you can do this:

# Create the directory for the script to live in
sudo mkdir /usr/local/lib/ensure-network
# Download the script & service file
sudo curl -L -O /usr/local/lib/ensure-network/
sudo curl -L -O /etc/systemd/system/ensure-network.service
# Start the service & enable it on boot
sudo systemctl daemon-reload
sudo systemctl start ensure-network.service
sudo systemctl enable ensure-network.service

You might need to replace the URLs there with the latest ones that download the raw content from the GitHub Gist.

Did you find this useful? Got a suggestion to make it better? Running into issues? Comment below!

Tag Cloud

3d 3d printing account algorithms android announcement architecture archives arduino artificial intelligence artix assembly async audio automation backups bash batch blog bookmarklet booting bug hunting c sharp c++ challenge chrome os cluster code codepen coding conundrums coding conundrums evolved command line compilers compiling compression containerisation css dailyprogrammer data analysis debugging demystification distributed computing documentation downtime electronics email embedded systems encryption es6 features ethics event experiment external first impressions future game github github gist gitlab graphics hardware hardware meetup holiday holidays html html5 html5 canvas infrastructure interfaces internet interoperability io.js jabber jam javascript js bin labs learning library linux lora low level lua maintenance manjaro network networking nibriboard node.js operating systems own your code pepperminty wiki performance phd photos php pixelbot portable privacy problem solving programming problems project projects prolog protocol protocols pseudo 3d python reddit redis reference releases resource review rust searching secrets security series list server software sorting source code control statistics storage svg talks technical terminal textures thoughts three thing game three.js tool tutorial twitter ubuntu university update updates upgrade version control virtual reality virtualisation visual web website windows windows 10 xmpp xslt


Art by Mythdael