Ensuring a Linux machine's network connection stays up with Bash
Recently, I had the unpleasant experience of my Lab machine at University dropping offline. It has a tendency to do this randomly - and normally I'd just reboot it myself, but since I'm working from home at the moment it meant that I couldn't go in to fix it. This unfortunately meant that I was stuck waiting for a generous technician to go in and reboot it for me.
With access now restored I decided that I really didn't want this to happen again, so I've written a simple Bash script to resolve the issue.
It works by checking for an Internet connection every hour by pinging starbeamrainbowlabs.com - and if it doesn't manage to do so successfully, then it will reboot. A simple concept, but I discovered a number of things that needed considering while writing it:
- To avoid detecting transient network issues, we should make multiple attempts before giving up and rebooting
- Those multiple attempts need to be delayed to be effective
- We mustn't reboot more than once an hour to avoid getting into a 'reboot loop'
- If we're running an experiment, we need a way to temporarily stop it from doing its checks, and the checks need to resume automatically afterwards
- We could try to diagnose the network error or turn the networking off and on again, but if it gets stuck halfway through then we're locked out (very undesirable) - so it's easier / safer to just reboot
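The first two points above (several attempts, with a delay between them) can be sketched roughly as follows. This is a minimal illustration rather than the code from the Gist: the function name `check_with_retries` and its arguments are hypothetical, and the real script may structure this differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from ensure-network.sh.

# Run a check command up to $1 times, sleeping $2 seconds between attempts.
# Returns 0 as soon as one attempt succeeds, 1 if every attempt fails.
check_with_retries() {
	local attempts="$1" delay="$2"
	shift 2
	local i
	for (( i = 1; i <= attempts; i++ )); do
		if "$@" >/dev/null 2>&1; then
			return 0
		fi
		# Only wait if another attempt is still to come
		if (( i < attempts )); then
			sleep "$delay"
		fi
	done
	return 1
}

# Example usage (left commented out): 3 attempts, 5 seconds apart, each one
# a single ping that waits at most 5 seconds for a reply.
# check_with_retries 3 5 ping -c 1 -W 5 example.com || echo "network appears down"
```

Passing the check as a command keeps the retry logic separate from the choice of probe, so the same loop would work with something other than `ping`.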
With these considerations in mind, I came up with this:
ensure-network.sh (link to part of a GitHub Gist, as it's quite long)
This script requires Bash version 4+ and has a number of environment variables that can configure its behaviour:
||The domain name or IP address to ping to check the connection|
||The interval to check the connection in seconds|
||Wait at most this long for a reply to our ping|
||Retry this many times before giving up and rebooting|
||Delay this many seconds in between retries|
||Leave at least this many minutes in between reboots|
|CHECK_POSTPONE_FILE|If this file exists and has a recent last-modified time (mtime), don't actually reboot|
||The maximum age in minutes of the CHECK_POSTPONE_FILE for it to still postpone reboots|
With these environment variables, it covers the first 4 points in the above list. To expand on CHECK_POSTPONE_FILE: if I'm running an experiment and I don't want the machine to reboot in the middle of it, then I can simply run touch /path/to/postpone_file to delay network connection-related reboots for 7 days (by default). After this time, it will automatically start rebooting again if it drops off the network. This ensures that it will always restart monitoring eventually - with a more manual system I'd forget to re-enable it and then lose access.
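A freshness check of this kind can be sketched as below. This is an assumption about how such a check might look, not the code from the Gist: the function name `is_file_fresh` is hypothetical, and it relies on GNU `stat` (the `-c %Y` format), which is what most Linux distributions ship.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from ensure-network.sh.

# Succeeds (exit code 0) if file $1 exists and was last modified
# less than $2 minutes ago; fails otherwise.
is_file_fresh() {
	local file="$1" max_age_minutes="$2"
	[[ -f "$file" ]] || return 1
	local now mtime
	now="$(date +%s)"
	# GNU stat: %Y prints the mtime as seconds since the epoch
	mtime="$(stat -c %Y "$file")" || return 1
	(( now - mtime < max_age_minutes * 60 ))
}

# Example usage (left commented out): 7 days is 10080 minutes.
# if is_file_fresh /path/to/postpone_file 10080; then
# 	echo "Reboot postponed"
# fi
```

Because the check depends only on the file's mtime, running touch on the file again simply restarts the postponement window.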
Another consideration is that the /var/cache directory must exist. This is because an empty tracking file is created there to keep track of when the last network connection-related reboot occurred.
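The tracking-file idea might look something like the sketch below. The file name, the `REBOOT_CMD` override, and the function name are all hypothetical (the post only says that the file lives in /var/cache), and GNU `stat` is assumed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not taken from ensure-network.sh.

# Assumed file name; only the /var/cache directory comes from the post.
TRACKING_FILE="${TRACKING_FILE:-/var/cache/ensure-network-last-reboot}"

# Reboot unless the tracking file was touched less than $1 minutes ago.
# Returns 1 when the reboot is skipped. Set REBOOT_CMD to something
# harmless (e.g. "echo rebooting") for a dry run.
reboot_if_allowed() {
	local min_interval_minutes="$1"
	local now last
	now="$(date +%s)"
	if [[ -f "$TRACKING_FILE" ]]; then
		last="$(stat -c %Y "$TRACKING_FILE")" || return 1
		if (( now - last < min_interval_minutes * 60 )); then
			return 1
		fi
	fi
	# Record the time first, so the limit still applies after the machine is back up
	touch "$TRACKING_FILE" || return 1
	${REBOOT_CMD:-systemctl reboot}
}
```

Touching the file before rebooting means that a machine which comes back up still offline will wait out the full interval before trying again, which is what prevents a reboot loop.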
With the script written, the next step is to have it run automatically on boot. For systemd-based systems such as my lab machine, a systemd service is the order of the day. This is relatively simple:
[Unit]
Description=Reboot if the network connection is down
After=network.target

[Service]
Type=simple
# Because it needs to be able to reboot
User=root
Group=root
EnvironmentFile=-/etc/default/ensure-network
ExecStartPre=/bin/sleep 60
ExecStart=/bin/bash "/usr/local/lib/ensure-network/ensure-network.sh"
SyslogIdentifier=ensure-access
StandardError=syslog
StandardOutput=syslog

[Install]
WantedBy=multi-user.target
(View the latest version in the GitHub Gist)
This assumes that the ensure-network.sh script is located at /usr/local/lib/ensure-network/ensure-network.sh. It also allows for an environment file to optionally be created at /etc/default/ensure-network, so that you can customise the parameters. Here's an example environment file:
The above example environment file checks against example.com every minute instead of the default starbeamrainbowlabs.com every hour. You can, of course, specify any (or all) of the environment variables detailed above in the environment file if you wish.
That completes my setup - so hopefully I don't encounter any more network-related issues that lock me out of accessing my lab machine remotely! To install it yourself, you can do this:
# Create the directory for the script to live in
sudo mkdir /usr/local/lib/ensure-network

# Download the script & service file
# (lowercase -o writes to the given path; uppercase -O takes no argument)
sudo curl -L -o /usr/local/lib/ensure-network/ensure-network.sh https://gist.githubusercontent.com/sbrl/08e13f2ceedafe35ac7f8dbdfb8bfde7/raw/cc5ab4226472c08b09e448a257256936cc749193/ensure-network.sh
sudo curl -L -o /etc/systemd/system/ensure-network.service https://gist.githubusercontent.com/sbrl/08e13f2ceedafe35ac7f8dbdfb8bfde7/raw/adf5ed4009b3e1a09f857936fceb3581897072f4/ensure-network.service

# Start the service & enable it on boot
sudo systemctl daemon-reload
sudo systemctl start ensure-network.service
sudo systemctl enable ensure-network.service
You might need to replace the URLs there with the latest ones that download the raw content from the GitHub Gist.
Did you find this useful? Got a suggestion to make it better? Running into issues? Comment below!