Note to self: Don't reboot the server at midnight....
You may (or may not) have noticed a window of about 45 minutes the other day when my website was offline. I thought I'd post about the problem, the solution, and what I'll try to avoid next time.
The problem occurred when I was about to head to bed late at night. I decided to quickly reboot the server into a new kernel to activate some security updates.
I have this habit of leaving a ping -O hostname running in a separate terminal to monitor the progress of the reboot. I'm glad I did so this time, as I noticed that it took a while to go down for rebooting. Then it took an unusually long time to come up again, and when it did, I couldn't SSH in again!
After a quick check, the website was down too - so it was time to do something about it, and fast. Thankfully, I already knew what was wrong - it was just a case of fixing it...
In a Linux system, there's a file called /etc/fstab that defines all the file systems that are to be mounted. While this sounds a bit counter-intuitive (since how does it know to mount the filesystem that the file itself describes how to mount?), it's built into the initial ramdisk if I understand it correctly.
There are many different types of file system in Linux. Common ones include ext4 (the latest in the ext family, and the default on many distributions), nfs (Network File System), sshfs (for mounting remote filesystems over SSH), davfs (WebDAV shares), and more.
Problems start to arise when some of the filesystems defined in /etc/fstab don't mount correctly. WebDAV filesystems are notorious for this, I've found - so they generally need to have the noauto flag attached, like this:
https://dav.bobsrockets.com/path/to/directory /path/to/mount/point davfs noauto,user,rw,uid=1000,gid=1000 0 0
Unfortunately, I forgot to do this with the WebDAV filesystem I added a few weeks ago, causing the whole problem in the first place.
The unfortunate issue was that since it couldn't mount the filesystems, it couldn't start the SSH server. If it couldn't start the SSH server, I couldn't get in to fix it!
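In hindsight, a quick check before rebooting might have caught the missing flag. The following is only a rough sketch, not something I ran at the time: the function name risky_mounts and the list of filesystem types it looks for are my own assumptions, and it only looks for the noauto option discussed above.

```shell
#!/bin/sh
# Sketch: list fstab entries for network-style filesystems that lack noauto.
# The function name and the set of filesystem types are illustrative assumptions.
# Usage: risky_mounts [path-to-fstab]   (defaults to /etc/fstab)
risky_mounts() {
    fstab="${1:-/etc/fstab}"
    awk '
        # Skip comments and lines too short to be a real entry
        /^[[:space:]]*#/ || NF < 4 { next }
        # Field 3 is the filesystem type, field 4 the comma-separated options
        $3 ~ /^(davfs|nfs|nfs4|sshfs|fuse\.sshfs|cifs)$/ {
            n = split($4, opts, ",")
            found = 0
            for (i = 1; i <= n; i++) if (opts[i] == "noauto") found = 1
            # Print the mount point and type of every entry without noauto
            if (!found) print $2 " (" $3 ")"
        }
    ' "$fstab"
}
```

Running risky_mounts with no arguments reads /etc/fstab; any line it prints is an entry that will be mounted automatically at boot.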
Kimsufi rescue mode to the, erm, rescue! It turned out that my provider, Kimsufi, have a rescue mode system built in for just this sort of occasion. At the click of a few buttons, I could reboot my server into a temporary rescue environment with a random SSH password.
Therein I could mount the OS file system, edit /etc/fstab, and reboot into normal mode. Sorted!
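For future reference, the repair could be scripted roughly as below. This is a sketch under assumptions rather than a record of what I typed: I edited the file by hand, and the device name /dev/sda1, the /mnt mount point, and both function names are hypothetical. Check the real partition layout (for example with lsblk) before mounting anything.

```shell
#!/bin/sh
# Sketch: add noauto to every davfs entry in an fstab file that lacks it.
# Prints the result to stdout. Edited lines are re-joined with single spaces.
add_noauto() {
    awk '
        !/^[[:space:]]*#/ && NF >= 4 && $3 == "davfs" && ("," $4 ",") !~ /,noauto,/ {
            $4 = "noauto," $4
        }
        { print }
    ' "$1"
}

# Hypothetical rescue-session steps. /dev/sda1 is an ASSUMED root partition
# and /mnt an ASSUMED empty directory in the rescue environment.
repair_from_rescue() {
    root_dev="${1:-/dev/sda1}"
    mount "$root_dev" /mnt || return 1
    cp /mnt/etc/fstab /mnt/etc/fstab.bak            # keep a backup first
    add_noauto /mnt/etc/fstab.bak > /mnt/etc/fstab  # rewrite from the backup
    umount /mnt
}
```

Nothing runs until repair_from_rescue is called explicitly, so add_noauto can be tried on a copy of the file first.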
Just a note for future reference: I recommend using the rescuepro rescue mode OS, and not either of the FreeBSD options. I had issues trying to mount the OS disk with them - I kept getting an Invalid argument error. I was probably doing something wrong, but at the time I didn't really want to waste tons of time trying to figure that out in an unfamiliar OS.
Hopefully there isn't a next time. I'm certainly going to avoid auto WebDAV mounts, instead spawning a subprocess to mount them in the background after booting is complete.
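One way that background mount might look is sketched below. I haven't settled on an approach yet, so treat the function name mount_with_retry, the retry counts, and the MOUNT_CMD override as assumptions. It relies on the mount point already having a noauto entry in /etc/fstab, so that mount can be given just the mount point.

```shell
#!/bin/sh
# Sketch: try to mount an fstab-defined mount point, retrying a few times.
# Usage: mount_with_retry MOUNT_POINT [ATTEMPTS] [DELAY_SECONDS]
# MOUNT_CMD can be overridden (a hypothetical knob, handy for dry runs).
mount_with_retry() {
    mount_point="$1"
    attempts="${2:-5}"
    delay="${3:-10}"
    mount_cmd="${MOUNT_CMD:-mount}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Succeed as soon as one attempt works
        "$mount_cmd" "$mount_point" && return 0
        # Wait before the next attempt, but not after the last one
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

Called as mount_with_retry /path/to/mount/point & from something that runs once the system is up (an @reboot cron entry is one option, where the installed cron supports it), a failed mount then only costs a few retries instead of holding up the boot.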
I'm also going to avoid rebooting my server when I don't have time to deal with any potential fallout...