
Quick File Management with Gossa

Recently a family member needed to access some documents at a remote location that didn't support USB flash drives. Awkward to be sure, but I did some searching around and found a nice little solution that I thought I'd blog about here.

At first, I thought about setting up Filestash - but I discovered that only installation through Docker is officially supported (if it's written in Go, then shouldn't it end up as a single binary? What's Docker needed for?).

Docker might be great, but for a quick solution to an awkward issue I didn't really want to go to the trouble of installing Docker and figuring out all the awkward plumbing problems for the first time. It definitely appeared to me that it's better suited to a setup where you're already using Docker.

Anyway, I then discovered Gossa. It's also written in Go, and is basically a web interface that lets you upload, download, and rename files (click on a file or directory's icon to rename).

A screenshot of Gossa listing the contents of my CrossCode music folder. CrossCode is awesome, and you should totally go and play it - after finishing reading this post of course :P

Is it basic? Yep.

Do the icons look like something from 1995? Sure.

(Is that Times New Roman I spy? I hope not)

Does it do the job? Absolutely.

For what it is, it's solved my problem fabulously - and it's so easy to setup! First, I downloaded the binary from the latest release for my CPU architecture, and put it somewhere on disk:

curl -o gossa -L https://github.com/pldubouilh/gossa/releases/download/v0.0.8/gossa-linux-arm

chmod +x gossa
sudo chown root: gossa
sudo mv gossa /usr/local/bin/gossa;
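
The service file below runs Gossa as a dedicated, unprivileged gossa user, which needs to exist first. I haven't shown that step above, so here's a rough sketch of what I'd expect it to look like (the exact flags are an assumption that fits Debian/Ubuntu):

# Create a system user with no home directory and no login shell for Gossa to run as
sudo useradd --system --no-create-home --shell /usr/sbin/nologin gossa
# The gossa user will also need read (and, for uploads, write) access to the directory being served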

Then, I created a systemd service file to launch Gossa with the right options:

[Unit]
Description=Gossa File Manager (syncthing)
After=syslog.target rsyslog.service network.target

[Service]
Type=simple
User=gossa
Group=gossa
WorkingDirectory=/path/to/dir
ExecStart=/usr/local/bin/gossa -h [::1] -p 5700 -prefix /gossa/ /path/to/directory/to/serve
Restart=always

StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=gossa


[Install]
WantedBy=multi-user.target

(Top tip! Use systemctl cat service_name to quickly see the service file definition for any given service)

Here I start Gossa listening on the IPv6 local loopback address on port 5700, set the prefix to /gossa/ (I'm going to be reverse-proxying it later on using a subdirectory of a pre-existing subdomain), and send the standard output & error to syslog. Speaking of which, we should tell syslog what to do with the logs we send it. I put this in /etc/rsyslog.d/gossa.conf:

if $programname == 'gossa' then /var/log/gossa/gossa.log
if $programname == 'gossa' then stop
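
If the /var/log/gossa directory doesn't already exist (or isn't writable by rsyslog), it'll need creating by hand first - a quick sketch, assuming rsyslog runs as the syslog user as it does on Debian/Ubuntu:

sudo mkdir -p /var/log/gossa
sudo chown syslog: /var/log/gossa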

After that, I configured log rotate by putting this into /etc/logrotate.d/gossa:

/var/log/gossa/*.log {
    daily
    missingok
    rotate 14
    compress
    delaycompress
    notifempty
    create 0640 root adm
    postrotate
        invoke-rc.d rsyslog rotate >/dev/null
    endscript
}

Very similar to the configuration I used for RhinoReminds, which I blogged about here.

Lastly, I configured Nginx on the machine I'm running this on to reverse-proxy to Gossa:

server {

    # ....

    location /gossa {
        proxy_pass http://[::1]:5700;
    }

    # ....

}

I've configured authentication elsewhere in my Nginx server block to protect my installation against unauthorised access (and you probably should too). All that's left to do is start Gossa and reload Nginx:

sudo systemctl daemon-reload
sudo systemctl start gossa
# Check that Gossa is running
sudo systemctl status gossa

# Test the Nginx configuration file changes before reloading it
sudo nginx -t
sudo systemctl reload nginx

Note that reloading Nginx is more efficient than restarting it, since it doesn't kill the process - it only reloads the configuration from disk. It doesn't matter here, but in a production environment that receives a high volume of traffic it's a great way to make configuration changes while avoiding dropping client connections.

In your web browser, you should see something like the image at the top of this post.

Found this interesting? Got another quick solution to an otherwise awkward issue? Comment below!

Setting up a Mosquitto MQTT server

I recently found myself setting up a mosquitto instance (yep, for this) due to a migration we're in the middle of doing and it got quite interesting, so I thought I'd post about it here. This post is also partly documentation of what I did and why, just in case future people come across it and wonder how it's set up, though I have tried to make it fairly self-documenting.

At first, I started by doing sudo apt install mosquitto and seeing if it would work. I can't remember if it did or not, but it certainly didn't after I played around with the configuration files. To this end, I decided that enough was enough and I turned the entire configuration upside-down. First up, I needed to disable the existing sysV init-based service that ships with the mosquitto package:

sudo systemctl stop mosquitto # Just in case
sudo systemctl disable mosquitto

Next, I wrote a new systemd service file:

[Unit]

Description=Mosquitto MQTT Broker
After=syslog.target rsyslog.service network.target

[Service]
Type=simple
PIDFile=/var/run/mosquitto/mosquitto.pid
User=mosquitto

PermissionsStartOnly=true
ExecStartPre=-/bin/mkdir /run/mosquitto
ExecStartPre=/bin/chown -R mosquitto:mosquitto /run/mosquitto

ExecStart=/usr/sbin/mosquitto --config-file /etc/mosquitto/mosquitto.conf
ExecReload=/bin/kill -s HUP $MAINPID

StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=mosquitto


[Install]
WantedBy=multi-user.target

This is broadly similar to the service file I developed in my earlier tutorial post, but it's slightly more complicated.

For one, I use PermissionsStartOnly=true and a series of ExecStartPre directives to allow mosquitto to create a PID file in a directory in /run. /run is a special directory on Linux for PID files and other such things, but normally only root can modify it. mosquitto will be running under the mosquitto user (surprise surprise), so we need to create a subdirectory for it and chown it so that it has write permissions.

A PID file is just a regular file on disk that contains the PID (Process IDentifier) number of the primary process of a system service. System service managers such as systemd and OpenRC use this number to manage the health of the service while it's running and send it various signals (such as to ask it to reload its configuration file).

With this in place, I then added an rsyslog definition at /etc/rsyslog.d/mosquitto.conf to tell it where to put the log files:

if $programname == 'mosquitto' then /var/log/mosquitto/mosquitto.log
if $programname == 'mosquitto' then stop

Thinking about it, I should probably check that a log rotation definition file is also in place.
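
For reference, such a definition would look much like the Gossa one above. Here's a sketch of what I'd expect to put at /etc/logrotate.d/mosquitto (the contents are assumed, not copied from the server):

sudo tee /etc/logrotate.d/mosquitto >/dev/null <<'EOF'
/var/log/mosquitto/*.log {
    daily
    missingok
    rotate 14
    compress
    delaycompress
    notifempty
    create 0640 root adm
    postrotate
        invoke-rc.d rsyslog rotate >/dev/null
    endscript
}
EOF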

Just in case, I then chowned the pre-existing log files to ensure that rsyslog could read & write to them:

sudo chown -R syslog: /var/log/mosquitto

Then, I filled out /etc/mosquitto/mosquitto.conf with a few extra directives and restarted the service. Here's the full configuration file:

# Place your local configuration in /etc/mosquitto/conf.d/
#
# A full description of the configuration file is at
# /usr/share/doc/mosquitto/examples/mosquitto.conf.example

# NOTE: We can't use tab characters here, as mosquitto doesn't like it.

pid_file /run/mosquitto/mosquitto.pid

# Persistence configuration
persistence true
persistence_location /var/lib/mosquitto/


# Not a file today, thanks
# Log files will actually end up at /var/log/mosquitto/mosquitto.log, but will go via syslog
# See /etc/rsyslog.d/mosquitto.conf
#log_dest file /var/log/mosquitto/mosquitto.log
log_dest syslog


include_dir /etc/mosquitto/conf.d


# Documentation: https://mosquitto.org/man/mosquitto-conf-5.html

# Require a username / password to connect
allow_anonymous false
# ....which are stored in the following file
password_file /etc/mosquitto/mosquitto_users

# Make a log entry when a client connects & disconnects, to aid debugging
connection_messages true

# TLS configuration
# Disabled at the moment, since we don't yet have a letsencrypt cert
# NOTE: I don't think that the sensors currently connect over TLS. We should probably fix this.
# TODO: Point these at letsencrypt
#cafile /etc/mosquitto/certs/ca.crt
#certfile /etc/mosquitto/certs/hostname.localdomain.crt
#keyfile /etc/mosquitto/certs/hostname.localdomain.key

As you can tell, I've still got some work to do here - namely the TLS setup. It's a bit of a chicken-and-egg problem, because I need the domain name to be pointing at the MQTT server in order to get a Let's Encrypt TLS certificate, but that'll break all the sensors using the current one..... I'm sure I'll figure it out.

But wait! We forgot the user accounts. Before I started the new service, I added some user accounts for client applications to connect with:

sudo mosquitto_passwd /etc/mosquitto/mosquitto_users username1
sudo mosquitto_passwd /etc/mosquitto/mosquitto_users username2

The mosquitto_passwd program prompts for a password - that way you don't end up with the passwords in your ~/.bash_history file.

With all that taken care of, I started the systemd service:

sudo systemctl daemon-reload
sudo systemctl start mosquitto-broker.service

Of course, I ended up doing a considerable amount of debugging in between all this - I've edited it down to make it more readable and fit better in a blog post :P

Lastly, because I'm paranoid, I double-checked that it was running with htop and netstat:


sudo netstat -peanut | grep -i mosquitto
tcp        0      0 0.0.0.0:1883            0.0.0.0:*               LISTEN      112        2676558    5246/mosquitto      
tcp        0      0 x.y.z.w:1883           x.y.z.w:54657       ESTABLISHED 112        2870033    1234/mosquitto      
tcp        0      0 x.y.z.w:1883           x.y.z.w:39365       ESTABLISHED 112        2987984    1234/mosquitto      
tcp        0      0 x.y.z.w:1883           x.y.z.w:58428       ESTABLISHED 112        2999427    1234/mosquitto      
tcp6       0      0 :::1883                 :::*                    LISTEN      112        2676559    1234/mosquitto      

...no idea why it wants to connect to itself, but hey! Whatever floats its boat.
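
For an end-to-end check, the mosquitto-clients package provides mosquitto_sub and mosquitto_pub, which are handy for making sure one of the user accounts above can actually authenticate and pass messages. The topic and password here are just placeholders:

mosquitto_sub -h localhost -t test/smoke -u username1 -P 'the_password_here' -v &
mosquitto_pub -h localhost -t test/smoke -u username1 -P 'the_password_here' -m "hello from mosquitto_pub"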

Monitoring HTTP server response time with collectd and a bit of bash

In the spirit of the last few posts I've been making here (A and B), I'd like to talk a bit about collectd, which I use to monitor the status of my infrastructure. Currently this consists of the server you've connected to in order to view this webpage, and a Raspberry Pi that acts as a home file server.

I realised recently that monitoring the various services that I run (such as my personal git server for instance) would be a good idea, as I'd rather like to know when they go down or act abnormally.

As a first step towards this, I decided to configure my existing collectd setup to monitor the response time of the HTTP endpoints of these services. Later on, I can then configure some alerts to message me when something goes down.

My first thought was to check the plugin list to see if there was one that would do the trick. As you might have guessed by the title of this post, however, such an easy solution would be too uninteresting and not worthy of writing a blog post.

Since such a plugin doesn't (yet?) exist, I turned to the exec plugin instead.

In short, it lets you write a program that writes to the standard output in the collectd plain text protocol, which collectd then interprets and adds to whichever data storage backend you have configured.

Since shebangs are a thing on Linux, I could technically choose any language I have an interpreter installed for, but to keep things (relatively) simple, I chose Bash, the language your local terminal probably speaks (unless it speaks zsh or fish instead).

My priorities were to write a script that is:

  1. Easy to reconfigure
  2. Ultra lightweight

Bash supports associative arrays, so I can cover point #1 pretty easily like this:

declare -A targets=(
    ["main_website"]="https://starbeamrainbowlabs.com/"
    ["git"]="https://git.starbeamrainbowlabs.com/"
    # .....
)

Excellent! Covering point #2 will be an on-going process that I'll need to keep in mind as I write this script. I found this GitHub repository a while back, which has served as a great reference point in the past. Here's hoping it'll be useful this time too!

It's important to note the structure of the script that we're trying to write. Collectd exec scripts have 2 main environment variables we need to take notice of:

  • COLLECTD_HOSTNAME - The hostname of the local machine
  • COLLECTD_INTERVAL - Interval at which we should collect data. Defined in collectd.conf.

The script should write to the standard output the values we've collected in the collectd plain text format every COLLECTD_INTERVAL. Collectd will automatically ensure that only 1 instance of our script is running at once, and will also automatically restart it if it crashes.

To run a command regularly at a set interval, we probably want a while loop like this:

while :; do
    # Do our stuff here

    sleep "${COLLECTD_INTERVAL}";
done

This is a great start, but it isn't really compliant with objective #2 we defined above. sleep is actually a separate command that spawns a new process. That's an expensive operation, since it has to allocate memory for a new stack and create a new entry in the process table.

We can avoid this by abusing the read command timeout, like this:

# Pure-bash alternative to sleep.
# Source: https://blog.dhampir.no/content/sleeping-without-a-subprocess-in-bash-and-how-to-sleep-forever
snore() {
    local IFS;
    [[ -n "${_snore_fd:-}" ]] || exec {_snore_fd}<> <(:);
    read ${1:+-t "$1"} -u $_snore_fd || :;
}

Thanks to bolt for this.

Next, we need to iterate over the array of targets we defined above. We can do that with a for loop:

while :; do
    for target in "${!targets[@]}"; do
        check_target "${target}" "${targets[${target}]}"
    done

    snore "${COLLECTD_INTERVAL}";
done

Here we call a function check_target that will contain our main measurement logic. We've changed sleep to snore too - our new subprocess-less sleep alternative.

Note that we're calling check_target for each target one at a time. This is important for 2 reasons:

  • We don't want to potentially skew the results by taking multiple measurements at once (e.g. if we want to measure multiple PHP applications that sit in the same process pool, or measure more applications than we have CPUs)
  • It actually spawns a subprocess for each function invocation if we push them into the background with the & operator. As I've explained above, we want to try and avoid this to keep it lightweight.

Next, we need to figure out how to do the measuring. I'm going to do this with curl. First though, we need to setup the function and bring in the arguments:

# $1 - target name
# $2 - url
check_target() {
    local target_name="${1}"
    local url="${2}";

    # ......
}

Excellent. Now, let's use curl to do the measurement itself:

curl -sS --user-agent "${user_agent}" -o /dev/null --max-time 5 -w "%{http_code}\n%{time_total}" "${url}"

This looks complicated (and it probably is to some extent), but let's break it down with the help of explainshell.

  • -sS - Squashes all output except for errors and the bits we want. Great for scripts like ours.
  • --user-agent - Specifies the user agent string to use when making a request. All good internet citizens should specify a descriptive one (more on this later).
  • -o /dev/null - We're not interested in the content we download, so this sends it straight to the bin.
  • --max-time 5 - This sets a timeout of 5 seconds for the whole operation - after which curl will throw an error and return with exit code 28.
  • -w "%{http_code}\n%{time_total}" - This allows us to pull out metadata about the request we're interested in. There's actually a whole range available, but for now I'm interested in how long it took and the response code returned.
  • "${url}" - Specifies the URL to send the request to. curl does actually support making more than 1 request at once, but utilising this functionality is out-of-scope for now (and we'd get skewed results because it re-uses connections - which is normally really helpful & performance boosting).

To parse the output we get from curl, I found the readarray command after going a bit array mad at the beginning of this post. It pulls every line of input into a new slot in an array for us - and since we can control the delimiter between values with curl, it's perfect for parsing the output. Let's hook that up now:

readarray -t result < <(curl -sS --user-agent "${user_agent}" -o /dev/null --max-time 5 -w "%{http_code}\n%{time_total}" "${url}");

The weird command < <(another_command); syntax is process substitution. It's a bit like the another_command | command syntax, but a bit different. We need it here because readarray parses the values into a new array variable in the current context, and if we use the a | b syntax here, we instantly lose access to the variable it creates because a subprocess is spawned (and readarray is a bash builtin) - hence the weird process substitution.

Now that we've got the output from curl parsed and ready to go, we need to handle failures next. This is a little on the nasty side, as by default bash won't give us the non-zero exit code from substituted processes. Hence, we need to tweak our already long arcane incantation a bit more:

readarray -t result < <(curl -sS --user-agent "${user_agent}" -o /dev/null --max-time 5 -w "%{http_code}\n%{time_total}\n" "${url}"; echo "${PIPESTATUS[*]}");

Thanks to this answer on StackOverflow for ${PIPESTATUS}. Now, we have an array called result with 3 elements in it:

  • 0 - The HTTP response code
  • 1 - The time taken in seconds
  • 2 - The exit code of curl

With this information, we can now detect errors and abort continuing if we detect one. We know there was an error if any of the following occur:

  • curl returned a non-zero exit code
  • The HTTP response code isn't 2XX or 3XX

Let's implement that in bash:

if [[ "${result[2]}" -ne 0 ]] || [[ "${result[0]}" -lt "200" ]] || [[ "${result[0]}" -gt "399" ]]; then
    return
fi

Again, let's break it down:

  • [[ "${result[2]}" -ne 0 ]] - Detect a non-zero exit code from curl
  • [[ "${result[0]}" -lt "200" ]] - Detect if the HTTP response code is less than 200
  • [[ "${result[0]}" -gt "399" ]] - Detect if the HTTP response code is greater than 399

In the future, we probably want to output a notification here of some sort instead of just simply silently returning, but for now it's fine.

Finally, we can now output the result in the right format for collectd to consume. Collectd operates on identifiers, values, and intervals. A bit of head-scratching and documentation reading later, and I determined the correct identifier format for the task. I wanted to have all the readings on the same graph so I could compare the different response times (just like the ping plugin does), so we want something like this:

bobsrockets.com/http_services/response_time-TARGET_NAME

....where we replace bobsrockets.com with ${COLLECTD_HOSTNAME}, and TARGET_NAME with the name of the target we're measuring (${target_name} from above).

We can do this like so:

echo "PUTVAL \"${COLLECTD_HOSTNAME}/http_services/response_time-${target_name}\" interval=${COLLECTD_I
NTERVAL} N:${result[1]}";

Here's an example of it in action:

PUTVAL "/http_services/response_time-git" interval=300.000 N:0.118283
PUTVAL "/http_services/response_time-main_website" interval=300.000 N:0.112073

It does seem to run through the items in the array in a rather strange order, but so long as it does iterate the whole lot, I don't really care.

I'll include the full script at the bottom of this post, so all that's left to do is to point collectd at our new script like this in /etc/collectd.conf:

LoadPlugin  exec

# .....

<Plugin exec>
    Exec    "nobody:nogroup"        "/etc/collectd/http_response_times.sh"  "measure"
</Plugin>

I've added measure as an argument there for future-proofing, as it looks like we may have to run a separate instance of the script for sending notifications if I understand the documentation correctly (I need to do some research.....).
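
The script below doesn't actually inspect that argument yet. If and when notification support materialises, I'd imagine a small dispatcher near the top of the script along these lines (a sketch only - not part of the current script):

mode="${1:-measure}"; # default to measuring if no argument is passed
case "${mode}" in
    measure )
        # fall through to the measurement loop below
        ;;
    notify )
        echo "notification mode isn't implemented yet" >&2;
        exit 1;
        ;;
    * )
        echo "Error: unknown mode '${mode}'" >&2;
        exit 1;
        ;;
esac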

Very cool. It's taken a few clever tricks, but we've managed to write an efficient script for measuring http response times. We've made it more efficient by exploiting read timeouts and other such things. While we won't gain a huge amount of speed from this (bash is pretty lightweight already - this script is weighing in at just ~3.64MiB of private RAM O.o), it will all add up over time - especially considering how often this will be running.

In the future, I'll definitely want to take a look at implementing some alerts to notify me if a service is down - but that will be a separate post, as this one is getting quite long :P

Found this interesting? Got another way of doing this? Curious about something? Comment below!


Full Script

#!/usr/bin/env bash
set -o pipefail;

# Variables:
#   COLLECTD_INTERVAL   Interval at which to collect data
#   COLLECTD_HOSTNAME   The hostname of the local machine

declare -A targets=(
    ["main_website"]="https://starbeamrainbowlabs.com/"
    ["webmail"]="https://mail.starbeamrainbowlabs.com/"
    ["git"]="https://git.starbeamrainbowlabs.com/"
    ["nextcloud"]="https://nextcloud.starbeamrainbowlabs.com/"
)
# These are only done once, so external commands are ok
version="0.1+$(date +%Y%m%d -r $(readlink -f "${0}"))";

user_agent="HttpResponseTimeMeasurer/${version} (Collectd Exec Plugin; $(uname -sm)) bash/${BASH_VERSION} curl/$(curl --version | head -n1 | cut -f2 -d' ')";

# echo "${user_agent}"

###############################################################################

# Pure-bash alternative to sleep.
# Source: https://blog.dhampir.no/content/sleeping-without-a-subprocess-in-bash-and-how-to-sleep-forever
snore() {
    local IFS;
    [[ -n "${_snore_fd:-}" ]] || exec {_snore_fd}<> <(:);
    read ${1:+-t "$1"} -u $_snore_fd || :;
}

# Source: https://github.com/dylanaraps/pure-bash-bible#split-a-string-on-a-delimiter
split() {
    # Usage: split "string" "delimiter"
    IFS=$'\n' read -d "" -ra arr <<< "${1//$2/$'\n'}"
    printf '%s\n' "${arr[@]}"
}

# Source: https://github.com/dylanaraps/pure-bash-bible#get-the-number-of-lines-in-a-file
# Altered to operate on the standard input.
count_lines() {
    # Usage: lines <"file"
    mapfile -tn 0 lines
    printf '%s\n' "${#lines[@]}"
}

###############################################################################

# $1 - target name
# $2 - url
check_target() {
    local target_name="${1}"
    local url="${2}";

    readarray -t result < <(curl -sS --user-agent "${user_agent}" -o /dev/null --max-time 5 -w "%{http_code}\n%{time_total}\n" "${url}"; echo "${PIPESTATUS[*]}");

    # 0 - http response code
    # 1 - time taken
    # 2 - curl exit code

    # Make sure the exit code is non-zero - this includes if curl hits a timeout error
    # Also ensure that the HTTP response code is valid - any 2xx or 3xx response code is ok
    if [[ "${result[2]}" -ne 0 ]] || [[ "${result[0]}" -lt "200" ]] || [[ "${result[0]}" -gt "399" ]]; then
        return
    fi

    echo "PUTVAL \"${COLLECTD_HOSTNAME}/http_services/response_time-${target_name}\" interval=${COLLECTD_INTERVAL} N:${result[1]}";
}

while :; do
    for target in "${!targets[@]}"; do
        # NOTE: We don't use concurrency here because that spawns additional subprocesses, which we want to try & avoid. Even though it looks slower, it's actually more efficient (and we don't potentially skew the results by measuring multiple things at once)
        check_target "${target}" "${targets[${target}]}"
    done

    snore "${COLLECTD_INTERVAL}";
done

Own your Code, Part 1: Git Hosting - How did we get here?

Somewhat recently, I posted about how I fixed a nasty problem with an lftp upload. I mentioned that I'd been setting up continuous deployment for an application that I've been writing.

There's actually quite a bit of a story behind how I got to that point, so I thought I'd post about it here. Starting with code hosting, I'm going to show how I setup my own private git server, followed by Laminar (which, I might add, is not for everyone. It's actually quite involved), and finally I'll take a look at continuous deployment.

The intention is to do so in a manner that enables you to do something similar for yourself too (If you have any questions along the way, comment below!).

Of course, this is far too much to stuff into a single blog post - so I'll be splitting it up into a little bit of a mini-series.

Personally, I use git for practically all the code I write, so it makes sense for me to use services such as GitLab and GitHub for hosting these in a public place so that others can find them.

This is all very well, but I do find that I've acquired a number of private projects (say, for University work) that I can't / don't want to open-source. In addition, I'd feel a lot better if I had a backup mirror of the important code repositories I host on 3rd party sites - just in case.

This is where hosting one's own git server comes into play. I've actually blogged about this before, but since then I've moved from Go Git Service to Gitea, a fork of Gogs, through a (rather painful; also this) migration.

This post will be more of a commentary on how I went about it, whilst giving some direction on how to do it for yourself. Every server is very different, which makes giving concrete instructions challenging. In addition, I ended up with a seriously non-standard install procedure - which I can't recommend! I need to get around to straightening a few things out at some point.....

So without further hesitation, let's setup Gitea as our Git server! To do so, we'll need an Nginx web server setup already. If you haven't, try following this guide and then come back here.

DNS

Next, you'll need to point a new subdomain at your server that's going to be hosting your Git server. If you've already got a domain name pointed at it (e.g. with A / AAAA records), I can recommend using a CNAME record that points at this pre-existing domain name.

For example, if I have a pair of records for control.bobsrockets.com:

A       control.bobsrockets.com.    1.2.3.4
AAAA    control.bobsrockets.com.    2001::1234:5678

...I could create a CNAME record (a bit like a symlink) like this:

CNAME   git.bobsrockets.com         control.bobsrockets.com.

(Note: For the curious, this isn't actually official DNS record syntax. It's just pseudo-code I invented on-the-fly)
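
Whatever syntax your DNS provider actually uses, once the record is in place you can check that it resolves correctly with dig (swapping in your own subdomain, of course):

dig +short CNAME git.bobsrockets.com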

Installation

With that in place, the next order of business is actually installing Gitea. This is relatively simple, but a bit of a pain - because native packages (e.g. sudo apt install ....) aren't a thing yet.

Instead, you download a release binary from the releases page. Once done, we can do some setup to get all our ducks in a row. When setting it up myself, I ended up with a rather weird configuration - as I actually started with a Go Git Service instance before Gitea was a thing (and ended up going through a rather painful migration) - so you should follow their guide and have a 'normal' installation :P

Once done, you should have Gitea installed and the right directory structure setup.

A note here is that if you're like me and you have SSH running on a non-standard port, you've got 2 choices. Firstly, you can alter the SSH_PORT directive in the configuration file (which should be called app.ini) to match that of your SSH server.

If you decide that you want it to run its own inbuilt SSH server on port 22 (or any port below 1024), what the guide doesn't tell you is that you need to explicitly give the gitea binary permission to listen on a privileged port. This is done like so:

setcap 'cap_net_bind_service=+ep' gitea

Note that every time you update Gitea, you'll have to re-run that command - so it's probably a good idea to store it in a shell script that you can re-execute at will.
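
Something as short as this would do the job - the path here is an assumption based on the service file later in this post, so adjust it to wherever your gitea binary actually lives, and run it as root after each update:

#!/usr/bin/env bash
# Re-apply the capability that lets gitea bind to privileged ports
setcap 'cap_net_bind_service=+ep' /srv/git/gitea/gitea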

At this point it might also be worth looking through the config file (app.ini I mentioned earlier). There's a great cheat sheet that details the settings that can be customised - some may be essential to configuring Gitea correctly for your environment and use-case.

Updates

Updates to Gitea are, of course, important. GitHub provides an Atom Feed that you can use to keep up-to-date with the latest releases.

Later on in this series, we'll take a look at how we can automate the process by taking advantage of cron, Laminar CI, and fpm - amongst other tools. I haven't actually done this yet as of the time of typing and we've got a looong way to go until we get to that point - so it's a fair ways off.

Service please!

We've got Gitea installed and we've considered updates, so the natural next step is to configure it as a system service.

I've actually blogged about this process before, so if you're interested in the details, I recommend going and reading that article.

This is the service file I use:

[Unit]
Description=Gitea
After=syslog.target
After=rsyslog.service
After=network.target
#After=mysqld.service
#After=postgresql.service
#After=memcached.service
#After=redis.service

[Service]
# Modify these two values and uncomment them if you have
# repos with lots of files and get an HTTP error 500 because
# of that
###
#LimitMEMLOCK=infinity
#LimitNOFILE=65535
Type=simple
User=git
Group=git
WorkingDirectory=/srv/git/gitea
ExecStart=/srv/git/gitea/gitea web
Restart=always
Environment=USER=git HOME=/srv/git

[Install]
WantedBy=multi-user.target

I believe I took it from here when I migrated from Gogs to Gitea. Save this as /etc/systemd/system/gitea.service, and then do this:

sudo systemctl daemon-reload
sudo systemctl start gitea.service

This should start Gitea as a system service.
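
To make sure it also comes back up after a reboot - and to check that it's actually happy right now - it's worth following up with:

sudo systemctl enable gitea.service
sudo systemctl status gitea.service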

Wiring it up

The next step now that we've got Gitea running is to reverse-proxy it with Nginx that we set up earlier.

Create a new file at /etc/nginx/conf.d/2-git.conf, and paste in something like this (not forgetting to customise it to your own use-case):

server {
    listen  80;
    listen  [::]:80;

    server_name git.starbeamrainbowlabs.com;
    return 301 https://$host$request_uri;
}

upstream gitea {
    server  [::1]:3000;
    keepalive 4; # Keep 4 connections open as a cache
}   

server {
    listen  443 ssl http2;
    listen  [::]:443 ssl http2;

    server_name git.starbeamrainbowlabs.com;
    ssl_certificate     /etc/letsencrypt/live/git.starbeamrainbowlabs.com-0001/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/git.starbeamrainbowlabs.com-0001/privkey.pem;

    add_header strict-transport-security    "max-age=31536000;";
    add_header access-control-allow-origin  https://nextcloud.starbeamrainbowlabs.com   always;
    add_header content-security-policy      "frame-ancestors http://*.starbeamrainbowlabs.com";

    #index  index.html index.php;
    #root   /srv/www;

    location / {
        proxy_pass          http://gitea;

        #proxy_set_header   x-proxy-server      nginx;
        #proxy_set_header   host                $host;
        #proxy_set_header   x-originating-ip    $remote_addr;
        #proxy_set_header   x-forwarded-for     $remote_addr;

        proxy_hide_header   X-Frame-Options;
    }

    location ~ /.well-known {
        root    /srv/letsencrypt;
    }

    #include /etc/nginx/snippets/letsencrypt.conf;

    #location = / {
    #   proxy_pass          http://127.0.0.1:3000;
    #   proxy_set_header    x-proxy-server      nginx;
    #   proxy_set_header    host                $host;
    #   proxy_set_header    x-originating-ip    $remote_addr;
    #   proxy_set_header    x-forwarded-for     $remote_addr;
    #}

    #location = /favicon.ico {
    #   alias /srv/www/favicon.ico;
    #}
}

You may have to comment out the listen 443 blocks and put in a listen 80 temporarily whilst configuring letsencrypt.

Then, reload Nginx: sudo systemctl reload nginx

Conclusion

Phew! We've looked at installing and setting up Gitea behind Nginx, and using a systemd service to automate the management of Gitea.

I've also talked a bit about how I set my own Gitea instance up and why.

In future posts, I'm going to talk about Continuous Integration, and how I setup Laminar CI. I'll also talk about alternatives for those who want something that comes with a few more batteries included.... :P

Found this interesting? Got stuck and need help? Spotted a mistake? Comment below!

Note to self: Don't reboot the server at midnight....

You may (or may not) have noticed a small window of ~3/4 hour the other day when my website was offline. I thought I'd post about the problem, the solution, and what I'll try to avoid next time.

The problem occurred when I was about to head to bed late at night. I decided to quickly reboot the server to boot into a new kernel and activate some security updates.

I have this habit of leaving a ping -O hostname running in a separate terminal to monitor the progress of the reboot. I'm glad I did so this time, as I noticed that it took a while to go down for rebooting. Then it took an unusually long time to come up again, and when it did, I couldn't SSH in again!

After a quick check, the website was down too - so it was time to do something about it and fast. Thankfully, I already knew what was wrong - it was just a case of fixing it.....

In a Linux system, there's a file called /etc/fstab that defines all the file systems that are to be mounted. While this sounds a bit counter-intuitive (since how does it know to mount the filesystem that the file itself describes how to mount?), it's built into the initial ramdisk (also this) if I understand it correctly.

There are many different types of file system in Linux. Common ones include ext4 (the most common modern Linux filesystem), nfs (Network FileSystem), sshfs (for mounting remote filesystems over SSH), davfs (WebDav shares), and more.

Problems start to arise when some of the filesystems defined in /etc/fstab don't mount correctly. WebDav filesystems are notorious for this, I've found - so they generally need to have the noauto flag attached, like this:

https://dav.bobsrockets.com/path/to/directory   /path/to/mount/point    davfs   noauto,user,rw,uid=1000,gid=1000    0   0

Unfortunately, I forgot to do this with the webdav filesystem I added a few weeks ago, causing the whole problem in the first place.

The unfortunate issue was that since it couldn't mount the filesystems, it couldn't start the SSH server. If it couldn't start the SSH server, I couldn't get in to fix it!

Kimsufi rescue mode to the, erm, rescue! It turned out that my provider, Kimsufi, have a rescue mode system built-in for just this sort of occasion. At the click of a few buttons, I could reboot my server into a temporary rescue environment with a random SSH password.

Therein I could mount the OS file system, edit /etc/fstab, and reboot into normal mode. Sorted!

Just a note for future reference: I recommend using the rescuepro rescue mode OS, and not either of the FreeBSD options. I had issues trying to mount the OS disk with them - I kept getting an Invalid argument error. I was probably doing something wrong, but at the time I didn't really want to waste tons of time trying to figure that out in an unfamiliar OS.

Hopefully there isn't a next time. I'm certainly going to avoid auto webdav mounts, instead spawning a subprocess to mount them in the background after booting is complete.
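
I haven't settled on the exact mechanism yet, but since the share is already defined in /etc/fstab with noauto, even something as crude as an entry in root's crontab that fires shortly after boot would probably do - a sketch only (the sleep just gives the network time to come up):

# In root's crontab (sudo crontab -e) - mount point taken from the fstab example above
@reboot sleep 120 && mount /path/to/mount/point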

I'm also going to avoid rebooting my server when I don't have time to deal with any potential fallout....

Backing up to AWS S3 with duplicity

The server that this website runs on backs up automatically to the Simple Storage Service, provided by Amazon Web Services. Such an arrangement is actually fairly cheap - only ~20p/month! I realised recently that although I've blogged about duplicity before (where I discussed using an external hard drive), I never covered how I fully automate the process here on starbeamrainbowlabs.com.

(Above: A bunch of hard drives. The original can be found here.)

It's fairly similar in structure to the way it works backing up to an external hard drive - just with a few different components here and there, as the script that drives this is actually older than the one that backs up to an external hard drive.

To start, we'll need an AWS S3 bucket. I'm not going to cover how to do this here, as the AWS interface keeps changing, and this guide will likely become outdated quickly. Instead, the AWS S3 documentation has an official guide on how to create one. Make sure it's private, as you don't want anyone getting a hold of your backups!

With that done, you should have both an access key and a secret. Note these down in a file called .backup-password in a new directory that will hold the backup script like this:

#!/usr/bin/env bash
PASSPHRASE="INSERT_RANDOM_PASSWORD_HERE";
AWS_ACCESS_KEY_ID="INSERT_AWS_ACCESS_KEY_HERE";
AWS_SECRET_ACCESS_KEY="INSERT_AWS_SECRET_KEY_HERE";

The PASSPHRASE here should be a long and unintelligible string of random characters, and will encrypt your backups. Note that down somewhere safe too - preferably in your password manager or somewhere else at least as secure.

If you're on Linux, you should also set the permissions on the .backup-password file to ensure nobody gets access to it who shouldn't. Here's how I did it:

sudo chown root:root .backup-password
sudo chmod 0400 .backup-password

This ensures that only the root user is able to read the file - and nobody can write to it. With our secrets generated and safely stored, we can start writing the backup script itself. Let's start by reading in the secrets:

#!/usr/bin/env bash
source /root/.backup-password

I stored my .backup-password file in /root. Next, let's export these values. This enables the subprocesses we invoke to access these environment variables:

export PASSPHRASE;
export AWS_ACCESS_KEY_ID;
export AWS_SECRET_ACCESS_KEY;

Now it's time to do the backup itself! Here's what I do:

duplicity \
    --full-if-older-than 2M \
    --exclude /proc \
    --exclude /sys \
    --exclude /tmp \
    --exclude /dev \
    --exclude /mnt \
    --exclude /var/cache \
    --exclude /var/tmp \
    --exclude /var/backups \
    --exclude /srv/www-mail/rainloop/v \
    --s3-use-new-style --s3-european-buckets --s3-use-ia \
    / s3://s3-eu-west-1.amazonaws.com/INSERT_BUCKET_NAME_HERE

Compressed version:

duplicity --full-if-older-than 2M --exclude /proc --exclude /sys --exclude /tmp --exclude /dev --exclude /mnt --exclude /var/cache --exclude /var/tmp --exclude /var/backups --exclude /srv/www-mail/rainloop/v --s3-use-new-style --s3-european-buckets --s3-use-ia / s3://s3-eu-west-1.amazonaws.com/INSERT_BUCKET_NAME_HERE

This might look long and complicated, but it's mainly due to the large number of directories that I'm excluding from the backup. The key options here are --full-if-older-than 2M and --s3-use-ia, which specify I want a full backup to be done every 2 months and to use the infrequent access pricing tier to reduce costs.

The other important bit here is to replace INSERT_BUCKET_NAME_HERE with the name of the S3 bucket that you created.

Backing up is all very well, but we want to remove old backups too - in order to avoid ridiculous bills (AWS are terrible for this - there's no way that you can set a hard spending limit! O.o). That's fairly easy to do:

duplicity remove-older-than 4M \
    --force \
    --s3-use-new-style --s3-european-buckets --s3-use-ia \
    s3://s3-eu-west-1.amazonaws.com/INSERT_BUCKET_NAME_HERE

Again, don't forget to replace INSERT_BUCKET_NAME_HERE with the name of your S3 bucket. Here, I specify I want all backups older than 4 months (the 4M bit) to be deleted.

It's worth noting here that it may not actually be able to remove backups older than 4 months here, as it can only delete a full backup if there are no incremental backups that depend on it. To this end, you'll need to plan for potentially storing (and being charged for) an extra backup cycle's worth of data. In my case, that's an extra 2 months worth of data.

That's the backup part of the script complete. If you want, you could finish up here and have a fully-working backup script. Personally, I want to know how much data is in my S3 bucket - so that I can get an idea as to how much I'll be charged when the bill comes through - and also so that I can see if anything's going wrong.

Unfortunately, this is a bit fiddly. Basically, we have to utilise the AWS command-line interface to recursively list the entire contents of our S3 bucket in summarising mode in order to get it to tell us what we want to know. Here's how to do that:

aws s3 ls s3://INSERT_BUCKET_NAME_HERE --recursive --human-readable --summarize

Don't forget to replace INSERT_BUCKET_NAME_HERE with your bucket's name. The output from this is somewhat verbose, so I ended up writing an awk script to process it and output something nicer. Said awk script looks like this:

/^\s*Total\s+Objects/ { parts[i++] = $3 }
/^\s*Total\s+Size/ { parts[i++] = $3; parts[i++] = $4; }
END {
    print(
        "AWS S3 Bucket Status:",
        parts[0], "objects, totalling "
        parts[1], parts[2]
    );
}

If we put all that together, it should look something like this:

aws s3 ls s3://INSERT_BUCKET_NAME_HERE --recursive --human-readable --summarize | awk '/^\s*Total\s+Objects/ { parts[i++] = $3 } /^\s*Total\s+Size/ { parts[i++] = $3; parts[i++] = $4; } END { print("AWS S3 Bucket Status:", parts[0], "objects, totalling " parts[1], parts[2]); }'

...it's a bit of a mess. Perhaps I should look at putting that awk script in a separate file :P Anyway, here's some example output:

AWS S3 Bucket Status: 602 objects, totalling 21.0 GiB

Very nice indeed. To finish off, I'd rather like to know how long it took to do all this. Thankfully, bash has an inbuilt automatic variable that holds the number of seconds since the current process has started, so it's just a case of parsing this out into something readable:

echo "Done in $(($SECONDS / 3600))h $((($SECONDS / 60) % 60))m $(($SECONDS % 60))s.";

...I forget which Stackoverflow answer it was that showed this off, but if you know - please comment below and I'll update this to add credit. This should output something like this:

Done in 0h 12m 51s.

Awesome! We've now got a script that backs up to AWS S3, deletes old backups, and tells us both how much space on S3 is being used and how long the whole process took.

I'm including the entire script at the bottom of this post. I've changed it slightly to add a single variable for the bucket name - so there's only 1 place (line 9) you need to update there.

(Above: A Geopattern, tiled using the GNU Image Manipulation Program)


#!/usr/bin/env bash

# Make sure duplicity exists
test -x $(which duplicity) || exit 1;

# Pull in the password
. /root/.backup-password

AWS_S3_BUCKET_NAME="INSERT_BUCKET_NAME_HERE";

# Allow duplicity to access it
export PASSPHRASE;
export AWS_ACCESS_KEY_ID;
export AWS_SECRET_ACCESS_KEY;

# Actually do the backup
# Backup strategy:
# 1 x backup per week:
#   1 x full backup per 2 months
#   incremental backups in between
# S3 Bucket URI: https://${AWS_S3_BUCKET_NAME}/
echo [ $(date +%F%r) ] Performing backup.
duplicity --full-if-older-than 2M --exclude /proc --exclude /sys --exclude /tmp --exclude /dev --exclude /mnt --exclude /var/cache --exclude /var/tmp --exclude /var/backups --exclude /srv/www-mail/rainloop/v --s3-use-new-style --s3-european-buckets --s3-use-ia / s3://s3-eu-west-1.amazonaws.com/${AWS_S3_BUCKET_NAME}

# Remove old backups
# You have to plan for 1 extra full backup cycle when
# calculating space requirements - duplicity only
# removes a backup if it won't invalidate those further
# along the chain - the oldest backup will always be
# a full one.
echo [ $(date +%F%r) ] Backup complete. Removing old volumes.
duplicity remove-older-than 4M --force --encrypt-key F2A6D8B6 --s3-use-new-style --s3-european-buckets --s3-use-ia s3://s3-eu-west-1.amazonaws.com/${AWS_S3_BUCKET_NAME}
echo [ $(date +%F%r) ] Cleanup complete.

aws s3 ls s3://${AWS_S3_BUCKET_NAME} --recursive --human-readable --summarize | awk '/^\s*Total\s+Objects/ { parts[i++] = $3 } /^\s*Total\s+Size/ { parts[i++] = $3; parts[i++] = $4; } END { print("AWS S3 Bucket Status:", parts[0], "objects, totalling " parts[1], parts[2]); }'

echo "Done in $(($SECONDS / 3600))h $((($SECONDS / 60) % 60))m $(($SECONDS % 60))s.";

Creating a system service with systemd

While I've got some grumblings with systemd over how it handles (or not) certain things, it's the most popular service manager on Linux systems today. By this, I mean it starts and stops the various services (like your SSH server, web server, cron) automatically, according to the rules laid out in special service files.

Since it's so popular and I keep having to write services (and look up how to do so every time), I thought I'd write a post here about it to save me the trouble :P

Bear in mind that systemd isn't the only service manager out there. Others include OpenRC, runit, upstart, and more! If you're using one of these (I'm looking to investigate using one of these on my next server rebuild), then this tutorial isn't for you. I will probably be releasing a tutorial down the road for OpenRC though, if I get around to having a server running an OS that uses it.

systemd stores its service files in /etc/systemd/system/, so to start we need to create a new file in there:

sudo sensible-editor /etc/systemd/system/service_name.service

With the new file open in your favourite editor, it's time to set out our service definition. This is done with an ini-like syntax. Here's an example:

[Unit]
Description=Gitea
After=syslog.target rsyslog.service network.target

[Service]
Type=simple
User=git
Group=git
WorkingDirectory=/srv/git/gitea
ExecStart=/srv/git/gitea/gitea web
Restart=always
Environment=USER=git HOME=/srv/git

[Install]
WantedBy=multi-user.target

The above is a simple service file for Gitea, which is the engine behind my personal git server. Let's go through each section one by one.

Firstly, the [Unit] section defines the metadata about the service. It's fairly self explanatory actually - we set the description of the service here, and also the other services (space separated) that we want our service to be started after with the After property.

Next comes the [Service] section. This section specifies how it should start the service. We tell it that it's a simple service (in other words it doesn't do anything fancy - other types are available, but we won't use them here), the user and group it should run under, the working directory of the process, and the command (and its arguments) to execute in order to start the process.

In addition, we also tell it to restart the service if it crashes, and set a few environment variables to refine the way Gitea behaves. Very cool!

The final section, [Install], simply specifies the systemd-equivalent of which run-level this service should start on. It's very interesting from a how-does-my-system-work perspective - I recommend reading this Stack Exchange answer and this article for more information - it's a topic for another post here on this blog :-)

To start this new service, do the following:

sudo systemctl daemon-reload
sudo systemctl enable service_name.service
sudo systemctl start service_name.service

This starts our new service and configures it to automatically start when the system first boots.
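
To check that everything went to plan, systemctl status shows whether the service is running, and journalctl pulls up anything it has written to the log:

sudo systemctl status service_name.service
sudo journalctl -u service_name.service --since today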

With that taken care of, we've now got the basics down of our very own service file! We can take this further though. What if there's a secret key that we need to pass to a service on startup in an environment variable, but we don't want to specify it in the service because it's world-readable?

The answer here is a clever bit of shell scripting. Consider the following service file:

[Unit]
Description=Awesome XMPP Bot
After=network.target prosody.service

[Service]
Type=simple
User=bot
WorkingDirectory=/srv/bot
ExecStart=/srv/bot/start_service.sh
Restart=on-failure
# Other Restart options: or always, on-abort, etc

[Install]
WantedBy=multi-user.target

In this case, we've defined a service file for an XMPP bot (public server directory). In order for it to connect to an XMPP server, it needs a JID (a username - formatted like an email address) and password. However, we don't want to specify these directly in the service file because they are secret!

Instead, we've specified that it should start a shell script that's located at /srv/bot/start_service.sh instead of the bot itself. Here's the contents of that shell script:

#!/usr/bin/env bash

source .xmpp_credentials

export XMPP_JID;
export XMPP_PASSWORD;

exec /usr/bin/mono Bot.exe

This simple shell script loads the contents of the file .xmpp_credentials, specifies that the XMPP_JID and XMPP_PASSWORD environment variables should be passed to any further (sub) processes, and asks for the current process to be terminated and replaced with an instance of Mono executing our bot's code stored in Bot.exe (this way we don't have an extra bash process sitting around doing nothing, since its job is done as soon as we start the bot itself).

In this way, we can store our precious private details in a file that we can lock down so that only the bot's user account can read it. Here's what that .xmpp_credentials file might look like:

#!/usr/bin/env bash
XMPP_JID="bot@bobsrockets.com";
XMPP_PASSWORD="sekret";

....and if I run ls -l .xmpp_credentials, I might see something like this:

-r-x------ 1 bot bot 104 Nov 10 21:27 .xmpp_credentials

Here the file permissions allow only the bot user to read and execute the file, but not modify it (sudo chown bot:bot .xmpp_credentials and sudo chmod 0500 .xmpp_credentials set these permissions for the curious).

This completes the tutorial on setting up services with systemd. We've seen how to create service files and make them start on boot (much easier than alternatives like running a command manually or using screen!). We've also learnt a simple way to hide credentials (though more advanced alternatives do exist).

Found this useful? Found a better way to do it? Comment below!

Sources and Further Reading

Proxies: What's the difference?

You've probably heard of proxies. Perhaps you used one when you were at school to access a website you weren't supposed to. But did you know that there are multiple different types of proxies that are used for different things? For example, that a reverse proxy can perform load-balancing and caching for your web application? And that a transparent proxy can be used to filter the traffic of your internet connection without you knowing (well, almost)? In this post, I'll be explaining the difference between the different types of proxy I'm aware of, why you'd want one, and how to detect their presence.

Reverse Proxies

A reverse proxy is one that, when it receives a request, repeats it to an upstream server. For example, I use nginx to reverse-proxy PHP requests to a backend PHP-FPM instance.

A diagram showing how a reverse proxy works. Basically: Client -> nginx (the reverse proxy) -> PHP-FPM (the server behind the reverse proxy).

Reverse proxies also come in really handy if you want to run multiple, perhaps unrelated, servers on a single machine with a single IP address, as they can reverse proxy requests to the right place based on the requested subdomain. For example, on my server I not only serve my website (which in and of itself reverse-proxies PHP requests), but also my git server - which is a separate process listening on a different port behind my firewall.

Caching is another key feature of reverse proxies that comes in dead useful if you're running a medium-high traffic website. Instead of forwarding every single request to your backend for processing, if you've got a blog, for instance, you could cache the responses to requests for the posts themselves and serve them directly from the reverse proxy, leaving the slower backend free to process comments that people make, for example. Both nginx and Varnish have support for this. With this method, it's possible to serve 1000s of requests a minute from a very modestly sized virtual machine (say, 512MB RAM, 1 CPU) if configured correctly. Take that, Apache!

Finally, when 1 server isn't enough any more, you can get reverse proxies like nginx to act as a load balancer. In this scenario, there are multiple backend servers (probably running on different machines, with a fast internal LAN connecting them all), and a single front-facing load balancer sitting in front of them all distributing requests to the backend servers. nginx in particular can get very fancy with the logic here, should you need that kind of control. It can even monitor the health of the backend application servers, and avoid sending any requests to unresponsive servers - giving them time to recover from a crash.

A diagram visualing the load-balancer explained above. A single nginx instance faces the internet, with multiple app servers behind it that it proxies requests to.

Forward Proxies

Forward proxies are distinctly different to reverse proxies, in that they make requests to the destination the client wants to connect to, on the client's behalf. Such a proxy can be instituted for many reasons. Sometimes, it's for security reasons - for example to ensure that all those connecting to a backend local network are authenticated (authentication with a forward proxy is done via a set of special Proxy- HTTP headers). Other times, it's to preserve data on limited and/or expensive internet connections.

More often though, it's to censor and surveil the internet connection of the users on a network - and also to bypass such censoring. It is in this manner that HTTP(S) proxying has become so pervasive - in that companies, institutions, and (in rare cases) Internet Service Providers install forward proxies to censor the connections of their users - as such proxies usually only understand HTTP and HTTPS (clients request that a forward proxy retrieve something for them via a GET https://bobsrockets.net/ HTTP/1.1 request for example). If you're curious though, some forward proxies these days support the CONNECT HTTP method, allowing one to set up a TLS connection with another server (whether that be an HTTPS, SSH, SMTPS, or other protocol server). In addition, the SOCKS protocol now allows for arbitrary TCP connections to be proxied through as well.

Forward proxies nearly always require some client-side configuration. If you've wondered what the proxy settings are in your operating system and web browser's settings - this is what they're for.

Such proxies can usually be identified by the Via and other headers that they attach to outgoing requests, as per RFC 2616. Online tools exist that exploit this - allowing you to detect whether such a proxy exists.

Transparent Proxies

Transparent proxies are similar to forward proxies, but do not require any client-side configuration. Instead, they utilise clever networking tricks to intercept network traffic being sent to and from the clients on a network. In this manner, they can cache responses, filter content, and protect the users from attacks without the client necessarily being aware of their existence.

It is important to note here though that utilising a proxy is by no means a substitute for maintaining proper defences on your own computer, such as installing and configuring a firewall, ensuring your system has all the latest updates, and, if you're running Windows, ensuring you have an antivirus program installed and running (Windows 10 comes with one built in these days).

Thankfully, even though transparent proxies don't usually attach the Via header (as they are supposed to), they can usually be detected by cleverly designed tests that exploit their tendency to cache responses.

Conclusion

So there you have it. We've taken a look at Forward proxies, and the benefits (and drawbacks) they can provide to users. We've also investigated Transparent proxies, and how to detect them. Finally, we've looked at Reverse proxies and the advantages they can provide to help you scale and structure your next great web application (and not just web - nginx supports all sorts of other protocols besides HTTP(S)).

Maintenance: Server Push Support!

Recently, I took the time to add the official nginx ppa to my server to keep nginx up-to-date. In doing so, I jumped from a security-patch-backported nginx 1.10 to version 1.14..... which adds a bunch of very cool new features. As soon as I learnt that HTTP/2 Server Push was among the new features to be supported, I knew that I had to try it out.

In short, Server Push is a new technology - part of HTTP/2.0 (it's here at last :D) - that allows you to send resources to the client before they even know they need them. This is done by enabling it in the web server, and then having the web application append a specially-formatted link header to its responses - which tells the web server which resources to bundle along with the response.

First, let's enable it in nginx. This is really quite simple:

http {
    # ....

    http2_push_preload      on;

    # ....
}

This enables link header parsing server-wide. If you want to enable it for just a single virtual host, the http2_push_preload directive can be placed inside server blocks too.
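For example, to enable it for a single site only - a sketch in which the domain and certificate paths are placeholders:

server {
    listen 443 ssl http2;
    server_name bobsrockets.com;

    ssl_certificate     /etc/letsencrypt/live/bobsrockets.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/bobsrockets.com/privkey.pem;

    # Only this virtual host will parse link headers for Server Push
    http2_push_preload on;

    root /srv/www/bobsrockets.com;
}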

With support enabled in nginx, we can add support to our web application (in my case, this website!). If you do a HEAD request against a page on my website, you'll get a response looking like this:

HTTP/2 200 
server: nginx/1.14.0
date: Tue, 21 Aug 2018 12:35:02 GMT
content-type: text/html; charset=UTF-8
vary: Accept-Encoding
x-powered-by: PHP/7.2.9-1+ubuntu16.04.1+deb.sury.org+1
link: </theme/core.min.css>; rel=preload; as=style, </theme/main.min.css>; rel=preload; as=style, </theme/comments.min.css>; rel=preload; as=style, </theme/bit.min.css>; rel=preload; as=style, </libraries/prism.min.css>; rel=preload; as=style, </theme/tagcloud.min.css>; rel=preload; as=style, </theme/openiconic/open-iconic.min.css>; rel=preload; as=style, </javascript/bit.min.js>; rel=preload; as=script, </javascript/accessibility.min.js>; rel=preload; as=script, </javascript/prism.min.js>; rel=preload; as=script, </javascript/smoothscroll.min.js>; rel=preload; as=script, </javascript/SnoozeSquad.min.js>; rel=preload; as=script
strict-transport-security: max-age=31536000;
x-xss-protection: 1; mode=block
x-frame-options: sameorigin

Particularly of note here is the link header. It looks long and complicated, but that's just because I'm pushing multiple resources down. Let's pull it apart. In essence, the link header takes a comma (,) separated list of paths to resources that the web server should push to the client, along with the type of each. For example, if https://bobsrockets.com/ wanted to push down the CSS stylesheet /theme/boosters.css, they would include a link header like this:

link: </theme/boosters.css>; rel=preload; as=style
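On the application side this can be as simple as setting the header before any output is sent. Here's a minimal PHP sketch - the resource paths are the hypothetical bobsrockets.com ones from above, not the ones this site actually uses:

<?php
// Tell the web server (via the link header) to push a stylesheet and a
// script along with this response. Must be called before any output is sent.
header('Link: </theme/boosters.css>; rel=preload; as=style, </js/liftoff.js>; rel=preload; as=script');

// ...render the rest of the page as normal...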

It's also important to note here that pushing a resource doesn't remove the need to reference it somewhere in the page. In other words, pushing a stylesheet down as above still means that we need to add the appropriate <link /> element to put it to use:

<link rel="stylesheet" href="/theme/boosters.css" />

Scripts can be sent down too. Doing so is very similar:

link: </js/liftoff.js>; rel=preload; as=script

There are other as values as well. You can send all kinds of things:

  • script - Javascript files
  • style - CSS Stylesheets
  • image - Images
  • font - Fonts
  • document - <iframe /> content
  • audio - Sound files to be played via the HTML5 <audio /> element
  • worker - Web workers
  • video - Videos to be played via the HTML5 <video /> element.

The full list can be found here. If you don't have support in your web server yet (or can't modify HTTP headers) for whatever reason, never fear! There's still something you can do. HTML also supports a similar <link rel="preload" href="...." /> element that you can add to your document's <head>.

While this won't cause your server to bundle extra resources with a response, it'll still tell the client to go off and fetch the specified resources in the background with a high priority. Obviously, this won't help much with external stylesheets and scripts (since simply being present in the document is enough to get the client to request them), but it could still be useful if you're lazily loading images, for example.
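For instance, a lazily-loaded hero image could be hinted at like this (the path here is just an illustration):

<head>
    <!-- Ask the browser to start fetching the image early, even though
         it isn't referenced until a script inserts it later on -->
    <link rel="preload" href="/images/launchpad-hero.jpg" as="image" />
</head>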

In future projects, I'll certainly be looking out for opportunities to take advantage of HTTP/2.0 Server Push (probably starting with investigating options for Pepperminty Wiki). I've found the difference to be pretty extraordinary myself.

Of course, this is hardly the only feature that HTTP/2 brings. If there's the demand, I may blog about other features and how they work too.

Found this interesting? Confused about something? Using this yourself in a cool project? Comment below!

How to set up a shared PDF printer on your local network

I've recently ended up setting up a PDF printer on my local network in an effort to transfer some pictures out of a ridiculous i-device (I tell you, Apple's iOS is the worst for being a walled garden). Since the process for doing so wasn't entirely obvious, I'm documenting it in this blog post to remind myself for later. If you find it useful, please let me know in the comments below!

Firstly, you'll need a machine running Linux. Any distribution will do, but I'll be using an apt-based distribution, so you may need to alter some of the commands here to suit your system.

Next, we need to install the cups PDF printer driver (CUPS stands for the Common Unix Printing System). It comes with a lot of junk if you're not careful, so here I use --no-install-recommends to avoid installing any unnecessary packages.

sudo apt install printer-driver-cups-pdf --no-install-recommends

If you've got a firewall running (which you really should - see this post of mine for more information on that), then you'll need to open port 631 to TCP traffic to allow people to print. If you're using ufw, then this should do the trick:

sudo ufw allow cups

If not, then you may need to specify the port number explicitly:

sudo ufw allow 631/tcp

With the printer installed, we next need to open it to the world. Before that though, we should make some changes to the configuration file, which is located at /etc/cups-pdf.conf. Firstly, I wanted to put the resulting PDFs into my file server's shared folder. This is achieved by editing the Out and AnonDirName settings. They should already be present in the configuration file - it's just a matter of changing their values:

Out         /absolute/path/to/output/dir
AnonDirName /absolute/path/to/output/dir

I also wanted to customise the user account and permissions that it saves the PDFs with. I did this through the AnonUser and AnonUMask settings - which should also be present by default:

AnonUser    username
AnonUMask   0007

The umask is basically an inverted permission octal: each bit that's set takes the corresponding permission away, so 0007 strips all permissions from "other" users while leaving the owner and group untouched. I found a good umask calculator online to do it for me :P (Don't forget the preceding 0 - it's important!)

Finally, I experienced an issue whereby cups kept overwriting the same file again and again because the iPad wasn't smart enough to send the photos to print with their actual filenames - instead opting to send them all as Photo.pdf. Thankfully though, cups-pdf has the Label option (also specified by default) that ensures that output filenames don't clash. Setting it to 1 instead of 0 solved the problem for me:

Label       1

Note that some of these properties may be prefixed with a hash (#). You'll need to remove this in order for it to take effect.

With the new PDF printer configured, it's time to open it up to our local network. Here's how to do that:

sudo cupsctl --share-printers
sudo lpadmin -p pdf -o printer-is-shared=true
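At this point it's worth checking that everything has taken effect. A couple of quick suggestions - pdf here is the queue name used in the lpadmin command above, so adjust it if yours is called something else:

# Check that the printer is known to cups and currently ready
lpstat -p pdf

# List all destinations and the devices behind them
lpstat -v

# Send a quick test job - a PDF should appear in the output directory
echo "Hello, world" | lp -d pdf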

Note that if you want to open it up to more than your local subnet you'll need to do some additional configuration - such as configuring authentication, for instance. Such things are beyond the scope of this blog post, but if there's the demand (comment below!) I can certainly investigate writing something up.

Found this useful? Got a better / different solution? Comment below!

Art by Mythdael