Starbeamrainbowlabs

Stardust
Blog

Automatically downloading emails and extracting their attachments

I have an all-in-one printer that's also a scanner - specifically the Epson Ecotank 4750 (though annoyingly the automated document feeder doesn't support duplex). While it's a great printer (very eco-friendly, and the inks last for ages!), my biggest frustration with it is that it doesn't scan directly to an SMB file share (i.e. a Windows file share). It does support SANE though, which lets you scan from a computer.

This is ok, but being able to scan directly from the device itself without needing a computer would be very convenient, so I set out to remedy this. The printer does have a cloud feature they call "Epson Connect", which allows one to upload to various cloud services such as Google Drive and Box, but I don't want to upload potentially sensitive data to such services.

Fortunately, there's a solution at hand - email! The printer in question also supports scanning to an email address. Once scanning is complete, it sends an email to the preconfigured email address with the scanned page(s) attached. It's been far too long since my last post about email too, so let's do something about that.

Logging in to my email account just to pick up a scan is clunky and annoying though, so I decided to automate the process to resolve the issue. The plan is as follows:

  1. Obtain a fresh email address
  2. Use IMAP IDLE to instantly download emails
  3. Extract attachments and save them to the output directory
  4. Discard the email - both locally and remotely

As some readers may be aware, I run my own email server (which is why I've written about email before), so I reconfigured it to add a new email address. Plenty of free providers exist too - just make sure you don't use an account you might want to use for anything else, since our script will eat any emails sent to it.

Steps 2, 3, and 4 there took some research and fiddling about, but in the end I cooked up a shell script solution that uses fetchmail, procmail (which is apparently unmaintained, so I should consider looking for alternatives), inotifywait, and munpack. I've also packaged it into a Docker container, which I'll talk about later in this post.

To illustrate how all of these fit together, let's use a diagram:

A diagram showing how the whole process fits together - explanation below.

fetchmail uses IMAP IDLE to hold a connection open to the email server. When it receives notification of a new email, it instantly downloads it and spawns a new instance of procmail to handle it.

procmail writes the email to a temporary directory structure, which a separate script is watching with inotifywait. As soon as procmail finishes writing the new email to disk, inotifywait triggers and the email is unpacked with munpack. Any attachments found are moved to the output directory, and the original email discarded.

With this in mind, let's start drafting up a script. The first order of the day is configuring fetchmail. This is done using a .fetchmailrc file - I came up with this:

poll bobsrockets.com protocol IMAP port 993
    user "user@bobsrockets.com" with pass "PASSWORD_HERE"
    idle
    ssl

...where user@bobsrockets.com is the email address you want to watch, bobsrockets.com is the hostname of your IMAP server (here it's the same as the domain part of the email address, but that isn't always the case), and PASSWORD_HERE is the password required to log in.

Save this somewhere safe with tight file permissions for later.
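Since this file contains a plaintext password, here's a small sketch of one way to write it so that it's never readable by other users, even for a moment. It assumes GNU coreutils (for stat -c), and the default filename is just an example:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Where to write the config; defaults to .fetchmailrc in the current directory.
rc_file="${1:-./.fetchmailrc}"

# umask 077 in a subshell means a newly created file starts out as 0600,
# so the password is never readable by other users, even briefly.
(
    umask 077
    # The quoted 'EOF' stops the shell from expanding $ or ` in the password.
    cat > "${rc_file}" <<'EOF'
poll bobsrockets.com protocol IMAP port 993
    user "user@bobsrockets.com" with pass "PASSWORD_HERE"
    idle
    ssl
EOF
)

# umask has no effect on a file that already existed, so set the mode explicitly too.
chmod 0600 "${rc_file}"
stat -c '%a %n' "${rc_file}"
```

Running it with no arguments should print 600 followed by the filename.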

The other configuration file we'll need is one for procmail. Let's do that one now:

CORRECTHOME=/tmp/maildir
MAILDIR=$CORRECTHOME/

:0
Mail/

Replace /tmp/maildir with the temporary directory you want to use to hold emails in. Save this as procmail.conf for later too.

Now that we have the mail config files written, we need to install some software. I'm using apt on Debian (a minideb Docker container actually), so you'll need to adapt this for your own system if required.

sudo apt install ca-certificates fetchmail procmail inotify-tools mpack
# or, if you're using minideb:
install_packages ca-certificates fetchmail procmail inotify-tools mpack

fetchmail is for some strange reason extremely picky about the user account it runs under, so let's update the pre-created fetchmail user account to make it happy:

groupadd --gid 10000 fetchmail
usermod --uid 10000 --gid 10000 --home /srv/fetchmail fetchmail
# usermod --home only updates /etc/passwd, so create the directory ourselves
mkdir -p /srv/fetchmail
chown fetchmail:fetchmail /srv/fetchmail

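fetchmail now needs that config file we created earlier. Let's update its ownership and permissions, since fetchmail refuses to use an rc file that other users can read:

chown 10000:10000 path/to/.fetchmailrc
chmod 0600 path/to/.fetchmailrc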

If you're running on bare metal, move it to the /srv/fetchmail directory now. If you're using Docker, keep reading, as I recommend mounting this file using a Docker volume to make the resulting container image more reusable.

Now let's start drafting a shell script to pull everything together. Let's start with some initial setup:

#!/usr/bin/env bash

if [[ -z "${TARGET_UID}" ]]; then
    echo "Error: The TARGET_UID environment variable was not specified.";
    exit 1;
fi
if [[ -z "${TARGET_GID}" ]]; then
    echo "Error: The TARGET_GID environment variable was not specified.";
    exit 1;
fi
if [[ "${EUID}" -ne 0 ]]; then
    echo "Error: This Docker container must run as root because fetchmail is a pain, and to allow customisation of the target UID/GID (although all possible actions are run as non-root users)";
    exit 1;
fi

dir_mail_root="/tmp/maildir";
dir_newmail="${dir_mail_root}/Mail/new";
target_dir="/mnt/output";

fetchmail_uid="$(id -u "fetchmail")";
fetchmail_gid="$(id -g "fetchmail")";

temp_dir="$(mktemp --tmpdir -d "imap-download-XXXXXXX")";
on_exit() {
    rm -rf "${temp_dir}";
}
trap on_exit EXIT;

log_msg() {
    echo "$(date -u +"%Y-%m-%d %H:%M:%S") imap-download: $*";
}

This script will run as root, while fetchmail runs as UID 10000 and GID 10000. The reasons for this are complicated (and mostly have to do with my weird network setup). We look for the TARGET_UID and TARGET_GID environment variables, as these define the UID:GID that extracted attachments will be owned by before they're moved to the output directory.

We also determine the fetchmail UID/GID dynamically here, and create a second temporary directory to work with too (the reasons for which will become apparent).

Before we continue, we need to create the directory procmail writes new emails to. Not because procmail won't create it on its own (because it will), but because we need it to exist up-front so we can watch it with inotifywait:

mkdir -p "${dir_newmail}";
chown -R "${fetchmail_uid}:${fetchmail_gid}" "${dir_mail_root}";

We're running as root, but we'll want to spawn fetchmail (and other things) as non-root users. Technically, I don't think you're supposed to use sudo in non-interactive scripts, and it's also not present in my Docker container image. The alternative is the setpriv command, but using it is rather complicated and annoying.

It's more powerful than sudo, as it allows you to specify not only the UID/GID a process runs as, but also the capabilities the process will have too (e.g. binding to low port numbers). There's a nasty bug one has to work around if one is using Docker too, so given all this I've written a wrapper function that abstracts all of this complexity away:

# Runs a process as another user.
# Ref https://github.com/SinusBot/docker/pull/40
# $1    The UID to run the process as.
# $2    The GID to run the process as.
# $3-*  The command (including arguments) to run
run_as_user() {
    # local keeps these helper variables from leaking into the rest of the script
    local run_as_uid="${1}"; shift;
    local run_as_gid="${1}"; shift;
    if [[ -z "${run_as_uid}" ]]; then
        echo "run_as_user: No target UID specified.";
        return 1;
    fi
    if [[ -z "${run_as_gid}" ]]; then
        echo "run_as_user: No target GID specified.";
        return 2;
    fi

    # Ref https://github.com/SinusBot/docker/pull/40
    # WORKAROUND for `setpriv: libcap-ng is too old for "all" caps`, previously "-all" was used here
    # create a list to drop all capabilities supported by current kernel
    local cap_prefix="-cap_";
    local caps="$cap_prefix$(seq -s ",$cap_prefix" 0 "$(cat /proc/sys/kernel/cap_last_cap)")";

    setpriv --inh-caps="${caps}" --reuid "${run_as_uid}" --clear-groups --regid "${run_as_gid}" "$@";
    return "$?";
}
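If the capability workaround looks a bit cryptic, here's a sketch that builds the same drop list as run_as_user, but with the highest capability number passed in as an argument instead of being read from /proc/sys/kernel/cap_last_cap. build_drop_caps is a hypothetical name used only for illustration:

```bash
#!/usr/bin/env bash
# Builds the same "-cap_0,-cap_1,...,-cap_N" list that run_as_user passes to
# setpriv --inh-caps, with the highest capability number given as $1.
build_drop_caps() {
    local last_cap="${1}"
    local cap_prefix="-cap_"
    # seq joins 0..last_cap with ",-cap_" between each number, so only the
    # very first number needs the prefix added by hand.
    echo "${cap_prefix}$(seq -s ",${cap_prefix}" 0 "${last_cap}")"
}

build_drop_caps 3
```

This prints -cap_0,-cap_1,-cap_2,-cap_3. On a real system you'd call it as build_drop_caps "$(cat /proc/sys/kernel/cap_last_cap)", which is what run_as_user does inline.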

With this in hand, we can now wrap fetchmail and procmail in a function too:

do_fetchmail() {
    log_msg "Starting fetchmail";

    while :; do
        run_as_user "${fetchmail_uid}" "${fetchmail_gid}" fetchmail --mda "/usr/bin/procmail -m /srv/procmail.conf";

        exit_code="$?";
        if [[ "$exit_code" -eq 127 ]]; then
            log_msg "setpriv failed, exiting with code 127";
            exit 127;
        fi 

        log_msg "Fetchmail exited with code ${exit_code}, sleeping 60 seconds";
        sleep 60
    done
}

In short, this spawns fetchmail as the fetchmail user we configured above, and also restarts it if it dies. If setpriv fails, it returns an exit code of 127 - so we catch that and don't bother trying again, as the issue likely needs manual intervention.
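The restart logic in do_fetchmail is a fairly general pattern, so here's a hypothetical supervise function that captures it in a testable form. The max_restarts and delay parameters exist purely so the sketch terminates; do_fetchmail itself loops forever with a 60 second sleep:

```bash
#!/usr/bin/env bash
# Hypothetical generalisation of the do_fetchmail loop: run a command and
# restart it whenever it exits, but give up straight away on exit code 127
# (used by setpriv for privilege errors, and by bash for "command not found").
# $1    Maximum number of restarts (only so this sketch terminates).
# $2    Seconds to sleep between restarts.
# $3-*  The command (including arguments) to run.
supervise() {
    local max_restarts="${1}"; shift
    local delay="${1}"; shift
    local restarts=0 exit_code
    while :; do
        "$@"
        exit_code="$?"
        if [[ "${exit_code}" -eq 127 ]]; then
            echo "supervise: '$1' failed with code 127, giving up" >&2
            return 127
        fi
        if (( restarts >= max_restarts )); then
            return "${exit_code}"
        fi
        restarts=$(( restarts + 1 ))
        echo "supervise: '$1' exited with code ${exit_code}, restart ${restarts}/${max_restarts} in ${delay}s"
        sleep "${delay}"
    done
}
```

For example, supervise 2 0 false runs false three times, logs two restarts and returns 1, while supervise 5 0 some_missing_command gives up immediately with 127.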

To finish the script, we now need to set up that inotifywait loop I mentioned earlier. Let's set up a shell function for that:


do_attachments() {
    while :; do # : = infinite loop
        # Wait for an update
        # inotifywait's non-0 exit code forces an exit for some reason :-/
        inotifywait -qr --event create --format '%:e %f' "${dir_newmail}";

        # Process new email here
    done
}

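Processing new emails is not particularly difficult, but it requires a sub-loop, because more than one email may be waiting by the time inotifywait returns (for example if several arrived while the previous batch was being processed):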

while read -r filename; do

    # Process each email

done < <(find "${dir_newmail}" -type f);
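The while read ... done < <(find ...) form is worth a quick note. Compared with piping find into while, process substitution keeps the loop in the current shell, so any variables it sets (or an exit it calls) affect the script itself rather than a throwaway subshell. Here's a small self-contained demonstration, with made-up file names standing in for emails:

```bash
#!/usr/bin/env bash
demo_dir="$(mktemp -d)"
touch "${demo_dir}/one" "${demo_dir}/two" "${demo_dir}/three"   # stand-in "emails"

# Piping into while: the loop runs in a subshell, so its counter is lost.
count_piped=0
find "${demo_dir}" -type f | while read -r filename; do
    count_piped=$(( count_piped + 1 ))
done

# Process substitution: the loop runs in the current shell, so the counter sticks.
count_substituted=0
while read -r filename; do
    count_substituted=$(( count_substituted + 1 ))
done < <(find "${demo_dir}" -type f)

echo "piped: ${count_piped}, process substitution: ${count_substituted}"
rm -rf "${demo_dir}"
```

With bash's default settings this prints piped: 0, process substitution: 3.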

Finally, we need to process each email we find in turn. Let's outline the steps we need to take:

  1. Move the email to that second temporary directory we created above (since the procmail directory might not be empty)
  2. Unpack the attachments
  3. chown the attachments to the target UID/GID and move them to the output directory

Let's do this in chunks. First, let's move it to the temporary directory:

log_msg "Processing email ${filename}";

# Move the email to a temporary directory for processing
mv "${filename}" "${temp_dir}";

The filename variable there holds the absolute path to the email in question, since we passed find an absolute directory to list the contents of (as opposed to a relative path).

To find the filepath we moved it to, we need to do this:

filepath_temp="${temp_dir}/$(basename "${filename}")"

This is important for the next step, where we unpack it:

# Unpack the attachments
munpack -C "${temp_dir}" "${filepath_temp}";

Now that we've unpacked it, let's do a bit of cleaning up, by deleting the original email file and the .desc description files that munpack also generates:

# Delete the original email file and any description files
rm "${filepath_temp}";
find "${temp_dir}" -iname '*.desc' -delete;

Great! Now that we have the attachments sorted, all we need to do is chown them to the target UID/GID and move them to the right place.

chown -R "${TARGET_UID}:${TARGET_GID}" "${temp_dir}";
chmod -R a=rX,ug+w "${temp_dir}";

I also chmod the temporary directory to make sure that the permissions are correct, because otherwise the mv command (which runs as the target user) is unable to read the directory's contents.
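If the a=rX,ug+w mode string is unfamiliar, this sketch shows its effect on a freshly made temporary directory containing one file. scan.pdf is just a hypothetical attachment name, and stat -c assumes GNU coreutils:

```bash
#!/usr/bin/env bash
set -euo pipefail

demo_dir="$(mktemp -d)"              # mktemp -d creates the directory as 0700
touch "${demo_dir}/scan.pdf"         # hypothetical attachment name
chmod 0600 "${demo_dir}/scan.pdf"

mode_of() { stat -c '%a' "$1"; }

echo "before: dir $(mode_of "${demo_dir}"), file $(mode_of "${demo_dir}/scan.pdf")"
# a=rX: read for everyone, plus execute only for directories (or files that
# were already executable); ug+w: write for the owner and group.
chmod -R a=rX,ug+w "${demo_dir}"
echo "after:  dir $(mode_of "${demo_dir}"), file $(mode_of "${demo_dir}/scan.pdf")"

dir_mode="$(mode_of "${demo_dir}")"
file_mode="$(mode_of "${demo_dir}/scan.pdf")"
rm -rf "${demo_dir}"
```

The directory goes from 700 to 775 and the file from 600 to 664: the capital X is what lets everyone traverse directories without marking plain files as executable.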

Now to actually move all the attachments:

# Move the attachment files to the output directory
while read -r attachment; do
    log_msg "Extracted attachment ${attachment}";
    chmod 0775 "${attachment}";
    run_as_user "${TARGET_UID}" "${TARGET_GID}" mv "${attachment}" "${target_dir}";
done < <(find "${temp_dir}" -type f);

This is rather overcomplicated because of an older design, but it does the job just fine.

With that done, we've finished the script. I'll include the whole script at the bottom of this post.
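As a rough sketch of how the two loops might be started together (the real run.sh may well do this differently), one option is to background do_fetchmail and run do_attachments in the foreground. The real functions are swapped for stand-ins here so it runs anywhere. One caveat with this approach: because do_fetchmail would run in a background subshell, its exit 127 only ends that subshell, not the whole script:

```bash
#!/usr/bin/env bash
# Hypothetical stand-ins for the real functions described above.
do_fetchmail() { while :; do sleep 1; done; }          # never returns
do_attachments() { echo "watching for new mail"; }     # returns at once here

temp_dir="$(mktemp -d)"

# Run fetchmail's restart loop in the background, keeping its PID so it
# can be stopped when the script exits.
do_fetchmail &
fetchmail_pid="$!"

# A second `trap ... EXIT` would replace the first one, so a single
# on_exit handler does all of the cleanup.
on_exit() {
    kill "${fetchmail_pid}" 2>/dev/null
    rm -rf "${temp_dir}"
}
trap on_exit EXIT

# The attachment loop runs in the foreground, keeping the script alive.
do_attachments
```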

Dockerification

If you're running on bare metal, then you can skip to the end of this post. Because I have a cluster, I want to be able to run this on it. Since said cluster works with Docker containers, it's natural to Dockerise this process.

The Dockerfile for all this is surprisingly concise, and you can view it on my personal Git server.

To use this, you'll need the following files alongside it:

It exposes the following Docker volumes:

All these files can be found in this directory on my personal Git server.

Conclusion

We've strung together a bunch of different programs to automatically download emails and extract their attachments. This is very useful for ingesting all sorts of different files. Things I haven't covered:

As always, this blog post is both a reference for my own use and a starting point for you if you'd like to do this for yourself.

If you've found this useful, please comment below! I find it really inspiring / motivating to learn how people have found my posts useful and what for.

Sources and further reading

run.sh script



Art by Mythdael