
Job Scheduling on Linux

Scheduling jobs to happen at a later time on a Linux-based machine can be somewhat confusing. Confused by 5 4 8-10/4 6/4 *, or baffled by 5 */4 * * *? All will be revealed!

cron

Scheduling jobs on a Linux machine can be done in several ways. Let's start with cron - the primary program that orchestrates the whole proceeding. Its name comes from the Greek word Chronos, which means time. By filling in a crontab (read cron-table), you can tell it what to do when. It's essentially a time-table of jobs you'd like it to run.

Your Linux machine should come with cron installed already. You can check if cron is installed and running by entering this command into your terminal:

if [[ "$(pgrep -c cron)" -gt 0 ]]; then echo "Cron is installed :D"; else echo "Cron is not installed :-("; fi If it isn't installed or running, then you'll have to investigate why this isn't the case. The most common is that it isn't installed. It's normally in the official repositories for most distributions - on Debian-based system sudo apt install cron should suffice. Arch-based users may need to check to make sure that the system service is enabled and do so manually. With cron setup and ready to go, we can start adding jobs to it. This is done by way of a crontab, as explained above. Each user has their own crontab such that they can each configure their own individual sets jobs. To edit it, type this: crontab -e This will open your favourite editor with your crontab ready for editing (if you'd like to change your editor, do sudo update-alternatives --config editor or change the EDITOR environment variable). You should see a bunch of lines like this: # Edit this file to introduce tasks to be run by cron. # # Each task to run has to be defined through a single line # indicating with different fields when the task will be run # and what command to run for the task # # To define the time you can provide concrete values for # minute (m), hour (h), day of month (dom), month (mon), # and day of week (dow) or use '*' in these fields (for 'any').# # Notice that tasks will be started based on the cron's system # daemon's notion of time and timezones. # # Output of the crontab jobs (including errors) is sent through # email to the user the crontab file belongs to (unless redirected). # # For example, you can run a backup of all your user accounts # at 5 a.m every week with: # 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/ # # For more information see the manual pages of crontab(5) and cron(8) # # m h dom mon dow command I'd advise you keep this for future reference - just in case you find yourself in a pinch later - so scroll down to the bottom and start adding your jobs there. Let's look at the syntax for telling cron about a job next. This is best done by example: 0 1 * * 7 cd /root && /root/run-backup This job, as you might have guessed, runs a custom backup script. It's one I wrote myself, but that's a story for another time (comment below if you'd like me to post about that). What we're interested in is the bit at the beginning: 0 1 * * 7. Scheduling a cron job is done by specifying 5 space-separated values. In the case of the above, the job will run at 1am every Sunday morning. The order is as follows: • Minute • Hour • Day of the Month • Month • Day of the week For of these values, a number of different specifiers can be used. For example, specifying an asterisk (*) will cause the job to run at every interval of that column - e.g. every minute or every hour. If you want to run something on every minute of the day (such as a logging or monitoring script), use * * * * *. Be aware of the system resources you can use up by doing that though! Specifying number will restrict it to a specific time in an interval. For example, 10 * * * * will run the job at 10 minutes past every hour, and 22 3 * * * will run a job at 03:22 in the morning every day (I find such times great for maintenance jobs). Sometimes, every hour or every minute is too often. Cron can handle this too! For example 3 */2 * * * will run a job at 3 minutes past every second hour. You can alter this at your leisure: The value after the forward slash (/) decides the interval (i.e. 
*/3 would be every third, */15 would be every 15th, etc.). The last column, the day of the week, is an alternative to the day of the month column. It lets you specify, as you may assume, the day oft he week a job should run on. This can be specified in 2 way: With the numbers 0-6, or with 3-letter short codes such as MON or SAT. For example, 6 20 * * WED runs at 6 minutes past 8 in the evening on Wednesday, and 0 */4 * * 0 runs every 4th hour on a Sunday. The combinations are endless! Since it can be a bit confusing combining all the options to get what you want, crontab.guru is great for piecing cron-job specifications together. It describes your cron-job spec in plain English for you as you type! (Above: crontab.guru displaying a random cronjob spec) What if I turn my computer off? Ok, so cron is all very well, but what if you turn your machine off? Well, if cron isn't running at the time a job should be run, then it won't get executed. For those of us who don't leave their laptops on all the time, all is not lost! It's time to introduce the second piece of software at our disposal. Enter stage left: anacron. Built to be a complement to cron, anacron sets up 3 folders: • /etc/cron.daily • /etc/cron.weekly • /etc/cron.monthly Any executable scripts in this folder will be run at daily, weekly, and monthly intervals respectively by anacron, and it respects the hash-bang (that #! line at the beginning of the script) too! Most server systems do not come with anacron pre-installed, though it should be present if your distributions official repositories. Once you've installed it, edit root's crontab (with sudo crontab -e if you can't remember how) and add a job that executes anacron every hour like so: # Run anacron every hour 5 * * * * /usr/sbin/anacron This is important, as anacron does not in itself run all the time like cron does (this behaviour is called a daemon in the Linux world) - it needs a helping hand to get it to run. If you've got more specific requirements, then anacron also has it's own configuration file you can edit. It's found at /etc/anacrontab, and has a different syntax. In the anacron table, jobs follow the following pattern: • period - The interval, in days, that the job should run • delay - The offset, in minutes, that the job should run at • job identifier - A textual identifier (without spaces, of course) that identifies the job • command - The command that should be executed You'll notice that there are 3 jobs specified already - one for each of the 3 folders mentioned above. You can specify your own jobs too. Here's an example: # Do the weekly backup 7 20 run-backup cd /root/data-shape-backup && ./do-backup; The above job runs every 7 days, with an offset of 20 minutes. Note that I've included a command (the line starting with a hash #) to remind myself as to what the job does - I'd recommend you always include such a comment for your own reference - whether you're using cron, anacron, or otherwise. I'd also recommend that you test your anacron configuration file after editing it to ensure it's valid. This is done like so: anacron -T I'm not an administrator, can I still use this? Sure you can! If you've got anacron installed (you could even compile it from source locally if you haven't) and want to specify some jobs for your local account, then that's easily done too. 
Just create an anacrontab file anywhere you please, and then in your regular crontab (crontab -e), tell anacron where you put it like this: # Run anacron every hour 5 * * * * /usr/sbin/anacron -t "path/to/anacrontab" What about one-off jobs? Good point. cron and anacron are great for repeating jobs, but what if you want to set up a one-off job to auto-disable your firewall before enabling it just in case you accidentally lock yourself out? Thankfully, there's even an answer for this use-case too: atd. atd is similar to cron in that it runs a daemon in the background, but instead of executing jobs specified in a crontab, you tell it when you want it to execute a series of commands, and then enter the commands themselves. For example: $ at now + 10 minutes
warning: commands will be executed using /bin/sh
at> echo -e "Testing"
at> uptime
at> <EOT>
job 4 at Thu Jul 12 14:36:00 2018

In the above, I tell it to run the job 10 minutes from now, and enter a pair of commands. To end the command list, I hit CTRL + D on an empty line. The output of the job will be emailed to me automatically if I've got that set up (cron and anacron also do this).

Specifying a time can be somewhat fiddly, but it's also quite flexible:

• at tomorrow
• at now + 5 hours
• at 16:06
• at next month
• at 2018 09 25

....and so on. Listing the current scheduled jobs is also just as easy:

atq

This will output a list of scheduled jobs that haven't been run yet. You can't see any jobs that aren't created by you unless you're root (use sudo), though. You can use the job ids listed here to cancel a job too:

# Remove job id 4:
atrm 4
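
If you've forgotten what a queued job actually does, you can ask atd to print the script it has stored for it (the job id here is just the one from the example above):

# Print the stored commands (and environment) for job id 4
at -c 4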

Conclusion

That just about concludes this whirlwind tour of job scheduling on Linux systems. We've looked at how to schedule jobs with cron, and how to ensure our jobs get run with anacron - even when the target machine isn't turned on all the time. We've also looked at one-time jobs with atd, and how to manage the job queue.

As usual, this is a starting point - not an ending point! Job scheduling is just the beginning. From here, you can look at setting up automated backups. You could investigate setting up an email server, and how that integrates with cron. You can utilise cron to perform maintenance for your next great web (or other!) application. The possibilities are endless!

Found this useful? Still confused? Comment below!

Quest Get: Search large amounts of code!

(Above: A map of the Linux Kernel source code. Source: this post on medium.)

Recently I was working on a little project of mine (nope, not this for once! :P), and I needed a C♯ class I'd written a while ago. Being forgetful as I am, I had no idea which of my projects I'd written it for. And so the quest began to find it! I did in the end, but it left me wondering whether there was a better way to search all my code quickly. This post is the culmination of everything I've discovered so far about the process of searching one's code.

Before I started, I already knew about grep, which is built into almost every Linux system around. It's even available for Windows via the MSYS Tools. Unfortunately though, despite its prevalence, it's not particularly good at searching large numbers of git repositories, as it keeps descending into the .git folder and displaying a whole load of useless results.
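
As an aside, GNU grep can be persuaded to skip version control directories - you just have to remember the extra flag every time. A minimal sketch (the search term and path here are only placeholders):

# Recursively search for a string, ignoring .git directories (GNU grep)
grep -rn --exclude-dir=.git "HttpServer" ~/projects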

Something had to change. After asking reddit, I was introduced to OpenGrok. Written in Java, it indexes all of your code, and provides a web interface through which you can search it. Very nice. Unfortunately, I had trouble figuring out the logistics of actually getting it to run - and discovered that it takes multiple hours to set up correctly.

Moving on, I was re-introduced to ack. Written in plain-old Perl, it apparently runs on practically any system that Perl does - though it's not installed by default like grep is. Looking into it, I found it to be much like grep - only smarter. It ignores version control directories (like the .git folder) and common package folders (like node_modules) by default, and even has a system by which results can be filtered by language (with support for hash-bangs too!). The results themselves are coloured by default - making it easy to skim through quickly. Coupled with the flexible configuration file system, ack makes for a wonderfully flexible way to search through large amounts of code quickly.
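
To give you an idea of what that looks like in practice, here's a rough sketch (the search term, path, and type filter are just placeholders - check ack's documentation for the full list of supported types):

# Search all projects for a term; version control folders are skipped by default
ack "HttpServer" ~/projects

# Restrict the results to a single language using ack's type filters
ack --type=perl "use strict" ~/projects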

Though ack looks good, I still didn't have a way to search through all my code that's scattered across multiple devices at once, so I kept looking. The next project I found (through AlternativeTo, actually) was Text Sherlock. It positions itself as an alternative to OpenGrok that's much simpler to configure.

True to its word, I managed to get a test instance set up running from my /tmp directory in 15 minutes - though it did take a while to index the code I had locally. It also took several seconds to consult its index when I entered a query. I suspect I could alleviate both of these issues by installing Xapian (an open-source high-performance search library), which it appears to have support for.

While the interface was cool, it didn't appear to allow me to tell it which directories not to index, so it ended up trawling through all my .git directories - just like grep did. It also doesn't appear to be multi-threaded - so it took much longer to index my code than it really needed to (I've got a solid-state drive and enough RAM for a few GBs of cache, so the indexing operation was CPU-bound, not I/O-bound).

In the end, I've rediscovered the awesome search tool ack, and taken a look at the current state of code search tools today. While I haven't yet found precisely what I'm looking for, I'm further forward than when I started.

Other honourable mentions include GNU Global (which apparently needs several GiBs per ~300MiB of source code for its generated static HTML web interface), insight.io (an IDE-like freemium cloud product that 'understands your code'), CodeQuery (only supports C, C++, Java, Python, Ruby, Javascript, and Go), and ripgrep (rust-based program, similar to ack and grep, feature comparison). The official ack website has a good page that contains more tools that are worth a look, too.

Got a cool way to search through all your code? Did this help you out? Comment below!

Securing a Linux Server Part 2: SSH

Wow, it's been a while since I posted something in this series! Last time, I took a look at the Uncomplicated Firewall, and how you can use it to control the traffic coming in (and going out) of your server. This time, I'm going to take a look at steps you can take to secure another vitally important part of most servers: SSH. Used by servers and their administrators across the world to talk to one another, if someone manages to get in who isn't supposed to, they could do all kinds of damage!

The first, and easiest, thing we can do to improve security is to prevent the root user logging in. If you haven't done so already, you should create a new user on your server, set a good password, and give it superuser privileges. Log in with the new user account, and then edit /etc/ssh/sshd_config, finding the line that says something like

PermitRootLogin yes

....and change it to

PermitRootLogin no

Once done, restart the ssh server. Your config might be slightly different (e.g. it might be PermitRootLogin without-password) - but the principle is the same. This adds an extra barrier to getting into your server, as now attackers must not only guess your password, but your username as well (some won't even bother, and keep trying to login to the root account :P).
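
On most modern distributions, restarting the SSH server looks something like this (the service is called ssh on Debian / Ubuntu, and sshd on some others):

# Restart the SSH daemon to apply the new configuration
sudo systemctl restart ssh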

Next, we can move SSH to a non-standard port. Some might argue that this isn't a good security measure to take and that it doesn't actually make your server more secure, but I find that it's still a good measure to take for 2 reasons: defence in depth, and preventing excessive CPU load from all the dumb bots that try to get in on the default port. With that, it's time to make another modification to /etc/ssh/sshd_config. Make sure you test at every step you take, as if you lock yourself out, you'll have a hard time getting back in again....

Port 22

Change 22 in the above to any other number between about 1 and 65535. Next, make sure you've allowed the new port through your firewall! If you're using ufw, my previous post (link above) gives a helpful guide on how to do this. Once done, restart your SSH server again - and try logging in before you close your current session. That way if you make a mistake, you can fix it through your existing session.
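
If you are using ufw, allowing the new port through looks something like this (2222 is just an example - substitute whatever port you chose):

# Allow the new SSH port through the firewall
sudo ufw allow 2222/tcp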

Once you're confident that you've got it right, you can close port 22 on your firewall.
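
Again assuming ufw, that's a case of deleting the rule that allowed the old port - only do this once you've confirmed you can log in on the new port! If you originally allowed SSH by service name rather than by number, delete that rule instead.

# Remove the rule allowing the default SSH port
sudo ufw delete allow 22/tcp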

So we've created a new user account with a secure password (tip: use a password manager if you have trouble remembering it :-)), disabled root login, and moved the ssh port to another port number that's out of the way. Is there anything else we can do? Turns out there is.

Passwords are not the only way we can authenticate against an SSH server. Public-private keypairs can be used too - and are much more secure - and convenient - than passwords if used correctly. You can generate your own public-private keypair like so:

ssh-keygen -t ed25519

It will ask you a few questions, such as a password to encrypt the private key on disk, and where to save it. Once done, we need to tell ssh to use the new public-private keypair. This is fairly easy to do, actually (though it took me a while to figure out how!). Simply edit ~/.ssh/config (or create it if it doesn't exist), and create (or edit) an entry for your ssh server, making it look something like this:

Host bobsrockets.com
Port            {port_name}
IdentityFile    {path/to/private/keyfile}

It's the IdentityFile line that's important. The port line simply makes it such that you can type ssh bobsrockets.com (or whatever your server is called) and it will figure out the port number for you.
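
One detail worth spelling out: the public half of the keypair needs to end up in ~/.ssh/authorized_keys on the server before key-based login will work. The ssh-copy-id utility does this for you - something like the following, assuming the ed25519 key generated above and substituting your own username, hostname, and port:

# Copy the public key to the server's authorized_keys file
ssh-copy-id -i ~/.ssh/id_ed25519.pub -p 2222 user@bobsrockets.com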

With a public-private keypair now in use, there's just one step left: disable password-based logins. I'd recommend trialling it for a while to make sure you haven't messed anything up - because once you disable it, if you lose your private key, you won't be getting back in again any time soon!

Again, open /etc/ssh/sshd_config for editing. Find the line that starts with PasswordAuthentication, and comment it out with a hash symbol (#), if it isn't already. Directly below that line, add PasswordAuthentication no.
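
The relevant part of /etc/ssh/sshd_config should end up looking something like this:

#PasswordAuthentication yes
PasswordAuthentication no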

Once done, restart ssh for a final time, and check it works. If it does, congratulations! You've successfully secured your SSH server (to the best of my knowledge, of course). Got a tip I haven't covered here? Found a mistake? Let me know in a comment below!

How to prevent php-fpm from overriding your PHP-based caching logic

A while ago I implemented ETag support in the dynamic preview generator in Pepperminty Wiki. While I thought it worked during testing, I recently discovered on a private instance of Pepperminty Wiki that my browser didn't seem to be taking advantage of it. This got me curious, so I decided to do a little bit of digging to find out why.

It didn't take long to find the problem. For some reason, all the responses from the server had a Cache-Control: no-cache, no-store, must-revalidate header attached to them. How strange! Even more so that it was in capital letters - my convention in Pepperminty Wiki is to always make the headers lowercase.
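
Incidentally, inspecting response headers like this is easy to do from the terminal too - curl will show you just the headers if you ask it to (the URL here is only a placeholder):

# Fetch only the response headers for a page
curl -I "https://wiki.example.com/some-page"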

I checked the codebase via the Project Find feature of Atom just to make sure that I hadn't left in a debugging statement or anything (I hadn't), and then I turned my attention to Nginx (engine-x) - the web server that I use on my server. Maybe it had added a caching header?

A quick grep later revealed that it wasn't responsible either - which leaves just one part of the system unchecked - php-fpm, the PHP FastCGI server that sits just behind Nginx that's responsible for running the various PHP scripts I have that power this website and other corners of my server. Another quick grep returned a whole bunch of garbage, so after doing some research I discovered that php-fpm, by default, is configured to send this header - and that it has to be disabled by editing your php.ini (for me it's in /etc/php/7.1/fpm/php.ini), and changing

;session.cache_limiter = nocache

to be explicitly set to an empty string, like so:

session.cache_limiter = ''
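
Note that php-fpm only reads php.ini when it starts, so the change won't take effect until the service is restarted. On my server that looks something like the following - the exact service name depends on your PHP version and distribution:

# Restart php-fpm so that the updated php.ini is picked up
sudo systemctl restart php7.1-fpm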

This appears to have solved my problem for now - allowing me to regain control over how the resources I send back via PHP are cached. Hopefully this saves someone else the hassle of pulling their entire web server stack apart and putting it back together again in the future :P

Deep dive: Email, Trust, DKIM, SPF, and more

(Above: Lots of parcels. Hopefully you won't get this many through the door at once..... Source)

Now that I'm on holiday, I've got some time to write a few blog posts! As I've promised a few people a post on the email system, that's what I'll look at in this post. I'm going to take you on a deep dive through the email system and trust. We'll be journeying through the fields of DKIM signatures, and climb the SPF mountain. We'll also investigate why the internet needs to take this journey in the first place, and look at some of the challenges one faces when setting up their own mail server.

Hang on to your hats, ladies and gentlemen! If you get to the end, give yourself a virtual cookie :D

Before we start though, I'd like to mention that I'll be coming at this from the perspective of my own email server that I set up myself. Let me introduce to you the cast: Postfix (the SMTP MTA), Dovecot (the IMAP MDA), rspamd (the spam filter), and OpenDKIM (the thing that deals with DKIM signatures).

With that out of the way, let's begin! We'll start off our journey by mapping out the journey a typical email undertakes.

Let's say Bob Kerman wants to send Bill an email. Here's what happens:

1. Bob writes the email and hits send. His email client connects to his email server, logs in, and asks the server to deliver the message for him.
2. The server takes the email and reads the To header (in this case it's bill@billsboosters.com), figures out where Bill's mail server is located, connects to it, and asks it to deliver Bob's message to Bill. mail.billsboosters.com takes the email and files it in Bill's inbox.
3. Bill connects to his mail server and retrieves Bob's message.

Of course, this is simplified in several places. mail.bobsrockets.com will obviously need to do a few DNS lookups to find billsboosters.com's mail server and fiddle with the headers of Bob's message a bit (such as adding a Received header etc.), and smtp.billsboosters.com won't just accept the message for delivery without checking out the server it came from first. How does it check though? What's preventing seanssatellites.net from pretending to be bobsrockets.com and sending an imposter message?

Until relatively recently, the answer was, well, nothing really. Anyone could send an email to anyone else without having to prove that they could indeed send email in the name of a domain. Try it out for yourself by telnetting to a mail server on port 25 (unencrypted SMTP) and typing in something like this:

HELO mail.bobsrockets.com
MAIL From: <frank@franksfuel.io>
RCPT TO: <bill@billsboosters.com>
DATA
From: sean@seanssatellites.net
To: bill@billsboosters.com

Hello! This is an email to remind you.....
.
QUIT

Oh, my! Frank at franksfuel.io can connect to any mail server and pretend that sean@seanssatellites.net is sending a message to bill@billsboosters.com! Mail servers that allow this are called open relays, and today they usually find themselves on several blacklists within minutes. Ploys like these are easy to foil, thankfully (by only accepting mail for your own domains), but it still leaves the problem of what to do about random people connecting to your mail server delivering spam to your inbox that claims to be from someone they aren't supposed to be sending mail for.

In response, some mail servers started demanding that the IP address that connects to send an email must reverse-resolve to the domain name that it wants to send email from. Clever, but when you remember that anyone can change their own PTR records, you realise that it's just a minor annoyance to the determined spammer, and another hurdle for the legitimate person setting up their own mail server!

Clearly, a better solution is needed. Time to introduce our first destination: SPF. SPF stands for Sender Policy Framework, and defines a mechanism by which a mail server can determine which IP addresses a domain allows mail to be sent from in its name. It's a TXT record that sits at the root of a domain. It looks something like this:

v=spf1 a mx ptr ip4:5.196.73.75 ip6:2001:41d0:e:74b::1 a:starbeamrainbowlabs.com a:mail.starbeamrainbowlabs.com -all

The above is my SPF TXT record for starbeamrainbowlabs.com. It's quite simple, really - let's break it down.

v=spf1

This just defines the version of the SPF standard. There's only one version so far, so we include this to state that this record is an SPF version 1 record.

a mx ptr

This says that the domain that the sender claims to be from must have an a or an mx record that matches the IP address that's sending the email. It also says that the ptr record associated with the sender's IP must resolve to the domain the sender claims to be sending from, as described above (it does help with dealing with infected machines and such).

ip4:5.196.73.75 ip6:2001:41d0:e:74b::1

This bit says that the IP addresses 5.196.73.75 and 2001:41d0:e:74b::1 are explicitly allowed to send mail in the name of starbeamrainbowlabs.com.

a:starbeamrainbowlabs.com a:mail.starbeamrainbowlabs.com

After all of the above, this bit isn't strictly necessary, but it says that all the IP addresses found in the a records for starbeamrainbowlabs.com and mail.starbeamrainbowlabs.com are allowed to send mail in the name of starbeamrainbowlabs.com.

-all

Lastly, this says that if you're not on the list, then your message should be rejected! Other variants on this include ~all (which says "put it in the spam box instead"), and +all (which says "accept it anyway", though I can't see how that's useful :P).
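
Incidentally, you can inspect the SPF record any domain publishes yourself, since it's just a TXT record in DNS:

# Show the TXT records (including any SPF policy) for a domain
dig +short TXT starbeamrainbowlabs.com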

As you can see, SPF allows a mail server to verify if a given client is indeed allowed to send an email in the name of any particular domain name. For a while, this worked a treat - until a new problem arose.

Many of the mail servers on the internet didn't (and many probably still don't!) support encryption when connecting to and delivering mail, as certificates were expensive and difficult to get hold of (nowadays we've got Let's Encrypt, who give out certificates for free!). The encryption used when mail servers connect to one another is practically identical to that used in HTTPS - so if done correctly, the identity of the remote server can be verified and the emails exchanged encrypted - if the world's certification authorities aren't corrupted, of course.

Since most emails weren't encrypted when in transit, a new problem arose: man-in-the-middle attacks, whereby an email is altered by one or more servers in the delivery chain. Thinking about it - this could still happen today even with encryption, if any one server along an email's route is compromised. To this end, another mechanism was desperately needed - one that would allow the receiving mail server to verify that an email's content / headers hadn't been surreptitiously altered since it left the origin mail server - potentially preventing awkward misunderstandings.

Enter stage left: DKIM! DKIM stands for Domain Keys Identified Mail - which, in short, means that it provides a method by which a receiving mail server can cryptographically prove that a message hasn't been altered during transit.

It works by having a public-private keypair, in which the public key can only decrypt things, but the private key is capable of encrypting things. A hash of the email's headers / content is computed and encrypted with the private key. Then the encrypted hash is attached to the email in the DKIM-Signature header.

The receiving mail server does a DNS lookup to find the public key, and decrypts the hash. It then computes its own hash of the email headers / content, and compares it against the decrypted hash. If it matches, then the email hasn't been fiddled with along the way!

Of course, not all the headers in the email are hashed - only a specific subset are included in the hash, since some headers (like Received and X-Spam-Result) are added and altered during transit. If you're interested in implementing DKIM yourself - DigitalOcean have a smashing tutorial on the subject, which should adapt easily to whatever system you're running yourself.
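
The public key itself lives in DNS too, under a selector name chosen by the sending server. If you know the selector, you can look it up yourself - default here is just a commonly-used example selector, so substitute whichever one your mail server actually uses:

# Fetch a DKIM public key: <selector>._domainkey.<domain>
dig +short TXT default._domainkey.starbeamrainbowlabs.com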

With both of those in place, billsboosters.com's mail server can now verify that mail.bobsrockets.com is allowed to send the email on behalf of bobsrockets.com, and that the message content hasn't been tampered with since it left mail.bobsrockets.com. mail.billsboosters.com can also catch franksfuel.io in the act of trying to deliver spam from seanssatellites.net!

There is, however, one last piece of the puzzle left to reveal. With all this in place, how do you know if your mail was actually delivered? Is it possible to roll SPF and DKIM out gradually so that you can be sure you've done it correctly? This can be a particular issue for businesses and larger email server setups.

This is where DMARC comes in. It's a standard that lets you specify an email address you'd like to receive DMARC reports at, which contain statistics as to how many messages receiving mail servers got that claimed to be from you, and what they did with them. It also lets you specify what percentage of messages should be subject to DMARC filtering, so you can roll everything out slowly. Finally, it lets you specify what should happen to messages that fail either SPF, DKIM, or both - whether they should be allowed anyway (for testing purposes), quarantined, or rejected.

DMARC policies get specified (yep, you guessed it!) in a DNS record. Unlike SPF though, they go in _dmarc.megsmicroprocessors.org as a TXT record, substituting your own domain name for megsmicroprocessors.org. Here's an example:

v=DMARC1; p=none; rua=mailto:dmarc@megsmicroprocessors.org

This is just a simple example - you can get much more complex ones than this! Let's go through it step by step.

v=DMARC1;

Nothing to see here - just a version number as in SPF.

p=none;

This is the policy of what should happen to messages that fail. In this example we've used none, so messages that fail will still pass right on through. You can set it to quarantine or even reject as you gain confidence in your setup.

rua=mailto:dmarc@megsmicroprocessors.org

This specifies where you want DMARC reports to be sent. Each mail server that receives mail from your mail server will bundle up statistics and send them once a day to this address. The format is in XML (which won't be particularly easy to read), but there are free DMARC record parsers out there on the internet that you can use to decode the reports, like dmarcian.

That completes the puzzle. If you're still reading, then congratulations! Post in the comments and say hi :D We've climbed the SPF mountain and discovered how email servers validate who is allowed to send mail in the name of another domain. We've visited the DKIM signature fields and seen how the content of email can be checked to see if it's been altered during transit. Lastly, we took a stroll down DMARC lane to see how it's possible to be sure what other servers are doing with your mail, and how a large email server setup can implement DMARC, DKIM, and SPF more easily.

Of course, I'm not perfect - if there's something I've missed or got wrong, please let me know! I'll try to correct it as soon as possible.

Lastly, this is, as always, a starting point - not an ending point. An introduction if you will - it's up to you to research each technology more thoroughly - especially if you're thinking of implementing them yourself. I'll leave my sources at the bottom of this post if you'd like somewhere to start looking :-)

/r/dailyprogrammer hard challenge #322: Static HTTP 1.0 server

Recently I happened to stumble across the dailyprogrammer subreddit's latest challenge. It was for a static HTTP 1.0 server, and while I built something similar for my networking ACW, I thought I'd give this one a go to create an extendable http server that I can use in other projects. If you want to follow along, you can find the challenge here!

My language of choice, as you might have guessed, was C♯ (I know that C♯ has a HttpServer class inbuilt already, but to listen on 0.0.0.0 on Windows it requires administrative privileges).

It ended up going rather well, actually. In a little less than 24 hours after reading the post, I had myself a working solution, and I thought I'd share here how I built it. Let's start with a class diagram:

(Above: A class diagram for the GlidingSquirrel. Is this diagram better than the last one I drew?)

I'm only showing properties on here, as I'll be showing you the methods attached to each class later. It's a pretty simple design, actually - HttpServer deals with all the core HTTP and networking logic, FileHttpServer handles the file system calls (and can be swapped out for your own class), and HttpRequest, HttpResponse, HttpMethod, HttpResponseCode all store the data parsed out from the raw request coming in, and the data we're about to send back out again.

With a general idea as to how it's put together, let's dive into how it actually works. HttpServer would probably be a good place to start:

public abstract class HttpServer
{
public static readonly string Version = "0.1-alpha";

public string BindEndpoint { /* ... */ }

protected TcpListener server;

private Mime mimeLookup = new Mime();
public Dictionary<string, string> MimeTypeOverrides = new Dictionary<string, string>() {
[".html"] = "text/html"
};

{ /* ... */ }
public HttpServer(int inPort) : this(IPAddress.IPv6Any, inPort)
{
}

public async Task Start() { /* ... */ }

public string LookupMimeType(string filePath) { /* ... */ }

protected async void HandleClientThreadRoot(object transferredClient) { /* ... */ }

public async Task HandleClient(TcpClient client) { /* ... */ }

public abstract Task HandleRequest(HttpRequest request, HttpResponse response);
}

(Full version)

It's heavily abbreviated because there's actually quite a bit of code to get through here, but you get the general idea. The Start method is the main loop that accepts the TcpClients, and calls HandleClientThreadRoot for each client it accepts. I decided to use the inbuilt ThreadPool class to do the threading for me here:

TcpClient nextClient = await server.AcceptTcpClientAsync();
ThreadPool.QueueUserWorkItem(new WaitCallback(HandleClientThreadRoot), nextClient);

C♯ handles all the thread spawning and killing for me internally this way, which is rather nice. Next, HandleClientThreadRoot sets up a net to catch any errors that are thrown by the next stage (as we're now in a new thread, which can make debugging a nightmare otherwise), and then calls the main HandleClient:

try
{
await HandleClient(client);
}
catch(Exception error)
{
Console.WriteLine(error);
}
finally
{
client.Close();
}

No matter what happens, the client's connection will always get closed. HandleClient is where the magic starts to happen. It attaches a StreamReader and a StreamWriter to the client:

StreamReader source = new StreamReader(client.GetStream());
StreamWriter destination = new StreamWriter(client.GetStream()) { AutoFlush = true };

...and calls a static method on HttpRequest to read in and decode the request:

HttpRequest request = await HttpRequest.FromStream(source);
request.ClientAddress = client.Client.RemoteEndPoint as IPEndPoint;

More on that later. With the request decoded, HandleClient hands off the request to the abstract method HandleRequest - but not before setting up a secondary safety net first:

try
{
await HandleRequest(request, response);
}
catch(Exception error)
{
response.ResponseCode = new HttpResponseCode(503, "Server Error Occurred");
await response.SetBody(
$"An error ocurred whilst serving your request to '{request.Url}'. Details:\n\n" +$"{error.ToString()}"
);
}

This secondary safety net means that we can send a meaningful error message back to the requesting client in the case that the abstract request handler throws an exception for some reason. In the future, I'll probably make this customisable - after all, you don't always want to let the client know exactly what crashed inside the server's internals!

The FileHttpServer class that handles the file system logic is quite simple, actually. The magic is in its implementation of the abstract HandleRequest method that the HttpServer itself exposes:

public override async Task HandleRequest(HttpRequest request, HttpResponse response)
{
if(request.Url.Contains(".."))
{
await response.SetBody("Error the requested path contains dangerous characters.");
return;
}

string filePath = getFilePathFromRequestUrl(request.Url);
if(!File.Exists(filePath))
{
response.ResponseCode = HttpResponseCode.NotFound;
await response.SetBody($"Error: The file path '{request.Url}' could not be found.\n"); return; } FileInfo requestFileStat = null; try { requestFileStat = new FileInfo(filePath); } catch(UnauthorizedAccessException error) { response.ResponseCode = HttpResponseCode.Forbidden; await response.SetBody( "Unfortunately, the server was unable to access the file requested.\n" + "Details:\n\n" + error.ToString() + "\n" ); return; } response.Headers.Add("content-type", LookupMimeType(filePath)); response.Headers.Add("content-length", requestFileStat.Length.ToString()); if(request.Method == HttpMethod.GET) { response.Body = new StreamReader(filePath); } } With all the helper methods and properties on HttpResponse, it's much shorter than it would otherwise be! Let's go through it step by step. if(request.Url.Contains("..")) This first step is a quick check for anything obvious that could be used against the server to break out of the web root. There are probably other dangerous things you can do(or try to do, anyway!) to a web server to attempt to trick it into returning arbitrary files, but I can't think of any of the top of my head that aren't covered further down. If you can, let me know in the comments! string filePath = getFilePathFromRequestUrl(request.Url); Next, we translate the raw path received in the request into a path to a file on disk. Let's take a look inside that method: protected string getFilePathFromRequestUrl(string requestUrl) { return$"{WebRoot}{requestUrl}";
}

It's rather simplistic, I know. I can't help but feel that there's something I missed here.... Let me know if you can think of anything. (If you're interested about the dollar syntax there - it's called an interpolated string, and is new in C♯ 6! Fancy name, I know. Check it out!)

if(!File.Exists(filePath))
{
response.ResponseCode = HttpResponseCode.NotFound;
await response.SetBody($"Error: The file path '{request.Url}' could not be found.\n"); return; } Another obvious check. Can't have the server crashing every time it runs into a 404! A somewhat interesting note here: File.Exists only checks to see if there's a file that exists under the specified path. To check for the existence of a directory, you have to use Directory.Exists - which would make directory listing rather easy to implement. I might actually try that later - with an option to turn it off, of course. FileInfo requestFileStat = null; try { requestFileStat = new FileInfo(filePath); } catch(UnauthorizedAccessException error) { response.ResponseCode = HttpResponseCode.Forbidden; await response.SetBody( "Unfortunately, the server was unable to access the file requested.\n" + "Details:\n\n" + error.ToString() + "\n" ); return; } Ok, on to something that might be a bit more unfamiliar. The FileInfo class can be used to get, unsurprisingly, information about a file. You can get all sorts of statistics about a file or directory with it, such as the last modified time, whether it's read-only from the perspective of the current user, etc. We're only interested in the size of the file though for the next few lines: response.Headers.Add("content-type", LookupMimeType(filePath)); response.Headers.Add("content-length", requestFileStat.Length.ToString()); These headers are important, as you might expect. Browsers to tend to like to know the type of content they are receiving - and especially it's size. if(request.Method == HttpMethod.GET) { response.Body = new StreamReader(filePath); } Lastly, we send the file's contents back to the user in the response - but only if it's a GET request. This rather neatly takes care of HEAD requests - but might cause issues elsewhere. I'll probably end up changing it if it does become an issue. Anyway, now that we've covered everything right up to sending the response back to the client, let's end our tour with a look at the request parsing system. It's a bit backwards, but it does seem to work in an odd sort of way! It all starts in HttpRequest.FromStream. public static async Task<HttpRequest> FromStream(StreamReader source) { HttpRequest request = new HttpRequest(); // Parse the first line string firstLine = await source.ReadLineAsync(); var firstLineData = ParseFirstLine(firstLine); request.HttpVersion = firstLineData.httpVersion; request.Method = firstLineData.requestMethod; request.Url = firstLineData.requestPath; // Extract the headers List<string> rawHeaders = new List<string>(); string nextLine; while((nextLine = source.ReadLine()).Length > 0) rawHeaders.Add(nextLine); request.Headers = ParseHeaders(rawHeaders); // Store the source stream as the request body now that we've extracts the headers request.Body = source; return request; } It looks deceptively simple at first glance. To start with, I read in the first line, extract everything useful from it, and attach them to a new request object. Then, I read in all the headers I can find, parse those too, and attach them to the request object we're building. Finally, I attach the StreamReader to the request itself, as it's now pointing at the body of the request from the user. I haven't actually tested this, as I don't actually use it anywhere just yet, but it's a nice reminder just in case I do end up needing it :-) Now, let's take a look at the cream on the cake - the method that parses the first line of the incoming request. 
I'm quite pleased with this actually, as it's my first time using a brand new feature of C♯:

public static (float httpVersion, HttpMethod requestMethod, string requestPath) ParseFirstLine(string firstLine)
{
List<string> lineParts = new List<string>(firstLine.Split(' '));

float httpVersion = float.Parse(lineParts.Last().Split('/')[1]);
HttpMethod httpMethod = MethodFromString(lineParts.First());

lineParts.RemoveAt(0);
lineParts.RemoveAt(lineParts.Count - 1);
string requestUrl = lineParts.Aggregate((string one, string two) => $"{one} {two}");

return (
httpVersion,
httpMethod,
requestUrl
);
}

Monodevelop, my C♯ IDE, appears to go absolutely nuts over this with red squiggly lines everywhere, but it still compiles just fine :D

As I was writing this, a thought popped into my head that a tuple would be perfect here. After reading somewhere a month or two ago about a new tuple syntax that's coming to C♯, I thought I'd get awesomely distracted and take a look before continuing, and what I found was really cool. In C♯ 7 (the latest and quite possibly greatest version of C♯ to come yet!), there's a new feature called value tuples, which lets you dynamically declare tuples like I have above. They're already fully supported by the C♯ compiler, so you can use them today! Just try to ignore your editor if it gets as confused as mine did... :P

If you're interested in learning more about them, I'll leave a few links at the bottom of this post. Anyway, back to the GlidingSquirrel! Other than the new value tuples in the above, there's not much going on, actually. A few linq calls take care of the heavy lifting quite nicely.

And finally, here's my header parsing method.

public static Dictionary<string, string> ParseHeaders(List<string> rawHeaders)
{
Dictionary<string, string> result = new Dictionary<string, string>();

// Note: the loop over the raw header lines was missing from the original listing;
// this reconstruction assumes a simple split on ':'.
foreach(string rawHeader in rawHeaders)
{
string[] parts = rawHeader.Split(':');
KeyValuePair<string, string> nextHeader = new KeyValuePair<string, string>(
parts[0].Trim().ToLower(),
parts[1].Trim()
);
if(result.ContainsKey(nextHeader.Key)) // (condition reconstructed - the original listing omitted this line)
result[nextHeader.Key] = $"{result[nextHeader.Key]},{nextHeader.Value}";
else
result[nextHeader.Key] = nextHeader.Value;
}

return result;
}

While I have attempted to build in support for multiple definitions of the same header according to the spec, I haven't actually encountered a time when it's actually been needed. Again, this is one of those things I've built in now for later - as I do intend on updating this and adding more features later - and perhaps even working it into another secret project I might post about soon.

Lastly, I'll leave you with a link to the repository I'm storing the code for the GlidingSquirrel in, and a few links for your enjoyment:

GlidingSquirrel

Update 2018-05-01: Fixed a few links.

Sources and Further Reading

How to set up a WebDav share with Nginx

I've just been setting up a WebDav share on a raspberry pi 3 for my local network (long story), and since it was a bit of a pain to set up (and I had to combine a bunch of different tutorials out there to make mine work), I thought I'd share how I did it here.

I'll assume you have a raspberry pi all set up and up-to-date in headless mode, with ufw for your firewall (if you need help with this, post in the comments below or check out the Raspberry Pi Stack Exchange). To start with, we need to install the nginx-full package:

sudo apt update
sudo apt install nginx-full

Note that we need the nginx-full package here, because the nginx-extras or just simply nginx packages don't include the required additional webdav support modules.

Next, we need to configure Nginx. Nginx's configuration files live at /etc/nginx/nginx.conf, and in /etc/nginx/conf.d. I did something like this for my nginx.conf:

user www-data;
worker_processes 4;
pid /run/nginx.pid;

events {
	worker_connections 768;
	# multi_accept on;
}

http {
	##
	# Basic Settings
	##

	sendfile on;
	tcp_nopush on;
	tcp_nodelay on;
	keepalive_timeout 65;
	types_hash_max_size 2048;
	# server_tokens off;

	# server_names_hash_bucket_size 64;
	# server_name_in_redirect off;

	include /etc/nginx/mime.types;
	default_type application/octet-stream;

	##
	# SSL Settings
	##

	ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # Dropping SSLv3, ref: POODLE
	ssl_prefer_server_ciphers on;

	##
	# Logging Settings
	##

	access_log /var/log/nginx/access.log;
	error_log /var/log/nginx/error.log;

	##
	# Gzip Settings
	##

	gzip on;
	gzip_vary on;
	gzip_proxied any;
	gzip_comp_level 6;
	gzip_buffers 16 8k;
	gzip_http_version 1.1;
	gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

	##
	# Virtual Host Configs
	##

	include /etc/nginx/conf.d/*.conf;
	include /etc/nginx/sites-enabled/*;
}

Not many changes here. Then, I created a file called 0-webdav.conf in the conf.d directory, and this is what I put in it:

server {
	listen 80;
	listen [::]:80;

	server_name plans.helenshydrogen.be;

	auth_basic realm_name;
	auth_basic_user_file /etc/nginx/.passwords.list;

	dav_methods PUT DELETE MKCOL COPY MOVE;
	dav_ext_methods PROPFIND OPTIONS;
	dav_access user:rw group:rw all:r;

	client_body_temp_path /tmp/nginx/client-bodies;
	client_max_body_size 0;
	create_full_put_path on;

	root /mnt/hydroplans;
}

Now this is where the magic happens. The dav_access directive tells nginx to allow everyone to read, but only logged in users to write to the share. This isn't actually particularly relevant, because of the auth_basic and auth_basic_user_file directives, which tell nginx to require people to login to the share before they are allowed to access it.

It's also important to note that I found that Windows (10, at least) didn't like the basic authentication - even though Ubuntu's Nautilus accepted it just fine - so I had to comment that bit out :-(

If you do still want authentication (hey! Maybe you'll have better luck than I :P), then you'll need to set up the passwords file. Here's how you create it:

echo -n 'helen:' | sudo tee /etc/nginx/.passwords.list
openssl passwd -apr1 | sudo tee -a /etc/nginx/.passwords.list
Password:

The above creates a user called helen, and asks you to type a password. If you're adding another user to the file, simply change the first tee to be tee -a to avoid overwriting the first one.

With that all configured, it's time to test the configuration file, and, if we're lucky, restart nginx!

sudo nginx -t
sudo systemctl restart nginx

That's all you should need to do to set up a simple WebDav share. Remember that this is a starting point, and not an ending point - there are a few big holes in the above that you'll need to address, depending on your use case (for example, I haven't included the setup of https / encryption - try letsencrypt for that).

Here are the connection details for the above for a few different clients:

• Ubuntu / Nautilus: (Go to "Other Locations" and paste this into the "Connect to Server" box) dav://plans.helenshydrogen.be/
• Windows: (Go to "Map Network Drive" and paste this in) http://plans.helenshydrogen.be/

Did this work for you? Have any problems? Got instructions for a WebDav client not listed here? Let me know in the comments!

The HTTPS version of my website is insecure? Nonsense!

I'm still rather ill, but I wanted to post about an issue I've just had with my website. Upon visiting my website in the latest version of chrome beta (57 as of the time of typing), I discovered that chrome had decided that the connection was 'insecure'. It didn't tell me precisely what the problem was (even in the developer tools :-) - why would I possibly need to know that? - only that it considered it insecure.

After googling around a bit, I didn't find any specific articles on the subject - their recent move to start considering regular http connections insecure is swamping all the relevant articles in the search results, I suspect.

The big clue came when I discovered that one of my subdomains that uses a Let's Encrypt certificate works as expected. You see, the main website actually used a StartSSL certificate. My running theory is that even though my certificate was an SHA2 certificate, chrome decided that it was not trustworthy as there was an SHA1 certificate in the trust chain somewhere.

The fix: replace all my existing StartSSL certificates with Let's Encrypt ones. It seems to have fixed the issue for now. I also discovered that Let's Encrypt certificates can be used in mail servers (i.e. SMTP and IMAP) too - so I don't have to go and fiddle about with finding an alternative certificate provider.

In future, it would certainly be helpful if Google actually told people precisely what they were going to do before they do it....!

Was this useful? Could it be improved? Would you like a Let's Encrypt tutorial? Let me know in the comments below!

Fancy message of the day over SSH

Since my time to sit down for a good chunk of time and write some code has been extremely limited as of late, I've been playing around with a few smaller projects. One of those is a fancy message of the day when you log into a remote machine (in my case the server this website is hosted on!), and I thought I'd share it here.

The default message shown at the top when you login via ssh is actually generated by something called update-motd, and is generated from a set of scripts in /etc/update-motd.d. By customising these scripts, we can do almost anything we like!

To start off with, I disabled the execution of all the scripts in the directory (sudo chmod -x /etc/update-motd.d/*), and created a subfolder to store the script in that actually generated the system information (sudo mkdir /etc/update-motd.d/parts). Here's the script I wrote to generate the system information:

#!/usr/bin/env bash

. /etc/lsb-release

LOAD=$(cat /proc/loadavg | cut -d' ' -f 2);

CPU_COUNT=$(cat /proc/cpuinfo | grep -i "core id" | uniq | wc -l);
THREAD_COUNT=$(cat /proc/cpuinfo | grep -i "core id" | wc -l);

APT_UPDATE_DETAILS="$(/usr/lib/update-notifier/apt-check --human-readable | fold -w 40 -s)"

IPV4_ADDRESS=$(dig +short myip.opendns.com A @resolver1.opendns.com)
IPV6_ADDRESS=$(dig +short myip.opendns.com AAAA @2620:0:ccc::2);

LAST_LOGIN=$(last -1 | head -n 1 | awk '{ print $1,"at",$4,$5,$6,$7,"from",$3 }');

REBOOT_REQUIRED=$(/usr/lib/update-notifier/update-motd-reboot-required);

echo
echo Welcome to $(hostname)
echo "  running ${DISTRIB_DESCRIPTION}" echo echo Kernel:$(uname -r)
echo Uptime: $(uptime --pretty | sed -e 's/up //')
echo Load: ${LOAD}
echo
echo IPs: ${IPV4_ADDRESS},${IPV6_ADDRESS}
echo
echo "${APT_UPDATE_DETAILS}" echo echo "${REBOOT_REQUIRED}"
#echo
#echo Last login: ${LAST_LOGIN}

exit 0

Basically, I collect a bunch of information from random places on my system (several of which were taken from the existing scripts in /etc/update-motd.d/) and re-output them in a different format. Then, I converted an image of my favicon logo with the brilliant catimg by posva to a set of unicode characters and sent that to a file (catimg -w 35 image.png >/etc/update-motd.d/sbrl-logo.txt) - you could alternatively use some ascii art from the internet (e.g. this site). Once done, I put the two together with the following script directly in my /etc/update-motd.d/ folder:

#!/usr/bin/env bash

### Settings ###

TMP_FILENAME=/run/sysinfo.txt #/etc/update-motd.d/parts/sysinfo

################

/etc/update-motd.d/parts/sysinfo >$TMP_FILENAME

### Output ###
echo
pr -mtJ /etc/update-motd.d/sbrl-logo.txt $TMP_FILENAME

##############

### Cleanup ###
rm $TMP_FILENAME

###############

Finally, I manually cleared and regenerated the message of the day with sudo update-motd, giving the result you see at the top of this blog post. I also made sure to re-enable the execution of the other scripts I didn't use in my fancy motd so as to not miss out on their notifications.

If you're interested, I've generated an archive of my final /etc/update-motd.d folder (minus my logo in text format), which you can find here: 20170203-Fancy-Motd.7z.

Can you do better? Got a cool enhancement of your own? Post about it below!

I now have a public website status page!

Just recently Uptime Robot (the awesome service that I use to monitor my server's uptime) have released a new feature: Public status pages! Status pages appear to be free (for now), so I've gone and set one up. Now all of you can see what's up with my website if it's down.

They even allow you to point a (sub)domain at it too. I did this too, so you can visit my status page at status.starbeamrainbowlabs.com.
