Starbeamrainbowlabs

Stardust
Blog


Archive


Mailing List Articles Atom Feed Comments Atom Feed Twitter Reddit Facebook

Tag Cloud

3d account algorithms android announcement architecture archives arduino artificial intelligence artix assembly async audio bash batch blog bookmarklet booting c sharp c++ challenge chrome os code codepen coding conundrums coding conundrums evolved command line compilers compiling compression css dailyprogrammer debugging demystification distributed computing documentation downtime electronics email embedded systems encryption es6 features event experiment external first impressions future game github github gist gitlab graphics hardware hardware meetup holiday holidays html html5 html5 canvas infrastructure interfaces internet io.js jabber jam javascript js bin labs learning library linux lora low level lua maintenance manjaro network networking nibriboard node.js operating systems performance photos php pixelbot portable privacy problem solving programming problems projects prolog protocol protocols pseudo 3d python reddit redis reference release releases resource review rust searching secrets security series list server software sorting source code control statistics storage svg technical terminal textures three thing game three.js tool tutorial tutorials twitter ubuntu university update updates upgrade version control virtual reality virtualisation visual web website windows windows 10 xmpp xslt

Geolocation Strategies

Surprisingly often you'll find yourself needing to know the physical location of a user - be it to calculate which time-zone they're in, when the sun sets, or even which building they're in for a guided tour. Such a question has multiple different answers and ways of approaching it, so I decided to blog about the ones I can think of for future reference.

Location-By-IP-Address

The first method that comes to mind is doing a lookup based on the user's IP address. This gives city / area / country level accuracy, which should be good enough for sunrise and sunset times, time-zone information, and so on. Several websites exist that provide this as a service - ipinfo.io comes to mind. You can even do this in the terminal:

curl https://ipinfo.io/ | jq --raw-output '.loc'

(You'll need jq installed to use this. Most Linux distributions come with it in their repositories!)

If a web service doesn't suit you, then several downloadable databases exist. Most you have to pay for to get the 'full' (and most accurate) version, but the free versions available seem to be pretty good up to country / time-zone level. Here are a few I've found:

Precision Matters

If you need something more precise (say for a driving-tracking app or something), then GPS is usually the way to go, although phones are the only device that commonly carry a sensor. Despite the name, GPS actually refers to just a single satellite constellation that was launched by the USA.

Others exist too, of course - Russia have GLONASS, China has BeiDou, and the EU now has Galileo just to name a few. Most phones will support at least 2 from this list out-of-the-box. There's a great app available for Android if you'd like to play around and explore your phone's GPS capabilities.

Utilising a device's GPS support is usually just a case of finding the appropriate API (and, in some cases, permissions) for your environment. For the web, there's the Geolocation API (note that it's for HTTPS websites only). On Android, there's android.location apparently.

Bring in the big data

GPS can be slow at getting a fix, and drain your users' battery. It also has issues in indoor environments. Thankfully, there's an alternative: WiFi-based location mapping. Though it only works where there are WiFi hotspots about (urban areas are best), it is much faster and doesn't use nearly as much power.

Pioneered by Google with their Google Play Services Location API, it sends a list of the strongest WiFi access points (their MAC addresses & signal strengths) that the device can detect off to a remote server, and, after consulting it's massive database of WiFi access points and their locations, it responds with a triangulation of the user's approximate location.

Of course, you probably don't want to be giving Google your location on a regular basis just to save battery power. To this end, Mozilla (the company behind Firefox!) has built their own crowd-sourced location service. Creatively named the Mozilla Location Service, it's continually improved by users of Firefox on their Phones who submit their location along with the detected WiFi networks for analysis on a semi-regular (and automated) basis. There's even a map showing the current coverage!

If WiFi isn't for you, then don't despair! The mobile network (1/2/3/4G) cell towers can also be used - if you've got the right kind of device (sorry, desktops). Databases of cell tower locations exist that can be used to triangulate a user's approximate position.

If you've got a limited indoor area that's already got WiFi that you need to cover, it shouldn't be too time-consuming to manually construct your own database instead.

For example, in a tour-guide Android app you could record the signal strengths in the pre-set locations that you want to take people in your tour, and use a Neural Network (libraries exist!) to decide which one the user is closest to by comparing the signal strengths & MAC addresses of the top 3 WiFi access points in the area.

Bluetooth to the rescue!

If that's still not good enough (say if there aren't any WiFi networks around), you'll probably have to start deploying some of your own infrastructure. Bluetooth beacons are a good place to start. In short, a bluetooth beacon is simply a Bluetooth device that advertises it's presence on a regular basis, along with both it's actual position and the transmitting power that it is currently using to advertise itself. By triangulating the signal from 2 or more beacons, the user's position can be determined.

Of course, this requires that you have a bunch of Bluetooth beacons to hand, which aren't exactly cheap! You could, of course, build your own with an Arduino, but there are still the parts to buy there.

Other worthy contenders

That just about covers all the main geolocation methods I can think of, though there are few others I've come across that sound interesting - but aren't particularly useful unless you've got some specific circumstances.

The first of these is eLoran. Similar in structure to GPS, eLoran has ground-based transmitter stations instead of space-based satellites. It's mainly used by ships at sea and aeroplanes, but I'm sure that it's used elsewhere too. Of course, you need a special receiver chip (and aerial, I should imagine) to use it.

Secondly, there's some research ongoing into utilising LoRa gateways to geolocate a device. I've blogged about LoRa before (and might do again now that I'm actually getting really close to having a pair for RFM95 chips working!) here and also here. There's a blog post (not written by me) explaining geolocation via LoRa here, but it's basically the same kind of thing as above with WiFi, GPS, eLoran, and mobile cell towers.

Conclusion

We've covered practically every geolocation method I can think of - from coarse IP Address-based solutions to much finer GPS and WiFi-based technologies. There are certainly more of them than I thought - each with its own benefits and drawbacks, making different technologies useful in different situations.

Found this interesting? Got another method I haven't thought of? Comment below!

Goodbye, Flash?

You're probably already aware that Adobe announced that they are ending support for flash at the end of 2020. You're browser has probably already disabled the plugin by default. This is good news for the web as a whole, but what about everything left behind?

What if I said that at the end of 2020 we'll lose an entire chapter of the Internet's history? I'm talking about the masses of content on the internet that's powered by flash. While the Internet Archive probably has a considerable portion of ti archived already, that's only the beginning of the issues at hand.

For example, much of the flash content of yore had what's called a site lock, which prevents it from working on any site other than specifically-approved domains. This was very useful in prevent thieves from taking off with valuable content, setting up a competing website, and earning tons of ad revenue from it, but it's a hindrance to archiving efforts.

The other problem at hand is that if Adobe end support for flash, it isn't going to stick around in our browsers for long after that - so we'll lose the ability to run all this flash content that makes up such a huge part of the Internet's history.

A scary thought, to be sure! Thankfully, all is not lost! Dedicated volunteers have set up a project called Flashpoint, which serves to not only archive all this content, but build a series fo programs that will allow people for generations to come to run said content - through the use of clever hacks such as an intelligent local proxy, and tools such as the standalone projector version of flash, Pale Moon, and other such software.

Flashpoint's logo. I don't own it!

If you're interested in checking it out, I suggest heading over to Flashpoint's website and downloading the Infinity version - this downloads the content you wish to run dynamically - instead of all at once (totalling ~34GiB O.o). There's even a master list (also here) listing everything archived so far.

C# & .NET Terminology Demystified: A Glossary

After my last glossary post on LoRa, I thought I'd write another one of C♯ and .NET, as (in typical Microsoft fashion it would seem), they're seems to be a lot of jargon floating around whose meaning is not always obvious.

If you're new to C♯ and the .NET ecosystems, I wouldn't recommend tackling all of this at once - especially the bottom ~3 definitions - with those in particular there's a lot to get your head around.

C♯

C♯ is an object-oriented programming language that was invented by Microsoft. It's cross-platform, and is usually written in an IDE (Integrated Development Environment), which has a deeper understanding of the code you write than a regular text editor. IDEs include Visual Studio (for Windows) and MonoDevelop (for everyone else).

Solution

A Solution (sometimes referred to as a Visual Studio Solution) is the top-level definition of a project, contained in a file ending in .sln. Each solution may contain one or more Project Files (not to be confused with the project you're working on itself), each of which gets compiled into a single binary. Each project may have its own dependencies too: whether they be a core standard library, another project, or a NuGet package.

Project

A project contains your code, and sits 1 level down from a solution file. Normally, a solution file will sit in the root directory of your repository, and the projects will each have their own sub-folders.

While each project has a single output file (be that a .dll class library or a standalone .exe executable), a project may have multiple dependencies - leading to many files in the build output folder.

The build process and dependency definitions for a project are defined in the .csproj file. This file is written in XML, and can be edited to perform advanced build steps, should you need to do something that the GUI of your IDE doesn't support. I've blogged about the structuring of this file before (see here, and also a bit more here), should you find yourself curious.

CIL

Known as Common Intermediate Language, CIL is the binary format that C♯ (also Visual Basic and F♯ code) code gets compiled into. From here, the .NET runtime (on Windows) or Mono (on macOS, Linux, etc.) can execute it to run the compiled project.

MSBuild

The build system for Solutions and Projects. It reads a .sln or .csproj (there are others for different languages, but I won't list them here) file and executes the defined build instructions.

.NET Framework

The .NET Framework is the standard library of C♯ it provides practically everything you'll need to perform most common tasks. It does not provide a framework for constructing GUIs and Graphical Interfaces. You can browse the API reference over at the official .NET API Browser.

WPF

The Windows Presentation Foundation is a Windows-only GUI framework. Powered by XAML (eXtensible Application Markup Language) definitions of what the GUI should look like, it provides everything you need to create a native-looking GUI on Windows.

It does not work on macOS and Linux. To create a cross-platform program that works on all 3 operating systems, you'll need to use an alternative GUI framework, such as XWT or Gtk# (also: Glade). A more complete list of cross-platform frameworks can be found here. It's worth noting that Windows Forms, although a tempting option, aren't as flexible as the other options listed here.

C♯ 7

The 7th version of the C♯ language specification. This includes the syntax of the language, but not the .NET Framework itself.

.NET Standard

A specification of the .NET Framework, but not the C♯ Language. As of the time of typing, the latest version is 2.0, although version 1.6 is commonly used too. The intention here is the improve cross-platform portability of .NET programs by defining a specification for a subset of the full .NET Framework standard library that all platforms will always be able to use. This includes Android and iOS through the use of Xamarin.

Note that all .NET Standard projects are class libraries. In order to create an executable, you'll have to add an additional Project to your Solution that references your .NET Standard class library.

ASP.NET

A web framework for .NET-based programming languages (in our case C♯). Allows you to write C♯ code to handle HTTP (and now WebSockets) requests in a similar manner to PHP, but different in that your code still needs compiling. Compiled code is then managed by a web server IIS web server (on Windows).

With the release of .NET Core, ASP.NET is now obsolete.

.NET Core

Coming in 2 versions so far (1.0 and 2.0), .NET Core is the replacement for ASP.NET (though this is not its exclusive purpose). As far as I understand it, .NET Core is a modular runtime that allows programs targeting it to run multiple platforms. Such programs can either be ASP.NET Core, or a Universal Windows Platform application for the Windows Store.

This question and answer appears to have the best explanation I've found so far. In particular, the displayed diagram is very helpful:

A diagram showing the structure of the .NET ecosystem, including .NET Core. See the links in the sources and further reading below for more information.

....along with the pair of official "Introducing" blog posts that I've included in the Sources and Further Reading section below.

Conclusion

We've looked at some of the confusing terminology in the .NET ecosystems, and examined each of them in turn. We started by defining and untangling the process by which your C♯ code is compiled and run, and then moved on to the different variants and specifications related to the .NET Framework and C♯.

As always, this is a starting point - not an ending point! I'd recommend doing some additional reading and experimentation to figure out all the details.

Found this helpful? Still confused? Spotted a mistake? Comment below!

Sources and Further Reading

Generating word searches for fun and profit

(Above: A Word search generated with the tool below)

A little while ago I was asked about generating a wordsearch in a custom shape. I thought to myself "someone has to have built this before...", and while I was right to an extent, I couldn't find one that let you use any shape you liked.

This, of course, was unacceptable! You've probably guessed it by now, but I went ahead and wrote my own :P

While I wrote it a little while ago, I apparently never got around to posting about it on here.

In short, it works by using an image you drop into the designated area on the page as the shape the word search should take. Each pixel is a single cell of the word search - with the alpha channel representing whether or not a character is allowed to be placed there (transparent means that it can't contain a character, and opaque means that it can).

Creating such an image is simple. Personally, I recommend Piskel or GIMP for this purpose.

Once done, you can start building a wordlist in the wordlist box at the right-hand-side. It should rebuild the word search as soon as you click out of the box. If it doesn't, then you've found a bug! Please report it here.

With the word search generated, you can use the Question Sheet and Answer Sheet links to open printable versions for export.

You can find my word search generator here:

I've generated a word search of the current tags in the tag cloud on this blog too: Question Sheet [50.3KiB], Answer Sheet [285.6KiB]

The most complicated part of this was probably the logistics behind rude word removal. Thankfully, I did't have to find and maintain such a list of words, as the futility npm package does this for me, but algorithmically guaranteeing that by censoring 1 rude word another is not accidentally created in another direction is a nasty problem.

If you're interested in a more technical breakdown of one (or several!) particular aspects of this - let me know! While writing about all of it would probably make for an awfully long post, a specific aspect or two should be more manageable.

In the future, I'll probably revisit this and add additional features to it, such as the ability to restrict which directions words are placed in, for example. If you've got a suggestion of your own, open an issue (or even better, open a pull request :D)!

Converting my timetable to ical with Node.JS and Nightmare

A photo of a nice beach in a small bay, taken from a hill off to the side. A small-leaved tree in the foreground frames the bottom-left, with white-crested waves breaking over the beach in the background, before riding steeply on the way inland.

(Source: Taken by me!)

My University timetable is a nightmare. I either have to use a terrible custom app for my phone, or an awkwardly-built website that feels like it's at least 10 years old!

Thankfully, it's not all doom and gloom. For a number of years now, I've been maintaining a Node.JS-based converter script that automatically pulls said timetable down from the JSON backend of the app - thanks to a friend who reverse-engineered said app. It then exports it as a .ical file that I can upload to my server & subscribe to in my pre-existing calendar.

Unfortunately, said backend changed quite dramatically recently, and broke my script. With the only alternative being the annoying timetable website that really don't like being scraped.

Where there's a will, there's a way though. Not to be deterred, I gave it a nightmare of my own: a scraper written with Nightmare.JS - a Node.JS library that acts, essentially, as a scriptable web-browser!

While the library has some kinks (especially with .wait("selector")), it worked well enough for me to implement a scraper that pulled down my timetable in HTML form, which I then proceeded to parse with cheerio.

The code is open-source (find it here!) - and as of this week I've updated it to work with the new update to the timetabling system this semester. A further update will be needed in early December time, which I'll also be pushing to the repository.

The README of the repository should contain adequate instructions for getting it running yourself, but if not, please open an issue!

Note that I am not responsible for anything that happens as a result of using this script! I would strongly recommend setting up the secure storage of your password if you intend to automate it. I've just written this to solve a problem in order to ensure that I can actually get to my lectures on time - and not an hour late or on the wrong week because I've misread the timetable (again)!

In the future, I'd like to experiment with other scriptable web-browser frameworks to compare them with my experiences with NightmareJS.

Found this interesting? Found a better way to do this? Comment below!

LoRa Terminology Demystified: A Glossary

My 2 RFM95s on the lid of my project's box. More info in a future blog post coming soon!

(Above: My 2 RFM95s. One works, but the other doesn't yet....)

I've been doing some more experimenting with LoRa recently, as I've got 1 of my 2 RFM95 working (yay)! While the other is still giving me trouble (meaning that I can't have 1 transmit and the other receive yet :-/), I've still been able to experiment with other people's implementations.

To that end, I've been learning about a bunch of different words and concepts - and thought that I'd document them all here.

LoRa

The radio protocol itself is called LoRa, which stands for Long Range. It provides a chirp-based system (more on that later under Bandwidth) to allow 2 devices to communicate over great distances.

LoRaWAN

LoRaWAN builds on LoRa to provide a complete end-to-end protocol stack to allow Internet of Things (IoT) devices to communicate with an application server and each other. It provides:

  • Standard device classes (A, B, and C) with defined behaviours
    • Class A devices can only receive for a short time after transmitting
    • Class B devices receive on a regular, timed, basis - regardless of when they transmit
    • Class C devices send and receive whenever they like
  • The concept of a Gateway for picking up packets and forwarding them across the rest of the network (The Things Network is the largest open implementation to date - you should definitely check it out if you're thinking of using LoRa in a project)
  • Secure multiple-layered encryption of messages via AES

...amongst many other things.

The Things Network

The largest open implementation of LoRaWAN that I know of. If you hook into The Things Network's LoRaWAN network, then your messages will get delivered to and from your application server and LoRaWAN-enabled IoT device, wherever you are in the world (so long as you've got a connection to a gateway). It's often abbreviated to TTN.

Check out their website.

A coverage map for The Things Network.

(Above: A coverage map for The Things Network. The original can be found here)

Data Rate

The data rate is the speed at which a message is transmitted. This is measured in bits-per-second, as LoRa itself is an 'unreliable' protocol (it doesn't guarantee that anyone will pick anything up at the other end). There are a number of preset data rates:

Code Speed (bits/second)
DR0 250
DR1 440
DR2 980
DR3 1760
DR4 3125
DR5 5470
DR6 11000
DR7 50000

_(Source: Exploratory Engineering: Data Rate and Spreading Factor)_

These values are a little different in different places - the above are for Europe on 868MHz.

Maximum Payload Size

Going hand-in-hand with the Data Rate, the Maximum Payload Size is the maximum number of bytes that can be transmitted in a single packet. If more than the maximum number of bytes needs to be transmitted, then it will be split across multiple packets - much like TCP's Maximum Transmission Unit (MTU), when it comes to that.

With LoRa, the maximum payload size varies with the Data Rate - from 230 bytes at DR7 to just 59 at DF2 and below.

Spreading Factor

Often abbreviated to just simply SF, the spreading factor is also related to the Data Rate. In LoRa, the Spreading Factor refers to the duration of a single chirp. There are 6 defined Spreading Factors: ranging from SF7 (the fastest transmission speed) to SF12 (the slowest transmission speed).

Which one you use is up to you - and may be automatically determined by the driver library you use (it's always best to check). At first glance, it may seem optimal to choose SF7, but it's worth noting that the slower speeds achieved by the higher spreading factors can net you a longer range.

Data Rate Configuration bits / second Max payload size (bytes)
DR0 SF12/125kHz 250 59
DR1 SF11/125kHz 440 59
DR2 SF10/125kHz 980 59
DR3 SF9/125kHz 1 760 123
DR4 SF8/125kHz 3 125 230
DR5 SF7/125kHz 5 470 230
DR6 SF7/250kHz 11 000 230
DR7 FSK: 50kpbs 50 000 230

_(Again, from Exploratory Engineering: Data Rate and Spreading Factor)_

Duty Cycle

A Duty Cycle is the amount of time something is active as a percentage of a total time. In the case of LoRa(/WAN?), there is an imposed 1% Duty Cycle, which means that you aren't allowed to be transmitting for more than 1% of the time.

Bandwidth

Often understood, the Bandwidth is the range of frequencies across which LoRa transmits. The LoRa protocol itself uses a system of 'chirps', which are spread form one end of the Bandwidth to the other going either up (an up-chirp), or down (a down-chirp). LoRahas 2 bandwidths it uses: 125kHz, 250kHz, and 500kHz.

Some example LoRa chirps as described above.

(Some example LoRa Chirps. Source: This Article on Link Labs)

Frequency

Frequency is something that most of us are familiar with. Different wireless protocols utilise different frequencies - allowing them to go about their business in peace without interfering with each other. For example, 2.4GHz and 5GHz are used by WiFi, and 800MHz is one of the frequencies used by 4G.

In the case of LoRa, different frequencies are in use in different parts of the world. ~868MHz is used in Europe (443MHz can also be used, but I haven't heard of many people doing so), 915MHz is used in the US, and ~780MHz is used in China.

Location Frequency
Europe 863 - 870MHz
US 902 - 928MHz
China 779 - 787MHz

(Source: RF Wireless World)

Found this helpful? Still confused? Found a mistake? Comment below!

Sources and Further Reading

https://electronics.stackexchange.com/a/305287/180059

Help! My SQLite database is malformed!

Recently I came across a rather worrying SQLite database error:

Error: database disk image is malformed

Hrm, that's odd. Upon double-checking, it looked like the database was functioning (mostly) fine. The above error popping up randomly was annoying though, so I resolved to do something about it. Firstly, I double-checked that said database was actually 'corrupt':

sudo sqlite3 path/to/database.sqlite 'PRAGMA integrity_check';

This outputted something like this:

*** in database main ***
Main freelist: 1 of 8 pages missing from overflow list starting at 36376
Page 23119: btreeInitPage() returns error code 11
On tree page 27327 cell 30: 2nd reference to page 27252

Uh oh. Time to do something about it then! Looking it up online, it turns out that the 'best' solution out there is to export to an .sql file and then reimport again into a fresh database. That's actually quite easy to do. Firstly, let's export the existing database to an .sql file. This is done via the following SQL commands (use sqlite3 path/to/database.db to bring up a shell)

.mode insert
.output /tmp/database_dump.sql
.dump
.exit

With the database exported, we can now re-import it into a fresh database. Bring up a new SQLite3 shell with sqlite3, and do the following:

.save /tmp/new_database.sqlite
.read /tmp/database_dump.sql
.exit

...that might take a while. Once done, swap our old corrupt database out for your shiny new one and you're done! I would recommend keeping the old one as a backup for a while just in case though (perhaps bzip2 path/to/old_database.sqlite?).

Also, if the database is on an embedded system, you may find that downloading it to your local computer for the repair process will make it considerably faster.

Found this useful? Still having issues? Comment below!

Sources

Proxies: What's the difference?

You've probably heard of proxies. Perhaps you used one when you were at school to access a website you weren't supposed to. But did you know that there are multiple different types of proxies that are used for different things? For example, a reverse proxy perform load-balancing and caching for your web application? And that a transparent proxy can be used to filter the traffic of your internet connection without you knowing (well, almost)? In this post, I'll be explaining the difference between the different types of proxy I'm aware of, why you'd want one, and how to detect their presence.

Reverse Proxies

A reverse proxy is one that, when it receives a request, repeats it to an upstream server. For example, I use nginx to reverse-proxy PHP requests to a backend PHP-FPM instance.

A diagram showing how a reverse proxy works. Basically: Client -> nginx (the reverse proxy) -> PHP-FPM (the server behind the reverse proxy).

Reverse proxies also come in really handy if you want to run multiple, perhaps unrelated, servers on a single machine with a single IP address, as they can reverse proxy requests to the right place based on the requested subdomain. For example, on my server I not only serve my website (which in and of itself reverse-proxies PHP requests), but also serves my git server - which is a separate process listening on a different port behind my firewall.

Caching is another key feature of reverse proxies that comes in dead useful if you're running a medium-high traffic website. Instead of forwarding every single request to your backend for processing, if you've got a blog, for instance, you could cache the responses to requests for the posts themselves and serve them directly from the reverse proxy, leaving the slower backend free to process comments that people make, for example. Both nginx and Varnish have support for this. This with method, it's possible to serve 1000s of requests a minute from a very modestly sized virtual machine (say, 512MB RAM, 1 CPU) if configured correctly. Take that, Apache!

Finally, when 1 server isn't enough any more, your can get reverse proxies like nginx to act as a load balancer. In this scenario, there are multiple backend servers (probably running on different machines, with a fast internal LAN connecting them all), and a single front-facing load balancer sitting in front of them all distributing requests to the backend servers. nginx in particular can get very fancy with the logic here, should you need that kind of control. It can even monitor the health of the backend application servers, and avoid sending any requests to unresponsive servers - giving them time to recover from a crash.

A diagram visualing the load-balancer explained above. A single nginx instance faces the internet, with multiple app servers behind it that it proxies requests to.

Forward Proxies

Forward proxies are distinctly different to reverse proxies, in that they make requests to the destination client wants to connect to on their behalf. Such a proxy can be instituted for many reasons. Sometimes, it's for security reasons - for example to ensure that all those connecting to a backend local network are authenticated (authentication with a forward proxy is done via a set of special Proxy- HTTP headers). Other times, it's to preserve data on limited and/or expensive internet connections.

More often though, it's to censor and surveil the internet connection of the users on a network - and also to bypass such censoring. It is in this manner that HTTP(S) has become so pervasive - in that companies, institutions, (and, in rare cases), Internet Service Providers install forward proxies to censor the connections of their users - as such proxies usually only understand HTTP and HTTPS (clients request that a forward proxy retrieve something for them via a GET https://bobsrockets.net/ HTTP/1.1 request for example). If you're curious though, some forward proxies these days support the CONNECT HTTP method, allowing one to set up a TLS connection with another server (whether that be an HTTPS, SSH, SMTPS, or other protocol server). In addition, the SOCKS protocol now allows for arbitrary TCP connection to be proxied through as well.

Forward proxies nearly always require some client-side configuration. If you've wondered what the proxy settings are in your operating system and web browser's settings - this is what they're for.

Such can usually by identified by the Via and other headers that they attach to outgoing requests, as per RFC 2616. Online tools exist that exploit this - allowing you to detect whether such a proxy exists.

Transparent Proxies

Transparent proxies are similar to forward proxies, but do not require any client-side configuration. Instead, they utilise clever networking tricks to intercept network traffic being sent to and from the clients on a network. In this manner, they can cache responses, filter content, and protect the users from attacks without the client necessarily being aware of their existence.

It is important to note here though that utilising a proxy is by no means a substitute for maintaining proper defences on your own computer, such as installing and configuring a firewall, ensuring your system has all the latest updates, and, if you're running windows, ensuring you have an antivirus program installing and running (Windows 10 comes with one automatically these days).

Even though they don't usually attach the Via header (as they are supposed to), such proxies can usually be detected by cleverly designed tests that exploit their tendency to cache requests, thankfully.

Conclusion

So there you have it. We've taken a look at Forward proxies, and the benefits (and drawbacks) they can provide to users. We've also investigated Transparent proxies, and how to detect them. Finally, we've looked at Reverse proxies and the advantages they can provide to enable you to scale and structure your next great web (and other protocol! Nginx supports all sorts of other protocols besides HTTP(S)) application.

Maintenance: Server Push Support!

Recently, I took the time to add the official nginx ppa to my server to keep nginx up-to-date. In doing do, I jumped from a security-path-backported nginx 1.10 to version 1.14..... which adds a bunch of very cool new features. As soon as I leant that HTTP/2 Server Push was among the new features to be supported, I knew that I had to try it out.

In short, Server Push is a new technology - part of HTTP/2.0 (it's here at last :D) - that allows you to send resources to the client before they even know they need them. This is done by enabling it in the web server, and then having the web application append a specially-formatted link header to outgoing requests - which tell the web server what resources it bundle along with the response.

First, let's enable it in nginx. This is really quite simple:

http {
    # ....

    http2_push_preload      on;

    # ....
}

This enables link header parsing serve-wide. If you want to enable it for just a single virtual host, the http2_push_preload directive can be placed inside server blocks too.

With support enabled in nginx, we can add support to our web application (in my case, this website!). If you do a HEAD request against a page on my website, you'll get a response looking like this:

HTTP/2 200 
server: nginx/1.14.0
date: Tue, 21 Aug 2018 12:35:02 GMT
content-type: text/html; charset=UTF-8
vary: Accept-Encoding
x-powered-by: PHP/7.2.9-1+ubuntu16.04.1+deb.sury.org+1
link: </theme/core.min.css>; rel=preload; as=style, </theme/main.min.css>; rel=preload; as=style, </theme/comments.min.css>; rel=preload; as=style, </theme/bit.min.css>; rel=preload; as=style, </libraries/prism.min.css>; rel=preload; as=style, </theme/tagcloud.min.css>; rel=preload; as=style, </theme/openiconic/open-iconic.min.css>; rel=preload; as=style, </javascript/bit.min.js>; rel=preload; as=script, </javascript/accessibility.min.js>; rel=preload; as=script, </javascript/prism.min.js>; rel=preload; as=script, </javascript/smoothscroll.min.js>; rel=preload; as=script, </javascript/SnoozeSquad.min.js>; rel=preload; as=script
strict-transport-security: max-age=31536000;
x-xss-protection: 1; mode=block
x-frame-options: sameorigin

Particularly of note here is the link header. it looks long and complicated, but that's just because I'm pushing multiple resources down. Let's pull it apart. In essence, the link header takes a comma (,) separated list of paths to resources that the web-server should push to the client, along with the type of each. For example, if https://bobsrockets.com/ wanted to push down the CSS stylesheet /theme/boosters.css, they would include a link header like this:

link: </theme/boosters.css>; rel=preload; as=style

It's also important to note here that pushing a resource doesn't mean that we don't have to utilise it somewhere in the page. By this I mean that pushing a stylesheet down as above still means that we need to add the appropriate <link /> element to put it to use:

<link rel="stylesheet" href="/theme/boosters.css" />

Scripts can be sent down too. Doing so is very similar:

link: </js/liftoff.js>; rel=preload; as=script

There are other as values as well. You can send all kinds of things:

  • script - Javascript files
  • style - CSS Stylesheets
  • image - Images
  • font - Fonts
  • document - <iframe /> content
  • audio - Sound files to be played via the HTML5 <audio /> element
  • worker - Web workers
  • video - Videos to be played via the HTML5 <video /> element.

The full list can be found here. If you don't have support in your web server yet (or can't modify HTTP headers) for whatever reason, never fear! There's still something you can do. HTML also supports a similar <link rel="preload" href="...." /> element that you can add to your document's <head>.

While this won't cause your server to bundle extra resources with a response, it'll still tell the client to go off and fetch the specified resources in the background with a high priority. Obviously, this won't help much with external stylesheets and scripts (since simply being present in the document is enough to get the client to request them), but it could still be useful if you're lazily loading images, for example.

In future projects, I'll certainly be looking out for opportunities to take advantage of HTTP/2.0 Server Push (probably starting with investigating options for Pepperminty Wiki). I've found the difference to be pretty extraordinary myself.

Of course, this is hardly the only feature that HTTP/2 brings. If there's the demand, I may blog about other features and how they work too.

Found this interesting? Confused about something? Using this yourself in a cool project? Comment below!

How to set up a shared PDF printer on your local network

I've recently ended up setting up a PDF printer on my local network in an effort to transfer some pictures out of a ridiculous i-device (I tell you, Apple'e iOS is the worst for being a walled garden). Since the process for doing so wasn't entirely obvious, I'm documenting it in this blog post to remind myself for later. If you find it useful, please let me know in the comments below!

Firstly, you'll need a machine running Linux. Any distribution will do, but I'll be using an apt-based distribution, so you may need to alter some of the commands here to suit your system.

Firstly, we need install the cups (which stands for the Common Unix Printing Service) PDF printer driver. It comes with a lot of junk if you're not careful, so here I use --no-install-recommends to avoid installing any unnecessary packages.

sudo apt install printer-driver-cups-pdf --no-install-recommends

If you've got a firewall running (which you really should - see this post of mine for more information on that), then you'll need to open the port 631 for TCP traffic to allow people to print. If you're using ufw, then this should do the trick:

sudo ufw allow cups

If not, then you may need to specify the port number explicitly:

sudo ufw allow 631/tcp

With the printer installed, we next need to open it to the world. Before that though, we should make some changes to the configuration file, which is located at /etc/cups-pdf.conf. Firstly, I wanted to put the resulting PDFs into my file server's shared folder. This is achieved by editing the Out and AnonDirName settings. They should already be present in the configuration file - it's just a matter of changing their values:

Out         /absolute/path/to/output/dir
AnonDirName /absolute/path/to/output/dir

I also wanted to customise the user account and permissions that it saves the pdfs with. I did this through the AnonUser and AnonUMask settings - which should also be present by default:

AnonUser    username
AnonUMask   0007

The umask is basically an inverted permission octal. I found a good calculator calculator online to do it for me :P (Don't forget the preceding 0 - it's important!)

Finally, I experienced an issue whereby cups kept overwriting the same file again and again because the iPad wasn't smart enough to send the photos to print with their actual filenames - instead opting to send them all as Photo.pdf. Thankfully though, cups-pdf has the Label option (also specified by default) that ensures that output filenames don't clash. Setting it to 1 instead of 0 solved the problem for me:

Label       1

Note that some of these properties may be prefixed with a hash (#). You'll need to remove this in order for it to take effect.

With the new PDF printer configured, it's time to open it up to our local network. Here's how to do that:

sudo cupsctl --share-printers
sudo lpadmin -p pdf -o printer-is-shared=true

Note that if you want to open it up to more than your local subnet you'll need to do some additional configuration - such as configuring authentication, for instance. Such things are beyond the scope of this blog post, but if there's the demand (comment below!) I can certainly investigate writing something up.

Found this useful? Got a better / different solution? Comment below!

Art by Mythdael