systemquery, part 1: encryption protocols

Unfortunately, my autoplant project is taking longer than I anticipated to setup and debug. In the meantime, I'm going to talk about systemquery - another (not so) little project I've been working on in my spare time.

As I've acquired more servers of various kinds (mostly consisting of Raspberry Pis), I've found myself with an increasing need to get a high-level overview of the status of all the servers I manage. At the moment, this need is satisfied by my monitoring system's (collectd, which while I haven't blogged about my setup directly, I have posted about it here and here) web-based dashboard called Collectd Graph Panel (sadly now abandonware, but still very useful):

This is great and valuable, but if I want to ask questions like "are all apt updates installed", or "what's the status of this service on all hosts?", or "which host haven't I upgraded to Debian bullseye yet?", or "is this mount still working", I currently have to SSH into every host to find the information I'm looking for.

To solve this problem, I discovered the tool osquery. Osquery is a tool to extract information from a network of hosts with an SQL-like queries. This is just what I'm looking for, but unfortunately it does not support the armv7l architecture - which most of my cluster currently runs on - thereby making it rather useless to me.

Additionally, from looking at the docs it seems to be extremely complicated to setup. Finally, it does not seem to have a web interface. While not essential, it's a nice-to-have

To this end, I decided to implement my own system inspired by osquery, and I'm calling it systemquery. I have the following goals:

1. Allow querying all the hosts in the swarm at once
2. Make it dead-easy to install and use (just like Pepperminty Wiki)
3. Make it peer-to-peer and decentralised
4. Make it tolerate random failures of nodes participating in the systemquery swarm
5. Make it secure, such that any given node must first know a password before it is allowed to join the swarm, and all network traffic is encrypted

As a stretch goal, I'd also like to implement a mesh message routing system too, so that it's easy to connect multiple hosts in different networks and monitor them all at once.

Another stretch goal I want to work towards is implementing a nice web interface that provides an overview of all the hosts in a given swarm.

Encryption Protocols

With all this in mind, the first place to start is to pick a language and platform (Javascript + Node.js) and devise a peer-to-peer protocol by which all the hosts in a given swarm can communicate. My vision here is to encrypt everything using a join secret. Such a secret would lend itself rather well to a symmetrical encryption scheme, as it could act as a pre-shared key.

A number of issues stood in the way of actually implementing this though. At first, I thought it best to use Node.js' built-in TLS-PSK (stands for Transport Layer Security - Pre-Shared Key) implementation. Unlike regular TLS which uses asymmetric cryptography (which works best in client-server situations), TLS-PSK uses a pre-shared key and symmetrical cryptography.

Unfortunately, although Node.js advertises support for TLS-PSK, it isn't actually implemented or is otherwise buggy. This not only leaves me with the issue of designing a encryption protocol, but also:

1. The problem of transferring binary data
2. The problem of perfect forward secrecy
3. The problem of actually encrypting the data

Problem #1 here turned out to be relatively simple. I ended up abstracting away a raw TCP socket into a FramedTransport class, which implements a simple protocol that sends and receives messages in the form <length_in_bytes><data....>, where <length_in_bytes> is a 32 bit unsigned integer.

With that sorted and the nasty buffer manipulation safely abstracted away, I could turn my attention to problems 2 and 3. Let's start with problem 3 here. There's a saying when programming things relating to cryptography: never roll your own. By using existing implementations, these existing implementations are often much more rigorously checked for security flaws.

In the spirit of this, I sought out an existing implementation of a symmetric encryption algorithm, and found tweetnacl. Security audited, it provides what looks to be a secure symmetric encryption API, which is the perfect foundation upon which to build my encryption protocol. My hope is that by simply exchanging messages I've encrypted with an secure existing algorithm, I can reduce the risk of a security flaw.

This is a good start, but there's still the problem of forward secrecy to tackle. To explain, perfect forward secrecy is where should an attacker be listening to your conversation and later learn your encryption key (in this case the join secret), they still are unable to decrypt your data.

This is achieved by using session keys and a key exchange algorithm. Instead of encrypting the data with the join secret directly, we use it only to encrypt the initial key-exchange process, which then allows 2 communicating parties to exchange a session key, which used to encrypt all data from then on. By re-running the key-exchange process to and generating new session keys at regular intervals, forward secrecy can be achieved: even if the attacker learns a session key, it does not help them to obtain any other session keys, because even knowledge of the key exchange algorithm messages is not enough to derive the resulting session key.

Actually implementing this in practice is another question entirely however. I did some research though and located a pre-existing implementation of JPAKE on npm: jpake.

With this in hand, the problem of forward secrecy was solved for now. The jpake package provides a simple API by which a key exchange can be done, so then it was just a case of plugging it into the existing system.

Where next?

After implementing an encryption protocol as above (please do comment below if you have any suggestions), the next order of business was to implement a peer-to-peer swarm system where agents connect to the network and share peers with one another. I have the basics of this implemented already: I just need to test it a bit more to verify it works as I intend.

It would also be nice to refactor this system into a standalone library for others to use, as it's taken quite a bit of effort to implement. I'll be holding off on doing this though until it's more stable however, as refactoring it now would just slow down development since it has yet to stabilise as of now.

On top of this system, the plan is to implement a protocol by which any peer can query any other peer for system information, and then create a command-line interface for easily querying it.

To make querying flexible, I plan on utilising some form of in-memory database that is populated with queries to other hosts based on the tables mentioned in the user's query. SQLite3 is the obvious choice here, but I'm reluctant to choose it as it requires compilation upon installation - and given that I have experienced issues with this in the past, I feel this has the potential to limit compatibility with some system configurations. I'm going to investigate some other in-memory database libraries for Javascript - giving preference to those which are both light and devoid of complex installation requirements (pure JS is best if I can manage it I think). If you know of a pre Javascript in-memory database that has a query syntax, do let me know in the comments below!

As for querying system information directly, that's an easy one. I've previously found systeminformation - which seems to have an API to fetch pretty much anything you'd ever want to know about the host system!