Archive

## Tag Cloud

3d 3d printing account algorithms android announcement architecture archives arduino artificial intelligence artix assembly async audio automation backups bash batch blog bookmarklet booting bug hunting c sharp c++ challenge chrome os code codepen coding conundrums coding conundrums evolved command line compilers compiling compression css dailyprogrammer data analysis debugging demystification distributed computing documentation downtime electronics email embedded systems encryption es6 features ethics event experiment external first impressions future game github github gist gitlab graphics hardware hardware meetup holiday holidays html html5 html5 canvas infrastructure interfaces internet interoperability io.js jabber jam javascript js bin labs learning library linux lora low level lua maintenance manjaro network networking nibriboard node.js operating systems own your code performance phd photos php pixelbot portable privacy problem solving programming problems project projects prolog protocol protocols pseudo 3d python reddit redis reference releases resource review rust searching secrets security series list server software sorting source code control statistics storage svg talks technical terminal textures thoughts three thing game three.js tool tutorial twitter ubuntu university update updates upgrade version control virtual reality virtualisation visual web website windows windows 10 xmpp xslt

## Next Gen Search, Part 1: Backend Storage

I've got a bit of a thing about full-text search engines. I've talked about one in particular before for Pepperminty Wiki, and I was thinking about it again the other day - and how I could optimise it further.

If you haven't already, I do recommend reading my previous post on the curious question - as a number of things in this post might not make sense otherwise.

Between the time I wrote that last post and now, I've done quite a bit more digging into the root causes of that ~450ms search time, and I eventually determined that most of it was actually being spent deserialising the inverted index from the JSON file it's stored in back into memory.

This is horribly inefficient and is actually taking up 90% of that query time, so I got to thinking about what I could do about it.

My solution was multi-faceted, as I also (separately) developed a new search-term analysis system (I abbreviated to STAS, because it sounds cool :D) to add advanced query syntax such as -dog to exclude pages that contain the word dog and the like - which I also folded in at the same time as fixing the existing bottleneck.

As of the time of typing, the work on STAS is still ongoing. This doesn't mean that I can't blog about the rest of it though! I've recently-ish come into contact with key-value data stores, such as Redis and LevelDB (now RocksDB). They work rather like a C♯ dictionary or a browser's localStorage, in that they store values that are associated with unique keys (Redis is a bit of a special case though, for reasons that I won't get into here).

Such a data store would suit an inverted index surprisingly well. I devised a key system to represent an inverted index:

• The first key, |termindex|, is used to store a master list of words that have entries in the page index.
• The second key, term is simply the word itself (e.g. cat, chicken, rocket, etc.), and stores a list of ids of the pages that contain that word.
• The third and final key, term|pageid, is a word followed by a separator and finally the id of a page (e.g. bill|1, cat|45, boosters|69, etc.). These keys store the locations that a particular word appears at in a given document in the index. A separate id index is needed to convert between the page id and it's associated name - Pepperminty Wiki provides this functionality out-of-the-box.

The more I thought about it, the more I liked it. If I use a key-value data store in this manner, I can store the values as JSON objects - and then I only have to deserialise the parts of the index that I actually use. Furthermore, adding this extra layer of abstraction allows for some clever trickery to speed things up even more.

The problem here is that Pepperminty Wiki is supposed to be portable, so I try not to use any heavy external libraries or depend on odd PHP modules that not everyone will have installed. While a LevelDB extension for PHP does exist, it's not installed by default and it's a PECL module, which are awkward to install.

All isn't lost though, because it turns out that SQLite functions surprisingly well as a key-value data store:

CREATE TABLE store (
key TEXT UNIQUE NOT NULL,
value TEXT
);

Yes - it really is that simple! Now all we need is some code to create and interface with such a database. Some simple getters and setters should suffice!

(Can't see the above? Try a direct link.)

While this works, I quickly ran into several major problems:

• Every time I read from the data store I'm decoding JSON, which is expensive
• Every time I'm saving to the data store, I'm encoding to JSON, which is also expensive
• I'm reading and writing the same thing multiple times, which is very expensive
• Writing anything to the data store takes a long time

The (partial) solution here was to refactor it such that the json encoding is handled by the storage provider, and to implement a cache.

Such a cache could just be an associative array:

private $cache = []; Then, to fetch a value, we can do this: // If it's not in the cache, insert it if(!isset($this->cache[$key])) {$this->cache[$key] = [ "modified" => false, "value" => json_decode($this->query(
"SELECT value FROM store WHERE key = :key;",
[ "key" => $key ] )->fetchColumn()) ]; } return$this->cache[$key]["value"]; Notice how each item in the cache is also an associative array. This is so that we can flag items that have been modified in memory, such that when we next sync to disk we can write multiple changes all at once in a single batch. That turns the setter into something like this: if(!isset($this->cache[$key]))$this->cache[$key] = [];$this->cache[$key]["value"] =$value;
$this->cache[$key]["modified"] = true;

Very cool! Now all we need is a function to batch-write all the changes to disk. This isn't hard either:

foreach($this->cache as$key => $value_data) { // If it wasn't modified, there's no point in saving it, is there? if(!$value_data["modified"])
continue;

$this->query( "INSERT OR REPLACE INTO store(key, value) VALUES(:key, :value)", [ "key" =>$key,
"value" => json_encode($value_data["value"]) ] ); } I'll get onto the cogs and wheels behind that query() function a bit later in this post. It's one of those utility functions that are great to have around that I keep copying from one project to the next. Much better, but still not great. Why does it still take ages to write all the changes to disk? Well, it turns out that by default SQLite wraps every INSERT in it's own transaction. If we wrap our code in an explicit transaction, we can seriously boost the speed even further: $this->db->beginTransaction();

// Do batch writing here

$this->db->commit(); Excellent (boomed the wizard). But wait, there's more! The PHP PDO database driver supports prepared statements, which is a fancy way of caching SQL statements and substituting values into them. We use these already, but since we only use a very limited number of SQL queries, we can squeak some additional performance out by caching them in their prepared forms (preparing them is relatively computationally expensive, after all). This is really easy to do. Let's create another associative array to store them in: private$query_cache = [];

Then, we can alter the query() function to look like this:

/**
* Makes a query against the database.
* @param   string  $sql The (potentially parametised) query to make. * @param array$variables  Optional. The variables to substitute into the SQL query.
* @return  \PDOStatement       The result of the query, as a PDOStatement.
*/
private function query(string $sql, array$variables = []) {
// Add to the query cache if it doesn't exist
if(!isset($this->query_cache[$sql]))
$this->query_cache[$sql] = $this->db->prepare($sql);
$this->query_cache[$sql]->execute($variables); return$this->query_cache[$sql]; // fetchColumn(), fetchAll(), etc. are defined on the statement, not the return value of execute() } If a prepared statement for the given SQL exists in the query cache already, we re-use that again instead of preparing a brand-new one. This works especially well, since we perform a large number of queries with the same small set of SQL queries. Get operations all use the same SQL query, so why not cache it? This completes our work on the backend storage for the next-generation search system I've built for Pepperminty Wiki. Not only will it boost the speed of Pepperminty Wiki's search engine, but it's also a cool reusable component, which I can apply to other applications to give them a boost, too! Next time, we'll take a look at generating the test data required to stress-test the system. I also want to give an overview of the new search-term analysis system and associated changes to the page ranking system I implemented (and aspire to implement) too, but those may have to wait until another post (teaser: PageRank isn't really what I'm after). (Can't see the above? Try a direct link.) ## Generating Atom 1.0 Feeds with PHP (the proper way) I've generated Atom feeds in PHP before, but recently I went on the hunt to discover if PHP has something like C♯'s XMLWriter class - and it turns out it does! Although poorly documented (probably why I didn't find it in the first place :P), it's actually quite logical and easy to pick up. To this end, I thought I'd blog about how I used to write the Atom 1.0 feed generator for recent changes on Pepperminty Wiki that I've recently implemented (coming soon in the next release!), as it's so much cleaner than atom.gen.php that I blogged about before! It's safer too - as all the escaping is handled automatically by PHP - so there's no risk of an injection attack because I forgot to escape a character in my library code. It ends up being a bit verbose, but a few helper methods (or a wrapper class?) should alleviate this nicely - I might refactor it at a later date. To begin, we need to create an instance of the aptly-named XMLLWriter class. It's probable that you'll need the php-xml package installed in order to use this. $xml = new XMLWriter();
$xml->openMemory();$xml->setIndent(true); $xml->setIndentString("\t");$xml->startDocument("1.0", "utf-8");

In short, the above creates a new XMLWriter instance, instructs it to write the XML to memory, enables pretty-printing of the output, and writes the standard XML document header.

With the document created, we can begin to generate the Atom feed. To figure out the format (I couldn't remember from when I wrote atom.gen.php - that was ages ago :P), I ended following this helpful guide on atomenabled.org. It seems familiar - I think I might have used it when I wrote atom.gen.php. To start, we need something like this:

<feed xmlns="http://www.w3.org/2005/Atom">
......
</feed>

In PHP, that translates to this:

$xml->startElement("feed");$xml->writeAttribute("xmlns", "http://www.w3.org/2005/Atom");

$xml->endElement(); // </feed> Next, we probably want to advertise how the Atom feed was generated. Useful for letting the curious know what a website is powered by, and for spotting usage of your code out in the wild! Since I'm writing this for Pepperminty Wiki, I'm settling on something not unlike this: <generator uri="https://github.com/sbrl/Pepperminty-Wiki/" version="v0.18-dev">Pepperminty Wiki</generator> In PHP, this translates to this: $xml->startElement("generator");
$xml->writeAttribute("uri", "https://github.com/sbrl/Pepperminty-Wiki/");$xml->writeAttribute("version", $version); // A variable defined elsewhere in Pepperminty Wiki$xml->text("Pepperminty Wiki");
$xml->endElement(); Next, we need to add a <link rel="self" /> tag. This informs clients as to where the feed was fetched from, and the canonical URL of the feed. I've done this: xml->startElement("link");$xml->writeAttribute("rel", "self");
$xml->writeAttribute("type", "application/atom+xml");$xml->writeAttribute("href", full_url());
$xml->endElement(); That full_url() function is from StackOverflow, and calculates the full URI that was used to make a request. As Pepperminty Wiki can be run in any directory on ayn server, I can't pre-determine this url - hence the complexity. Note also that I output type="application/atom+xml" here. This specifies the type of content that can be found at the supplied URL. The idea here is that if you represent the same data in different ways, you can advertise them all in a standard way, with other formats having rel="alternate". Pepperminty Wiki does this - generating the recent changes list in HTML, CSV, and JSON in addition to the new Atom feed I'm blogging about here (the idea is to make the wiki data as accessible and easy-to-parse as possible). Let's advertise those too: $xml->startElement("link");
$xml->writeAttribute("rel", "alternate");$xml->writeAttribute("type", "text/html");
$xml->writeAttribute("href", "$full_url_stem?action=recent-changes&format=html");
$xml->endElement();$xml->startElement("link");
$xml->writeAttribute("rel", "alternate");$xml->writeAttribute("type", "application/json");
$xml->writeAttribute("href", "$full_url_stem?action=recent-changes&format=json");
$xml->endElement();$xml->startElement("link");
$xml->writeAttribute("rel", "alternate");$xml->writeAttribute("type", "text/csv");
$xml->writeAttribute("href", "$full_url_stem?action=recent-changes&format=csv");
$xml->endElement(); Before we can output the articles themselves, there are a few more pieces of metadata left on our laundry list - namely <updated />, <id />, <icon />, <title />, and <subtitle />. There are others in the documentation too, but aren't essential (as far as I can tell) - and not appropriate in this specific case. Here's what they might look like: <updated>2019-02-02T21:23:43+00:00</updated> <id>https://wiki.bobsrockets.com/?action=recent-changes&amp;format=atom</id> <icon>https://wiki.bobsrockets.com/rocket_logo.png</icon> <title>Bob's Wiki - Recent Changes</title> <subtitle>Recent Changes on Bob's Wiki</subtitle> The <updated /> tag specifies when the feed was last updated. It's unclear as to whether it's the date/time the last change was made to the feed or the date/time the feed was generated, so I've gone with the latter. If this isn't correct, please let me know and I'll change it. The <id /> element can contain anything, but it must be a globally-unique string that identifies this feed. I've seen other feeds use the canonical url - and I've gone to the trouble of calculating it for the <link rel="self" /> - so it seems a shame to not use it here too. The remaining elements (<icon />, <title />, and <subtitle />) are pretty self explanatory - although it's worth mentioning that the icon must be square apparently. Let's whip those up with some more PHP: $xml->writeElement("updated", date(DateTime::ATOM));
$xml->writeElement("id", full_url());$xml->writeElement("icon", $settings->favicon);$xml->writeElement("title", "$settings->sitename - Recent Changes");$xml->writeElement("subtitle", "Recent Changes on $settings->sitename"); PHP even has a present for generating a date string in the correct format required by the spec :D $settings is an object containing the wiki settings that's a parsed form of peppermint.json, and contains useful things like the wiki's name, icon, etc.

Finally, with all the preamble done, we can turn to the articles themselves. In the case of Pepperminty Wiki, the final result will look something like this:

<entry>
<title type="text">Edit to Compute Core by Sean</title>
<id>https://seanssatellites.co.uk/wiki/?page=Compute%20Core</id>
<updated>2019-01-29T10:21:43+00:00</updated>
<content type="html">&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Change type:&lt;/strong&gt; edit&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;User:&lt;/strong&gt;  Sean&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Page name:&lt;/strong&gt; Compute Core&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Timestamp:&lt;/strong&gt; Tue, 29 Jan 2019 10:21:43 +0000&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;New page size:&lt;/strong&gt; 1.36kb&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Page size difference:&lt;/strong&gt; +1&lt;/li&gt;
&lt;/ul&gt;</content>
<author>
<name>Sean</name>
<uri>https://seanssatellites.co.uk/wiki/?page=Users%2FSean</uri>
</author>
</entry>

There are a bunch of elements here that deserve attention:

• <title /> - The title of the article. Easy peasy!
• <id /> - Just like the id of the feed itself, each article entry needs an id too. Here I've followed the same system I used for the feed, and given the url of the page content.
• <updated /> - The last time the article was updated. Since this is part of a feed of recent changes, I've got this information readily at hand.
• <content /> - The content to display. If the content is HTML, it must be escaped and type="html" present to indicate this.
• <link rel="alternate" /> Same deal as above, but on an article-by-article level. In this case, it should link to the page the article content is from. In this case, I link to the page & revision of the change in question. In other cases, you might link to the blog post in question for example.
• <author /> - Can contain <name />, <uri />, and <email />, and should indicate the author of the content. In this case, I use the name of the user that made the change, along with a link to their user page.

Here's all that in PHP:

foreach($recent_changes as$recent_change) {
if(empty($recent_change->type))$recent_change->type = "edit";

$xml->startElement("entry"); // Change types: revert, edit, deletion, move, upload, comment$type = $recent_change->type;$url = "$full_url_stem?page=".rawurlencode($recent_change->page);

$content = ".......";$xml->startElement("title");
$xml->writeAttribute("type", "text");$xml->text("$type$recent_change->page by $recent_change->user");$xml->endElement();

$xml->writeElement("id",$url);
$xml->writeElement("updated", date(DateTime::ATOM,$recent_change->timestamp));

$xml->startElement("content");$xml->writeAttribute("type", "html");
$xml->text($content);
$xml->endElement();$xml->startElement("link");
$xml->writeAttribute("rel", "alternate");$xml->writeAttribute("type", "text/html");
$xml->writeAttribute("href",$url);
$xml->endElement();$xml->startElement("author");
$xml->writeElement("name",$recent_change->user);
$xml->writeElement("uri", "$full_url_stem?page=".rawurlencode("$settings->user_page_prefix/$recent_change->user"));
$xml->endElement();$xml->endElement();
}

I've omitted the logic that generates the value of the <content /> tag, as it's not really relevant here (you can check it out here if you're curious :D).

This about finishes the XML we need to generate for our feed. To extract the XML from the XMLWriter, we can do this:

$atom_feed =$xml->flush();

Then we can do whatever we want to with the generated XML!

When the latest version of Pepperminty Wiki comes out, you'll be able to see a live demo here! Until then, you'll need to download a copy of the latest master version and experiment with it yourself. I'll also include a complete demo feed below:

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<generator uri="https://github.com/sbrl/Pepperminty-Wiki/" version="v0.18-dev">Pepperminty Wiki</generator>
<updated>2019-02-03T17:25:10+00:00</updated>
<id>http://[::]:35623/?action=recent-changes&amp;format=atom&amp;count=3</id>
<title>Pepperminty Wiki - Recent Changes</title>
<subtitle>Recent Changes on Pepperminty Wiki</subtitle>
<entry>
<updated>2019-01-29T19:55:08+00:00</updated>
<content type="html">&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Change type:&lt;/strong&gt; edit&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Timestamp:&lt;/strong&gt; Tue, 29 Jan 2019 19:55:08 +0000&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;New page size:&lt;/strong&gt; 2.11kb&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Page size difference:&lt;/strong&gt; +2007&lt;/li&gt;
&lt;/ul&gt;</content>
<author>
</author>
</entry>
<entry>
<title type="text">Edit to Main Page by admin</title>
<id>http://[::]:35623/?page=Main%20Page</id>
<updated>2019-01-05T20:14:07+00:00</updated>
<content type="html">&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Change type:&lt;/strong&gt; edit&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Page name:&lt;/strong&gt; Main Page&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Timestamp:&lt;/strong&gt; Sat, 05 Jan 2019 20:14:07 +0000&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;New page size:&lt;/strong&gt; 317b&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Page size difference:&lt;/strong&gt; +68&lt;/li&gt;
&lt;/ul&gt;</content>
<author>
<uri>http://[::]:35623/?page=Users%2FMain%20Page</uri>
</author>
</entry>
<entry>
<title type="text">Edit to Main Page by admin</title>
<id>http://[::]:35623/?page=Main%20Page</id>
<updated>2019-01-05T17:53:08+00:00</updated>
<content type="html">&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Change type:&lt;/strong&gt; edit&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Page name:&lt;/strong&gt; Main Page&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Timestamp:&lt;/strong&gt; Sat, 05 Jan 2019 17:53:08 +0000&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;New page size:&lt;/strong&gt; 249b&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Page size difference:&lt;/strong&gt; +31&lt;/li&gt;
&lt;/ul&gt;</content>
<author>
<uri>http://[::]:35623/?page=Users%2FMain%20Page</uri>
</author>
</entry>
</feed>

...this is from my local development instance.

Found this interesting? Confused about something? Want to say hi? Comment below!

## Using libsodium to upgrade the comment key system

I've blogged about the comment key system I utilise on this blog to prevent spam before (see also). Today, I've given it another upgrade to make it harder for spammers to fake a comment key!

In the last post, I transformed the comment key with a number of reversible operations - including a simple XOR password system. This is, of course, very insecure - especially since an attacker knows (or can at least guess) the content of the encrypted key, making it trivial (I suspect) to guess the password used for 'encryption'.

The solution here, obviously, is to utilise a better encryption system. Since it's the 5th November and I'm not particularly keen on my website going poof like the fireworks tonight, let's do something about it! PHP 7.2+ comes with native libsodium support (those still using older versions of PHP can still follow along! Simply install the PECL module). libsodium bills itself as

A modern, portable, easy to use crypto library.

After my experiences investigating it, I'd certainly say that's true (although the official PHP documentation could do with, erm, existing). I used this documentation instead instead - it's quite ironic because I have actually base64-encoded the password.......

Anyway, after doing some digging I found the quick reference, which explains how you can go about accomplishing common tasks with libsodium. For my comment key system, I want to encrypt my timestamp with a password - so I wanted the sodium_crypto_secretbox() and its associated sodium_crypto_secretbox_open() functions.

This pair of functions, when used together, implement a secure symmetric key encryption system. In other words, they securely encrypt a lump of data with a password. They do, however, have 2 requirements that must be taken care of. Firstly, the password must be of a specific length. This is easy enough to accomplish, as PHP is kind enough to tell us how long it needs to be:

/**
* Generates a new random comment key system password.
* @return string   The base64-encoded password.
*/
return base64_encode_safe(random_bytes(SODIUM_CRYPTO_SECRETBOX_KEYBYTES));
}

Easy-peasy! base64_encode_safe isn't a built-in function - it's a utility function I wrote that we'll need later. For consistency, I've used it here too. Here it is, along with its counterpart:

/**
* Encodes the given data with base64, but replaces characters that could
* get mangled in transit with safer equivalents.
* @param   mixed   $data The data to encode. * @return string The data, encoded as transmit-safe base64. */ function base64_encode_safe($data) {
return str_replace(["/", "+"], ["-", "_"], base64_encode($data)); } /** * Decodes data encoded with base64_encode_safe(). * @param mixed$base64     The data to decode.
* @return  string  The data, decoded from transmit-safe base64.
*/
function base64_decode_safe($base64) { return base64_decode(str_replace(["-", "_"], ["/", "+"],$base64));
}

With that taken care of, we can look at the other requirement: a nonce. Standing for Number used ONCE, it's a sequence of random bytes that's used by the encryption algorithm. We don't need to keep it a secret, but we do need to to decrypt the data again at the other end, in addition to the password - and we do need to ensure that we generate a new one for every comment key. Thankfully, this is also easy to do in a similar manner to generating a password:

$nonce = random_bytes(SODIUM_CRYPTO_SECRETBOX_NONCEBYTES); With everything in place, we can look at upgrading the comment key generator itself. It looks much simpler now: /** * Generates a new comment key. * Note that the 'encryption' used is not secure - it's just simple XOR! * It's mainly to better support verification in complex setups and * serve as a nother annoying hurdle for spammers. * @param string$pass The password to encrypt with. Should be a base64-encoded string from key_generate_password().
* @return string       A new comment key stamped at the current time.
*/
function key_generate($pass) {$pass = base64_decode_safe($pass);$nonce = random_bytes(SODIUM_CRYPTO_SECRETBOX_NONCEBYTES);
$encrypt = sodium_crypto_secretbox( strval(time()), // The thing we want to encrypt$nonce, // The nonce
$pass // The password to encrypt with ); return base64_encode_safe($nonce.$encrypt); } I bundle the nonce with the encrypted data here to ensure that the system continues to be stateless (i.e. we don't need to store any state information about a user on the server). I also encode the encrypted string with base64, as the encrypted strings contain lots of nasty characters (it's actually a binary byte array I suspect). This produces keys like this: BOqDvr26XY9s8PhlmIZMIp6xCOZyfsh6J05S4Jp2cY3bL8ccf_oRgRMldrmzKk6RrnA= Tt8H81tkJEqiJt-RvIstA_vz13LS8vjLgriSAvc1n5iKwHuEKjW93IMITikdOwr5-NY= 5NPvHg-l1GgcQ9ZbThZH7ixfKGqAtSBr5ggOFbN_wFRjo3OeJSWAcFNhQulYEVkzukI= They are a bit long, but not unmanageable. In theory, I could make it a bit shorter by avoiding converting the integer output from time() to a string, but in my testing it didn't make much difference. I suspect that there's some sort of minimum length to the output string for some (probably good) reason.  php > var_dump(sodium_crypto_secretbox(time(), random_bytes(24), random_bytes(32))); string(26) "GW$:���ߌ@]�+1b��������d%"
php > var_dump(sodium_crypto_secretbox(strval(time()), random_bytes(24), random_bytes(32)));
string(26) ":_}0H �E°9��Kn�p��ͧ��"


Now that we're encrypting comment keys properly, it isn't much good unless we can decrypt them as well! Let's do that too. The first step is to decode the base64 and re-split the nonce from the encrypted data:

$pass = base64_decode_safe($pass);
$key_enc_raw = base64_decode_safe($key);
$nonce = mb_substr($key_enc_raw, 0, SODIUM_CRYPTO_SECRETBOX_NONCEBYTES, "8bit");
$key_enc = mb_substr($key_enc_raw, SODIUM_CRYPTO_SECRETBOX_NONCEBYTES, null, "8bit");

This is derived from the example code. With the 2 separated once more, we can decrypt the key:

$key_dec = sodium_crypto_secretbox_open($key_enc, $nonce,$pass);

Apparently, according to the example code on the website I linked to above, if the return value isn't a string then the decryption failed. We should make sure we handle that when returning:

if(!is_string($key_dec)) return null; return intval($key_dec);

That completes the decryption code. Here is in full:

/**
* Decodes the given key.
* @param  string $key The key to decode. * @param string$pass The password to decode it with.
* @return int|null     The time that the key was generated, or null if the provided key was invalid.
*/
function key_decode($key,$pass) {
$pass = base64_decode_safe($pass);
$key_enc_raw = base64_decode_safe($key);
$nonce = mb_substr($key_enc_raw, 0, SODIUM_CRYPTO_SECRETBOX_NONCEBYTES, "8bit");
$key_enc = mb_substr($key_enc_raw, SODIUM_CRYPTO_SECRETBOX_NONCEBYTES, null, "8bit");
$key_dec = sodium_crypto_secretbox_open($key_enc, $nonce,$pass);
if(!is_string($key_dec)) return null; return intval($key_dec);
}

Explicitly returning null in key_decode() requires a small change to key_verify(), in order to prevent it from thinking that a key is really old if the decryption fails (null is treated as 0 in arithmetic operations, apparently). Let's update key_verify() to handle that:

/**
* Verifies a key.
* @param   string  $key The key to decode and verify. * @param string$pass       The password to decode the key with.
* @param   int     $min_age The minimum required age for the key. * @param int$max_age    The maximum allowed age for the key.
* @return  bool                Whether the key is valid or not.
*/
function key_verify($key,$pass, $min_age,$max_age) {
$key_time = key_decode($key, $pass); if($key_time == null) return false;
$age = time() -$key_time;
return $age >=$min_age && $age <=$max_age;
}

A simple check is all that's needed. With that, the system is all updated! Time to generate a new password with the provided function and put it to use. You can do that directly from the PHP REPL (php -a):


php > require("lib/comment_system/comment_key.php");


Obviously, this isn't the password I'm using on my blog :P

I've got a few more ideas I'd like to play around with to deter spammers even more whilst maintaining the current transparency - so you may see another post soon on this subject :-)

With everything complete, here's the complete script:

(Above: The full comment key system code. Can't see it? Check it out on GitHub Gist here.)

Found this useful? Got an improvement? Confused about something? Comment below!

## Placeholder Image Generator

(Above: An example output with debugging turned on from my placeholder generation service)

For a quite a considerable amount of time now, I've been running my own placeholder image generation service here at starbeamrainbowlabs.com - complete with etag generation and custom colour selection. Although it's somewhat of an Easter Egg, it's not actually that hard to find if you know what you're looking for (hint: I use it on my home page, but you may need to view source to find it).

I decided to post about it now because I've just finished fixing the angle GET parameter - and it was interesting (and hard) enough to warrant a post to remind myself how I did it for future reference. Comment below if you knew about it before this blog post came out!

The script itself is split into 3 loose parts:

• The initial settings / argument parsing
• The polyfills and utility functions
• The image generation.

My aim here was to keep the script contained within a single file - so that it fits well in a gist (hence why it currently looks a bit messy!). It's probably best if show the code in full:

(Having trouble viewing code above? Visit it directly here: placeholder.php)

If you append ?help to the url, it will send back a plain text file detailing the different supported parameters and what they do:

(Can't see the above? View it directly here)

Aside from implementing the random value for fg_colour and bg_colour, the angle has been a huge pain. I use GD - a graphics drawing library that's bundled with practically every PHP installation ever - to draw the placeholder image, and when you ask it to draw some rotated text, it decides that it's going to pivot it around the bottom-left corner instead of the centre.

Naturally, this creates some complications. Though some people on the PHP manual page said method (imagettftext) have attempetd to correct for this (exhibits a and b), neither of their solutions work for me. I don't know if it's because their code is just really old (13 and 5 years respectively as of the time of typing) or what.

Anyway, I finally decided recently that enough was enough - and set about fixing it myself. Basing my code on the latter of the 2 pre-existing solutions, I tried fixing it - but ended up deleting most of it and starting again. It did give me some idea as to how to solve the problem though - all I needed to do was find the centre of where the text would be drawn when it is both not rotated and rotated and correct for it (these are represented by the green and blue crosses respectively on the image at the top of this post).

PHP does provide a method for calculating the bounding box of some prospective text that you're thinking of drawing via imagettfbbox. Finding the centre of the box though sounded like a horrible maths-y problem that would take me ages to work through on a whiteboard first, but thankfully (after some searching around) Wikipedia had a really neat solution for finding the central point of any set of points.

It calls it the centroid, and claims that the geometric centre of a set of points is simply the average of all the points involved. It just so happens that the geometric centre is precisely what I'm after:

$$C=\frac{a+b+c+d+....}{N}$$

...Where $C$ is the geometric centre of the shape as an $\binom{x}{y}$ vector, $a$, $b$, $c$, $d$, etc. are the input points as vectors, and $N$ is the number of points in question. This was nice and easy to program:

$bbox = imagettfbbox($size, 0, $fontfile,$text);
$orig_centre = [ ($bbox[0] + $bbox[2] +$bbox[4] + $bbox[6]) / 4, ($bbox[1] + $bbox[3] +$bbox[5] + $bbox[7]) / 4 ];$text_width = $bbox[2] -$bbox[0];
$text_height =$bbox[3] - $bbox[5];$bbox = imagettfbbox($size,$angle, $fontfile,$text);
$rot_centre = [ ($bbox[0] + $bbox[2] +$bbox[4] + $bbox[6]) / 4, ($bbox[1] + $bbox[3] +$bbox[5] + $bbox[7]) / 4 ]; The odd and even indexes of $bbox there are referring to the $x$ and $y$ co-ordinates of the 4 corners of the bounding boxes - imagettfbbox outputs the co-ordinates in 1 single array for some reason. I also calculate the original width and height of the text - in order to perform an additional corrective translation later.

With these in hand, I could calculate the actual position I need to draw it (indicated by the yellow cross in the image at the top of this post):

$$delta = C_{rotated} - C_{original}$$

.

$$pivot = \frac{pos_x - ( delta_x + \frac{text_width}{2})}{pos_y - delta_y + \frac{text_height}{2}}$$

In short, I calculate the distance between the 2 centre points calculated above, and then find the pivot point by taking this distance plus half the size of the text from the original central position we wanted to draw the text at. Here's that in PHP:

// Calculate the x and y shifting needed
$dx =$rot_centre[0] - $orig_centre[0];$dy = $rot_centre[1] -$orig_centre[1];

// New pivot point
$px =$x - ($dx +$text_width/2);
$py =$y - $dy +$text_height/2;

I generated a video of it in action (totally not because I thought it would look cool :P) with a combination of curl, and ffmpeg:

This was really quite easy to generate. In a new folder, I executed the following:

echo {0..360} | xargs -P3 -d' ' -n1 -I{} curl 'https://starbeamrainbowlabs.com/images/placeholder/?width=1920&height=1920&fg_colour=inverse&bg_colour=white&angle={}' -o {}.jpeg
ffmpeg -r 60 -i %d.jpeg -q:v 3 /tmp/text-rotation.webm

The first command downloads all the required frames (3 at a time), and the second stitches them together. The -q:v 3 bit of the ffmpeg command is of note - by default webm videos apparently have a really low quality - this corrects that. Lower is better, apparently - it goes up to about 40 I seem to remember reading somewhere. 3 to 5 is supposed to be the right range to get it to look ok without using too much disk space.

That's about everything I wanted to talk about to remind myself about what I did. Let me know if there's anything you're confused about in the comments below, and I'll explain it in more detail.

(Found this interesting? Comment below!)

## Search Engine Optimisation: The curious question of efficiency

For one reason or another I found myself a few days ago inspecting the code behind Pepperminty Wiki's full-text search engine. What I found was interesting enough that I thought I'd blog about it.

Forget about that kind of Search Engine Optimisation (the horrible click-baity kind - if there's enough interest I'll blog about my thoughts there too) and cue the appropriate music - we're going on a field trip fraught with the perils of Unicode, page ids, transliteration, and more!

Firstly, I should probably mention a little about the starting point. The (personal) wiki that is exhibiting the slowness has 546 ~75K words spread across 546 pages. Pepperminty Wiki manages to search all of this in about 2.8 seconds by way of an inverted index. If you haven't read my last post, you should do so now - it sets the stage for this one - and you'll be rather confused otherwise.

2.8 seconds is far too slow though. Let's do something about it! In order to do something about it, there are several other things that need explaining before I can show you what I did to optimise it. Let's look at Pepperminty Wiki's search system first. It's best explained with the aid of a diagram:

In short, every page has a numerical id, which is tracked by the ids core class. The search system interacts with this during the indexing phase (that's a topic for another blog post) and during the query phase. The query phase works something like this:

1. The inverted index is loaded from disk in my personal wiki the inverted index is ~968k, and loads in ~128ms)
2. The inverted index is queried for pages in that match the tokenised query terms.
3. The results returned from the query are ranked and sorted before being returned.
4. A context is extracted from the source of each page in the results returned - just like Duck Duck Go or Google have a bit of text below the title of each result
5. Said context has the search terms hightlighted

It sounds complicated, but it really isn't. The complicated bit comes when I tried to optimise it. To start with, I analysed how long each of the above steps were taking. The results were quite surprising:

• Step #1 took ~128ms on average
• Steps #2 & #3 took ~1200ms on average
• Step #4 took ~1500ms on average(!)
• Step #5 took a negligible amount of time

I did this by setting headers on the returned page. Timing things in PHP is relatively easy:

$start_time = microtime(true); // Do work here$end_time = microtime(true);

$time_taken_ms = round(($end_time - $start_time )*1000, 3); This gave me a general idea as to what needed attention. I was surprised to learn that the context extractor was taking most of the time. At first, I thought that my weird and probably inefficient algorithm was to blame. There's no way it should be taking 1500ms! So I set to work rewriting it to make it more optimal. Firstly, I tried something like this. Instead of multiple sub-loops, I figured out a way to do it with just 1 for loop and a few calls to mb_stripos_all(). Unfortunately, this did not have the desired effect. While it did shave about 50ms off the total time, it was far from what I'd hoped for. I tried refactoring it slightly again to use preg_match_all(), but it still didn't give me the speed boost I was after - only another 50ms or so. To get some answers, I brought out the big guns and profiled it with XDebug. Upon analysing the generated profile it immediately became clear what the issue was: transliteration. Transliteration is the process of removing the diacritics and other accents from a string to make it easier to compare with other strings. For example, café becomes Café. In PHP this process is a bit funky. Here's what I do in Pepperminty Wiki: $literator = Transliterator::createFromRules(':: Any-Latin; :: Latin-ASCII; :: NFD; :: [:Nonspacing Mark:] Remove; :: Lower(); :: NFC;', Transliterator::FORWARD);

$literator->transliterate($string);

Note that this requires the intl PHP extension (which should be installed & enabled by default). This does several things:

• Converts the text to lowercase
• Casts Cyrillic characters to their Latin alphabet (i.e. a-z) phonetic equivalent
• Removes all diacritics

In short, it preprocesses a chunk of text so that it can be easily used by the search system. In my case, I transliterate search queries before tokenising them, source texts before indexing them, and crucially: source texts before extracting contextual information.

The thing about this wonderful transliteration process is that, at least in PHP, it's really slow. Thinking about it, the answer was obvious. Why bother extract offset information when the inverted index already contains that information?

The answer is: you don't upon refactoring the context extractor to utilise the inverted index, I managed to get it down to just ~59ms. Success!

Next up was the query system itself. 1200ms seems a bit high, so while I was at it, I analysed a profile of that as well. It turned out that a similar problem was occurring here too. Surprisingly, the page id system's getid($pagename) function was being really slow. I found 2 issues here. Firstly, I was doing too much Unicode normalisation. In the page id system, I don't want to transliterate to remove diacritics, but I do want to make sure that all the diacritics and accents are represented in the same way. If you didn't know, Unicode has a both a character for letters like é (e-acute), and a code-point for the acute accent itself, which gets merged into the previous letter during rendering. This can cause a page to acquire 2 (or even more!) seemingly identical ids in the system, which caused me a few headaches in the past! If you'd like to learn more, the article on Unicode normalisation I linked to above explains it in some detail. Thankfully, the solution is quite simple. Here's what Pepperminty Wiki does: Normalizer::normalize($string, Normalizer::FORM_C)

This ensures that all accents and other weird characters are represented in the same way. As you might guess though, it's slow. I found that in the getid() function I was normalising both the page names I was iterating over in the index, as well as the target page name to find in every loop. The solution here was simple:

• Don't normalise the page names from the index - that's the job of the assign() protected method to ensure that they are suitably normalised when adding them in the first place
• Normalise the target page name only once, and then use that when spinning through the index.

Implementing these simple changes brought the overall search time down to 700ms. The other thing to note here is the structure of the index. I show it in the diagram, but here it is again:

• 1: Booster
• 2: Rocket
• 3: Satellite

The index is basically a hash-table mapping numerical ids to their page names. This is great for when you have an id and want to know what the name of the page associated with it is, but terrible for when you want to go in the other direction, as we need to do when performing a query!

I haven't quite decided what to do about this. Obviously, the implications on efficiency are significant whenever we need to convert a page name into its respective numerical id. The problem lies in the fact that the search query system travels in both directions: It needs to convert page ids into page names when unravelling the results from the inverted index, but it also needs to convert page names into their respective ids when searching the titles and tags in the page index (the index that contains information about all the pages on a wiki - not pictured in the diagram above).

I have several options that I can see immediately:

• Maintain 2 indexes: One going in each direction. This would also bring a minor improvement to indexing new and updating existing content in the inverted index.
• Use some fancy footwork to refactor the search query system to unwind the page ids into their respective page names before we search the pages' titles and tags.

While deciding what to do, I did manage to reduce the number of times I convert a page name into its respective id by only performing the conversion if I find a match in the page metadata. This brought the average search time down to ~455ms, which is perfectly fine for my needs at the moment.

In the future, I may come back to this and optimise it further - but as it stands I'm getting to the point of diminishing returns: Where every additional optimisation requires twice the amount of time to implement as the last, and only provides a marginal gain in speed.

To this end, it doesn't seem worth it to spend ages tackling this issue now. Pepperminty Wiki is written in such a way that I can come back later and turn the inner workings of any part of the system upside-down, and it doesn't have any effect on the rest of the system (most of the time, anyway.... :P).

If you do find the search system too slow after these optimisations are released in v0.17, I'd like to hear about it! Please open an issue and I'll investigate further - I certainly haven't reached the end of this particular lollipop.

Found this interesting? Learnt something? Got a better way of doing it? Comment below!

## How to prevent php-fpm from overriding your PHP-based caching logic

A while ago I implemented ETag support to the dynamic preview generator in Pepperminty Wiki. While I thought it worked during testing, for some reason on a private instance of Pepperminty Wiki I discovered recently that my browser didn't seen to be taking advantage of it. This got me curious, so I decided to do a little bit of digging to find out why.

It didn't take long to find the problem. For some reason, all the responses from the server had a Cache-Control: no-cache, no-store, must-revalidate header attached to them. How strange! Even more so that it was in capital letters - my convention in Pepperminty Wiki is to always make the headers lowercase.

I checked the codebase with via the Project Find feature of Atom just to make sure that I hadn't left in a debugging statement or anything (I hadn't), and then I turned my attention to Nginx (engine-x) - the web server that I use on my server. Maybe it had added a caching header?

A quick grep later revealed that it wasn't responsible either - which leaves just one part of the system unchecked - php-fpm, the PHP FastCGI server that sits just behind Nginx that's responsible for running the various PHP scripts I have that power this website and other corners of my server. Another quick grep returned a whole bunch of garbage, so after doing some research I discovered that php-fpm, by default, is configured to send this header - and that it has to be disabled by editing your php.ini (for me it's in /etc/php/7.1/fpm/php.ini), and changing

;session.cache_limiter = nocache

to be explicitly set to an empty string, like so:

session.cache_limiter = ''

This appears to have solved by problem for now - allowing me to regain control over how the resources I send back via PHP are cached. Hopefully this saves someone else the hassle of pulling their entire web server stack apart and putting it back together again the future :P

## Deterring spammers with a comment key system

I recently found myself reimplementing the comment key system I use on this blog (I posted about it here) for a another purpose. Being more experienced now, my new implemention (which I should really put into use on this blog actually) is stand-alone in a separate file - so I'm blogging about it here both to help out anyone who reads this other than myself - and for myself as I know I'll forget otherwise :P

The basic algorithm hasn't changed much since I first invented it: take the current timestamp, apply a bunch or arbitrary transformations to it, put it as a hidden field in the comment form, and then reverse the transformations on the comment key the user submits as part of the form to discover how long they had the page loaded for. Bots will have it loaded for either less than 10-ish seconds, or more than 24 hours. Humans will be somewhere in the middle - at least according to my observations!

Of course, any determined spammer can easily bypass this system if they spend even a little bit of time analysing the system - but I'm banking on the fact that my blog is too low-value for a spammer to bother reverse-engineering my system to figure out how it works.

This time, I chose to use simple XOR encryption, followed by reversing the string, followed by base64 encoding. It should be noted that XOR encryption is certainly not secure - but in this case it doesn't really matter. If my website becomes a high-enough value target for that to matter, I'll investigate proper AES encryption - which will probably be a separate post in and of itself, as a quick look revealed that it's rather involved - and will probably require quite a bit of research and experimentation working correctly.

Let's take a look at the key generation function first:

function key_generate($pass) {$new_key = strval(time());
// Repeat the key so that it's long enough to XOR the key with
$pass_enc = str_repeat($pass, (strlen($new_key) / strlen($pass)) + 1);
$new_key =$new_key ^ $pass_enc; return base64_encode(strrev($new_key));
}

As I explained above, this first XORs the timestamp against a provided 'passcode' of sorts, and then it reverses it, base64 encodes it, and then returns it. I discovered that I needed to repeat the passcode to make sure it's at least as long as the timestamp - because otherwise it cuts the timestamp short! Longer passwords are always desirable for certain, but I wanted to make sure I addressed it here - just in case I need to lift this algorithm from here for a future project.

Next up is the decoding algorithm, that reverses the transformations we apply above:

    function key_decode($key,$pass) {
$key_dec = strrev(base64_decode($key));
// Repeat the key so that it's long enough to XOR the key with
$pass_dec = str_repeat($pass, (strlen($key_dec) / strlen($pass)) + 1);
return intval($key_dec ^$pass_dec);
}

Very similar. Again, the XOR passphrase has to be repeated to make it long enough to apply to the whole encoded key without inadvertently chopping some off the end. Additionally, we also convert the timestamp back into an integer - since it is the number of seconds since the last UNIX epoch (1st January 1970 as of the time of typing).

With the ability to create and decode keys, let's write a helper method to make the verification process a bit easier:

    function key_verify($key,$pass, $min_age,$max_age) {
$age = time() - key_decode($key, $pass); return$age >= $min_age &&$age <= $max_age; } It's fairly self-explanatory, really. It takes an encoded key, decodes it, and verifies that it's age lies between the specified bounds. Here's the code in full (it updates every time I update the code in the GitHub Gist): (Above: The full comment key code. Can't see it? Check it out on GitHub Gist here.) ## OC ReMix Albums Atom Feed I've recently discovered the wonderful OverClocked ReMix thanks to the help of a friend (thank you for the suggestion ☺), and they have a bunch of cool albums you should totally listen to (my favourite so far is Esther's Dreams :D)! To help keep up-to-date with the albums they release, I went looking for an RSS/Atom feed that I could subscribe to. Unfortunately, I didn't end up finding one - so I wrote a simple scraper to do the job for me, and I thought I'd share it on here. You can find it live over here. The script itself uses my earlier atom.gen.php script, since I had it lying around. Here it is in full: <?php$settings = new stdClass();
$settings->version = "0.1-alpha";$settings->ocremix_url = "https://ocremix.org";
$settings->album_url = "https://ocremix.org/albums/";$settings->user_agent = "OCReMixAlbumsAtomFeedGenerator/$settings->version (https://starbeamrainbowlabs.com/labs/ocremix_atom/albums_atom.php; <webmaster@starbeamrainbowlabs.com> ) PHP/" . phpversion() . "+DOMDocument";$settings->ignore_parse_errors = true;
$settings->atom_gen_location = "./atom.gen.php"; // -------------------------------------------- require($settings->atom_gen_location);

ini_set("user_agent", $settings->user_agent);$ocremix_albums_html = file_get_contents($settings->album_url);$ocremix_dom = new DOMDocument();
libxml_use_internal_errors($settings->ignore_parse_errors);$ocremix_dom->loadHTML($ocremix_albums_html);$ocremix_xpath = new DOMXPath($ocremix_dom);$ocremix_albums = $ocremix_xpath->query(".//table[contains(concat(' ', normalize-space(./@class), ' '), ' data ')]//tr[contains(concat(' ', normalize-space(./@class), ' '), ' area-link ')]");$ocremix_feed = new atomfeed();
$ocremix_feed->title = "Albums | OC ReMix";$ocremix_feed->id_uri = $settings->album_url;$ocremix_feed->sq_icon_uri = "http://ocremix.org/favicon.ico";
$ocremix_feed->addauthor("OverClocked ReMix", false, "https://ocremix.org/");$ocremix_feed->addcategories([ "music", "remixes", "albums" ]);

foreach ($ocremix_albums as$album) {
$album_date =$album->childNodes[6]->textContent;
$album_name =$album->childNodes[2]->childNodes[1]->textContent;
$album_url =$album->childNodes[2]->childNodes[1]->attributes["href"]->textContent;
$album_img_url =$album->childNodes[0]->childNodes[0]->attributes["src"]->textContent;

// Make the urls absolute
$album_url =$settings->ocremix_url .  $album_url;$album_img_url = $settings->ocremix_url .$album_img_url;

$content = "<p><a href='$album_url'><img src='$album_img_url' alt='$album_name' /> $album_name</a> - released$album_date";

$ocremix_feed->addentry($album_url,
$album_name, strtotime($album_date),
"OverClocked ReMix",
$content, false, [], strtotime($album_date)
);
}

echo(\$ocremix_feed->render());

Basically, I define a bunch of settings at the top for convenience (since I might end up changing them later). Then, I pull down the OcReMix albums page, extract the table of recent albums with a (not-so) tidy XPath query - since PHP's DOMDocument class doesn't seem to support anything else at a glance (yep, I used css-to-path to convert a CSS selector, since my XPath is a bit rusty and I was in a hurry :P).

Finally, I pull it apart to get a list of albums and their release dates - all of which goes straight into atom.gen.php and then straight out to the browser.

## Profiling PHP with XDebug

(This post is a fork of a draft version of a tutorial / guide originally written as an internal document whilst at my internship.)

Since I've been looking into xdebug's profiling function recently, I've just been tasked with writing up a guide on how to set it up and use it, from start to finish - and I thought I'd share it here.

While I've written about xdebug before in my An easier way to debug PHP post, I didn't end up covering the profiling function - I had difficulty getting it to work properly. I've managed to get it working now - this post documents how I did it. While this is written for a standard Debian server, the instructions can easily be applied to other servers.

For the uninitiated, xdebug is an extension to PHP that aids in the debugging of PHP code. It consists of 2 parts: The php extension on the server, and a client built into your editor. With these 2 parts, you can create breakpoints, step through code and more - though these functions are not the focus of this post.

To start off, you need to install xdebug. SSH into your web server with a sudo-capable account (or just use root, though that's bad practice!), and run the following command:

sudo apt install php-debug


Windows users will need to download it from here and put it in their PHP extension direction. Users of other linux distributions and windows may need to enable xdebug in their php.ini file manually (windows users will need extension=xdebug.dll; linux systems use extension=xdebug.so instead).

Once done, xdebug should be loaded and working correctly. You can verify this by looking the php information page. To see this page, put the following in a php file and request it in your browser:

<?php
phpinfo();
?>

If it's been enabled correctly, you should see something like this somewhere on the resulting page:

With xdebug setup, we can now begin configuring it. Xdebug gets configured in php.ini, PHP's main configuration file. Under Virtualmin each user has their own php.ini because PHP is loaded via CGI, and it's usually located at ~/etc/php.ini. To find it on your system, check the php information page as described above - there should be a row with the name "Loaded Configuration File":

Once you've located your php.ini file, open it in your favourite editor (or type sensible-editor php.ini if you want to edit over SSH), and put something like this at the bottom:

[xdebug]
xdebug.remote_enable=1
xdebug.remote_connect_back=1
xdebug.remote_port=9000
xdebug.remote_handler=dbgp
xdebug.remote_mode=req
xdebug.remote_autostart=true

xdebug.profiler_enable=false
xdebug.profiler_enable_trigger=true
xdebug.profiler_enable_trigger_value=ZaoEtlWj50cWbBOCcbtlba04Fj
xdebug.profiler_output_dir=/tmp
xdebug.profiler_output_name=php.profile.%p-%u

Obviously, you'll want to customise the above. The xdebug.profiler_enable_trigger_value directive defines a secret key we'll use later to turn profiling on. If nothing else, make sure you change this! Profiling slows everything down a lot, and could easily bring your whole server down if this secret key falls into the wrong hands (that said, simply having xdebug loaded in the first place slows things down too, even if you're not using it - so you may want to set up a separate server for development work that has xdebug installed if you haven't already). If you're not sure on what to set it to, here's a bit of bash I used to generate my random password:

dd if=/dev/urandom bs=8 count=4 status=none | base64 |  tr -d '=' | tr '+/' '-_'


The xdebug.profiler_output_dir lets you change the folder that xdebug saves the profiling output files to - make sure that the folder you specify here is writable by the user that PHP is executing as. If you've got a lot of profiling to do, you may want to consider changing the output filename, since xdebug uses a rather unhelpful filename by default. The property you want to change here is xdebug.profiler_output_name - and it supports a number of special % substitutions, which are documented here. I can recommend something phpprofile.%t-%u.%p-%H.%R.%cachegrind - it includes a timestamp and the request uri for identification purposes, while still sorting chronologically. Remember that xdebug will overwrite the output file if you don't include something that differentiates it from request to request!

With the configuration done, we can now move on to actually profiling something :D This is actually quite simple. Simply add the XDEBUG_PROFILE GET (or POST!) parameter to the url that you want to test in your browser. Here are some examples:

https://localhost/carrots/moon-iter.php?XDEBUG_PROFILE=ZaoEtlWj50cWbBOCcbtlba04Fj
https://development.galacticaubergine.de/register?vegetable=yes&mode=plus&XDEBUG_PROFILE=ZaoEtlWj50cWbBOCcbtlba04Fj

Adding this parameter to a request will cause xdebug to profile that request, and spit out a cachegrind file according to the settings we configured above. This file can then be analysed in your favourite editor, or, if it doesn't have support, an external program like qcachegrind (Windows) or kcachegrind (Everyone else).

If you need to profile just a single AJAX request or similar, most browsers' developer tools let you copy a request as a curl or wget command (Chromium-based browsers, Firefox - has an 'edit and resend' option), allowing you to resend the request with the XDEBUG_PROFILE GET parameter.

If you need to profile everything - including all subrequests (only those that pass through PHP, of course) - then you can set the XDEBUG_PROFILE parameter as a cookie instead, and it will cause profiling to be enabled for everything on the domain you set it on. Here's a bookmarklet that set the cookie:

javascript:(function(){document.cookie='XDEBUG_PROFILE='+'insert_secret_key_here'+';expires=Mon, 05 Jul 2100 00:00:00 GMT;path=/;';})();

(Source)

Replace insert_secret_key_here with the secret key you created for the xdebug.profiler_enable_trigger_value property in your php.ini file above, create a new bookmark in your browser, paste it in (making sure that your browser doesn't auto-remove the javascript: at the beginning), and then click on it when you want to enable profiling.

## How to update your linux kernel version on a KimSufi server

(Or why PHP throws random errors in the latest update)

Hello again!

Since I had a bit of a time trying to find some clear information on the subject, I'm writing the blog post so that it might help others. Basically, yesterday night I updated the packages on my server (the one that runs this website!). There was a PHP update, but I didn't think much of it.

This morning, I tried to access my ownCloud instance, only to discover that it was throwing random errors and refusing to load. I'm running PHP version 7.0.16-1+deb.sury.org~xenial+2. It was spewing errors like this one:

PHP message: PHP Fatal error:  Uncaught Exception: Could not gather sufficient random data in /srv/owncloud/lib/private/Security/SecureRandom.php:80
Stack trace:
#0 /srv/owncloud/lib/private/Security/SecureRandom.php(80): random_int(0, 63)
#1 /srv/owncloud/lib/private/AppFramework/Http/Request.php(484): OC\Security\SecureRandom->generate(20)
#2 /srv/owncloud/lib/private/Log/Owncloud.php(90): OC\AppFramework\Http\Request->getId()
#3 [internal function]: OC\Log\Owncloud::write('PHP', 'Uncaught Except...', 3)
#4 /srv/owncloud/lib/private/Log.php(298): call_user_func(Array, 'PHP', 'Uncaught Except...', 3)
#5 /srv/owncloud/lib/private/Log.php(156): OC\Log->log(3, 'Uncaught Except...', Array)
#6 /srv/owncloud/lib/private/Log/ErrorHandler.php(67): OC\Log->critical('Uncaught Except...', Array)
#7 [internal function]: OC\Log\ErrorHandler::onShutdown()
#8 {main}
thrown in /srv/owncloud/lib/private/Security/SecureRandom.php on line 80" while reading response header from upstream, client: x.y.z.w, server: ownc


That's odd. After googling around a bit, I found this page on the Arch Linux bug tracker. I'm not using arch (Ubuntu 16.04.2 LTS actually), but it turned out that this comment shed some much-needed light on the problem.

Basically, PHP have changed the way they ask the Linux Kernel for random bytes. They now use the getrandom() kernel function instead of /dev/urandom as they did before. The trouble is that getrandom() was introduced in linux 3.17, and I was running OVH's custom 3.14.32-xxxx-grs-ipv6-64 kernel.

Thankfully, after a bit more digging, I found this article. It suggests installing the kernel you want and moving one of the grub config file generators to another directory, but I found that simply changing the permissions did the trick.

Basically, I did the following:


apt update
apt install linux-image-generic
chmod -x /etc/grub.d/06_OVHkernel
update-grub
reboot


Basically, the above first updates everything on the system. Then it installs the linux-image-generic package. linux-image-generic is the pseudo-package that always depends on the latest stable kernel image available.

Next, I remove execute privileges on the file /etc/grub.d/06_OVHkernel. This is the file that gives the default installed OVH kernel priority over any other instalaled kernels, so it's important to exclude it from the grub configuration process.

Lastly, I update my grub configuration with update-grub and then reboot. You need to make sure that you update your grub configuration file, since if you don't it'll still use the old OVH kernel!

With that all done, I'm now running 4.4.0-62-generic according to uname -a. If follow these steps yourself, make sure you have a backup! While I am happy to try and help you out in the comments below, I'm not responsible for any consequences that may arise as a result of following this guide :-)

Art by Mythdael