Starbeamrainbowlabs

Stardust
Blog

Own your code, Part 2: The curious case of the unreliable webhook

In the last post, I talked about how to setup your own Git server with Gitea. In this one, I'm going to take bit of a different tack - and talk about one of the really annoying problems I ran into when setting up my continuous integration server, Laminar CI.

Since I wanted to run the continuous integration server on a different machine to the Gitea server itself, I needed a way for the Gitea server to talk to the CI server. The natural choice here is, of course, a Webhook-based system.

After installing and configuring Webhook on the CI server, I set to work writing a webhook receiver shell script (more on this in a future post!). Unfortunately, it turned out that that Gitea didn't like sending to my CI server very much:

A ton of failed attempts at sending a webhook to the CI server

Whether it succeeded or not was random. If I hit the "Test Delivery" button enough times, it would eventually go through. My first thought was to bring up the Gitea server logs to see if it would give any additional information. It claimed that there was an i/o timeout communicating with the CI server:

Delivery: Post https://ci.bobsrockets.com/hooks/laminar-config-check: read tcp 5.196.73.75:54504->x.y.z.w:443: i/o timeout

Interesting, but not particularly helpful. If that's the case, then I should be able to get the same error with curl on the Gitea server, right?

curl https://ci.bobsrockets.com/hooks/testhook

.....wrong. It worked flawlessly. Every time.

Not to be beaten by such an annoying issue, I moved on to my next suspicion. Since my CI server is unfortunately behind NAT, I checked the NAT rules on the router in front of it to ensure that it was being exposed correctly.

Unfortunately, I couldn't find anything wrong here either! By this point, it was starting to get really rather odd. As a sanity check, I decided to check the server logs on the CI server, since I'm running Webhook behind Nginx (as a reverse-proxy):

5.196.73.75 - - [04/Dec/2018:20:48:05 +0000] "POST /hooks/laminar-config-check HTTP/1.1" 408 0 "-" "GiteaServer"

Now that's weird. Nginx has recorded a HTTP 408 error. Looking is up reveals that it's a Request Timeout error, which has the following definition:

The server did not receive a complete request message within the time that it was prepared to wait.

Wait what? Sounds to me like there's an argument going on between the 2 servers here - in which each server is claiming that the other didn't send a complete request or response.

At this point, I blamed this on a faulty HTTP implementation in Gitea, and opened an issue.

As a workaround, I ended up configuring Laminar to use a Unix socket on disk (as opposed to an abstract socket), forwarding it over SSH, and using a git hook to interact with it instead (more on how I managed this in a future post. There's a ton of shell scripting that I need to talk about first).

This isn't the end of this tail though! A month or two after I opened the issue, I wound up in the situation whereby I wanted to connect a GitHub repository to my CI server. Since I don't have shell access on github.com, I had to use the webhook.

When I did though, I got a nasty shock: The webhook deliveries exhibited the exact same random failures as I saw with the Gitea webhook. If I'd verified the Webhook server and cleared Gitea's HTTP implementation's name, then what else could be causing the problem?

At this point, I can only begin to speculate what the issue is. Personally, I suspect that it's a bug in the port-forwarding logic of my router, whereby it drops the first packet from a new IP address while it sets up a new NAT session to forward the packets to the CI server or something - so subsequent requests will go through fine, so long as they are sent within the NAT session timeout and from the same IP. If you've got a better idea, please comment below!

Of course, I really wanted to get the GitHub repository connected to my CI server, and if the only way I could do this was with a webhook, it was time for some request-wrangling.

My solution: A PHP proxy script running on the same server as the Gitea server (since it has a PHP-enabled web server set up already). If said script eats the request and emits a 202 Accepted immediately, then it can continue trying to get a hold of the webhook on the CI server 'till the cows come home - and GitHub will never know! Genius.

PHP-FPM (the fastcgi process manager; great alongside Nginx) makes this possible with the fastcgi_finish_request() method, which both flushes the buffer and ends the request to the client, but doesn't kill the PHP script - allowing for further processing to take place without the client having to wait.

Extreme caution must be taken with this approach however, as it can easily lead to a situation where the all the PHP-FPM processes are busy waiting on replies from the CI server, leaving no room for other requests to be fulfilled and a big messy pile-up in the queue forming behind them.

Warnings aside, here's what I came up with:

<?php

$settings = [
    "target_url" => "https://ci.bobsrockets.com/hooks/laminar-git-repo",
    "response_message" => "Processing laminar job proxy request.",
    "retries" => 3,
    "attempt_timeout" => 2 // in seconds, for a single attempt
];

$headers = "host: ci.starbeamrainbowlabs.com\r\n";
foreach(getallheaders() as $key => $value) {
    if(strtolower($key) == "host") continue;
    $headers .= "$key: $value\r\n";
}
$headers .= "\r\n";

$request_content = file_get_contents("php://input");

// --------------------------------------------

http_response_code(202);
header("content-type: text/plain");
header("content-length: " . strlen($settings["response_message"]));
echo($settings["response_message"]);

fastcgi_finish_request();

// --------------------------------------------

function log_message($msg) {
    file_put_contents("ci-requests.log", $msg, FILE_APPEND);
}

for($i = 0; $i < $settings["retries"]; $i++) {
    $start = microtime(true);

    $context = stream_context_create([
        "http" => [
            "header" => $headers,
            "method" => "POST",
            "content" => $request_content,
            "timeout" => $settings["attempt_timeout"]
        ]
    ]);

    $result = file_get_contents($settings["target_url"], false, $context);

    if($result !== false) {
        log_message("[" . date("r") . "] Queued laminar job in " . (microtime(true) - $start_time)*1000 . "ms");
        break;
    }


    log_message("[" . date("r") . "] Failed to laminar job after " . (microtime(true) - $start_time)*1000 . "ms.");
}

I've named it autowrangler.php. A few things of note here:

This ends the tale of the recalcitrant and unreliable webhook. Hopefully you've found this an interesting read. In future posts, I want to look at how I configured Webhook, the inner workings of the git hook I mentioned above, and the collection of shell scripts I've cooked to that make my CI server tick in a way that makes it easy to add new projects quickly.

Found this interesting? Run into this issue yourself? Found a better solution workaround? Comment below!

Tag Cloud

3d 3d printing account algorithms android announcement architecture archives arduino artificial intelligence artix assembly async audio automation backups bash batch blog bookmarklet booting bug hunting c sharp c++ challenge chrome os code codepen coding conundrums coding conundrums evolved command line compilers compiling compression css dailyprogrammer data analysis debugging demystification distributed computing documentation downtime electronics email embedded systems encryption es6 features ethics event experiment external first impressions future game github github gist gitlab graphics hardware hardware meetup holiday holidays html html5 html5 canvas infrastructure interfaces internet interoperability io.js jabber jam javascript js bin labs learning library linux lora low level lua maintenance manjaro network networking nibriboard node.js operating systems performance photos php pixelbot portable privacy problem solving programming problems project projects prolog protocol protocols pseudo 3d python reddit redis reference releases resource review rust searching secrets security series list server software sorting source code control statistics storage svg talks technical terminal textures thoughts three thing game three.js tool tutorial twitter ubuntu university update updates upgrade version control virtual reality virtualisation visual web website windows windows 10 xmpp xslt

Archive

Art by Mythdael