Starbeamrainbowlabs

Stardust
Blog

Website change detection with headless Firefox and ImageMagick

This wasn't the script I had in mind in the previous blog post (so you can look forward to another blog post about it), but have you ever wanted to know when a web page changes? If it does change, it's almost impossible to tell where on the page it's changed. Recently, I was thinking about the problem, and realised a few things:

With this in mind, I set about implementing a script. Before we continue, here's an example diff image:

It's rather tall because of the webpage I chose, but the bits that have changed appear in red. The script I've written also generates an animated PNG showing the difference too:

Again, it's very tall because of the page I tested with, but I think it's pretty cool!

If you'd like to check the script out for yourself, you find it in the following git repository: sbrl/url-diff

For the curious, the script in question is written in Bash. It uses apcalc (available in Debian / Ubuntu based Linux distributions with sudo apt install apcalc) to crunch the numbers, and headless Firefox + Imagemagick as described above to take the screenshots and do the image processing. It should in theory work on Windows, but you'll need to jump through a number of hoops:

With my fancy new embed system, I can show you the code behind it:

(Can't see the above? Check it out in the git repository.)

I'm working on line numbers (sadly the author of highlight.js doesn't like them, so an alternative solution is required).

Anyway, the basic layout of the script is as follows:

  1. First, the settings are read in and the default values set
  2. Then, I define some utility functions.
    • The calculate_percentage_colour function is integral to the image change detection algorithm. It counts percentage of an image that is a given colour.
  3. Next, the help text is displayed if necessary
  4. The case statement that follows allows multiple subcommands to be implemented. Currently I only have a check subcommand, but you never know!
  5. Inside this case statement, the screenshots are taken and compared.
    • A new screenshot is taken with headless Firefox
    • If we don't have a screenshot stored away already, we stash the new screenshot and exit
    • If we do have a pre-existing screenshot, we continue with the comparison, starting by generating a diff image where pixels that have changed are given 1 colour, and pixels that haven't changed another
    • It's at this point that calculate_percentage_colour is called to calculate how much of the image has changed - the diff image is passed in and the changed pixels are counted
    • If more than 2% (by default) has changed, then we continue on to generate the output images
    • The first output image consists of the new screenshot with the diff image overlaid - this is generated with some ImageMagick wizardry: -compose over -composite
    • The second is an animated PNG comprised of the old and new screenshots. This is generated with ffmpeg - which supports animated PNGs
    • Finally, the old screenshot that we have stored away is replaced with the new one

It sounds more complicated than it is - hopefully my above explanation makes sense (post a comment below if you're confused about something!).

You can call the script like so:

git clone https://git.starbeamrainbowlabs.com/sbrl/url-diff.git
cd url-diff;
./url-diff.sh check URL_HERE path/to/output_diff.png path/to/output.apng

....replacing URL_HERE with the URL to check, and the paths with the places you'd like to write the output images to.

Tag Cloud

3d 3d printing account algorithms android announcement architecture archives arduino artificial intelligence artix assembly async audio automation backups bash batch blog bookmarklet booting bug hunting c sharp c++ challenge chrome os cluster code codepen coding conundrums coding conundrums evolved command line compilers compiling compression containerisation css dailyprogrammer data analysis debugging demystification distributed computing documentation downtime electronics email embedded systems encryption es6 features ethics event experiment external first impressions future game github github gist gitlab graphics hardware hardware meetup holiday holidays html html5 html5 canvas infrastructure interfaces internet interoperability io.js jabber jam javascript js bin labs learning library linux lora low level lua maintenance manjaro network networking nibriboard node.js operating systems own your code pepperminty wiki performance phd photos php pixelbot portable privacy problem solving programming problems projects prolog protocol protocols pseudo 3d python reddit redis reference releases rendering resource review rust searching secrets security series list server software sorting source code control statistics storage svg talks technical terminal textures thoughts three thing game three.js tool tutorial twitter ubuntu university update updates upgrade version control virtual reality virtualisation visual web website windows windows 10 xmpp xslt

Archive

Art by Mythdael