Website change detection with headless Firefox and ImageMagick
This wasn't the script I had in mind in the previous blog post (so you can look forward to another blog post about it), but have you ever wanted to know when a web page changes? If it does change, it's almost impossible to tell where on the page it's changed. Recently, I was thinking about the problem, and realised a few things:
- Firefox can be operated headlessly (with `--headless`) to take screenshots
- ImageMagick must be advanced enough to diff images
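If you'd like to see how those two pieces fit together, here's a minimal sketch. It isn't lifted from url-diff itself: the function names `take_screenshot` and `make_diff_image` are hypothetical, and it assumes both screenshots have the same dimensions (ImageMagick's `compare` refuses to compare images whose widths or heights differ).

```bash
#!/usr/bin/env bash
# Minimal sketch, not url-diff's actual code. Function names are hypothetical.
# Assumes both screenshots have the same width and height.

# take_screenshot URL OUTPUT_PNG
take_screenshot() {
	if [[ "$#" -ne 2 ]]; then
		echo "Usage: take_screenshot URL OUTPUT_PNG" >&2
		return 1
	fi
	# --screenshot saves an image of the loaded page to the given path
	firefox --headless --screenshot "$2" "$1"
}

# make_diff_image OLD_PNG NEW_PNG DIFF_PNG
make_diff_image() {
	if [[ "$#" -ne 3 ]]; then
		echo "Usage: make_diff_image OLD_PNG NEW_PNG DIFF_PNG" >&2
		return 1
	fi
	local status
	# -compose src outputs only the difference mask:
	# changed pixels in the highlight colour, unchanged ones in the lowlight colour
	compare -compose src -highlight-color red -lowlight-color white "$1" "$2" "$3"
	status="$?"
	# compare exits 0 (identical), 1 (different) or 2 (error); only 2 is a failure here
	[[ "$status" -lt 2 ]]
}
```

For example: `take_screenshot https://example.com new.png && make_diff_image old.png new.png diff.png`.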
With this in mind, I set about implementing a script. Before we continue, here's an example diff image:
It's rather tall because of the web page I chose, but the bits that have changed appear in red. The script I've written also generates an animated PNG showing the difference:
Again, it's very tall because of the page I tested with, but I think it's pretty cool!
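The exact commands behind these two images live in the repository, but a rough sketch of how they could be produced looks like this. The function names are hypothetical; the overlay assumes the diff image marks unchanged pixels in white, and the animated PNG step assumes `ffmpeg` is installed and that both screenshots are the same size.

```bash
#!/usr/bin/env bash
# Hypothetical helpers sketching the two output images; not url-diff's actual code.

# make_overlay NEW_PNG DIFF_PNG OUTPUT_PNG
make_overlay() {
	if [[ "$#" -ne 3 ]]; then
		echo "Usage: make_overlay NEW_PNG DIFF_PNG OUTPUT_PNG" >&2
		return 1
	fi
	# Make the unchanged (white) pixels of the diff transparent, then draw it
	# over the new screenshot. ImageMagick 7 may need `magick` instead of `convert`.
	convert "$1" \( "$2" -transparent white \) -compose over -composite "$3"
}

# make_apng OLD_PNG NEW_PNG OUTPUT_APNG
make_apng() {
	if [[ "$#" -ne 3 ]]; then
		echo "Usage: make_apng OLD_PNG NEW_PNG OUTPUT_APNG" >&2
		return 1
	fi
	local tmpdir status
	tmpdir="$(mktemp -d)" || return 1
	# ffmpeg reads numbered frames from a pattern: frame_0.png, frame_1.png
	cp "$1" "$tmpdir/frame_0.png"
	cp "$2" "$tmpdir/frame_1.png"
	# 1 frame per second; -plays 0 makes the animation loop forever
	ffmpeg -hide_banner -loglevel error -y -framerate 1 \
		-i "$tmpdir/frame_%d.png" -plays 0 -f apng "$3"
	status="$?"
	rm -rf "$tmpdir"
	return "$status"
}
```

Usage would be along the lines of `make_overlay new.png diff.png output_diff.png` followed by `make_apng old.png new.png output.apng`.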
If you'd like to check the script out for yourself, you can find it in the following git repository: sbrl/url-diff
For the curious, the script in question is written in Bash. It uses apcalc (available in Debian/Ubuntu-based Linux distributions with `sudo apt install apcalc`) to crunch the numbers, and headless Firefox + ImageMagick as described above to take the screenshots and do the image processing. It should in theory work on Windows, but you'll need to jump through a number of hoops:
- Install Git Bash, and call `url-diff.sh` from it
- Install ImageMagick and make sure the binaries are in your `PATH`
- Install Firefox and make sure `firefox` is in your `PATH`
- Explicitly set the `URLDIFF_STORAGE_DIR` environment variable when calling the script (do this by prefixing the command at the bottom of this post with `URLDIFF_STORAGE_DIR=path/to/directory`)
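If you're setting things up on Windows (or anywhere else), a quick way to check that the prerequisites are actually in your `PATH`, and to see how a `URLDIFF_STORAGE_DIR` override is typically picked up, is something like the sketch below. `check_commands` is a hypothetical helper, and the fallback directory shown is an illustrative assumption rather than url-diff's real default.

```bash
#!/usr/bin/env bash
# Sketch only: the fallback path below is an illustrative assumption,
# not necessarily url-diff's real default.
URLDIFF_STORAGE_DIR="${URLDIFF_STORAGE_DIR:-${XDG_CACHE_HOME:-$HOME/.cache}/url-diff}"

# check_commands CMD...: print an error for each command missing from PATH,
# and fail if any were missing. (Hypothetical helper.)
check_commands() {
	local cmd missing=0
	for cmd in "$@"; do
		if ! command -v "$cmd" >/dev/null 2>&1; then
			echo "Error: '$cmd' was not found in your PATH" >&2
			missing=1
		fi
	done
	return "$missing"
}

# Example: check_commands firefox convert compare calc ffmpeg
```

Prefixing a command with `URLDIFF_STORAGE_DIR=path/to/directory` sets the variable for that command only, and the `:-` default only kicks in when the variable is unset or empty.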
With my fancy new embed system, I can show you the code behind it:
(Can't see the above? Check it out in the git repository.)
I'm working on line numbers (sadly the author of highlight.js doesn't like them, so an alternative solution is required).
Anyway, the basic layout of the script is as follows:
- First, the settings are read in and the default values set
- Then, I define some utility functions.
  - The `calculate_percentage_colour` function is integral to the image change detection algorithm. It calculates the percentage of an image that is a given colour.
- Next, the help text is displayed if necessary
- The `case` statement that follows allows multiple subcommands to be implemented. Currently I only have a `check` subcommand, but you never know!
- Inside this case statement, the screenshots are taken and compared:
  - A new screenshot is taken with headless Firefox
  - If we don't have a screenshot stored away already, we stash the new screenshot and exit
  - If we do have a pre-existing screenshot, we continue with the comparison, starting by generating a diff image where pixels that have changed are given one colour, and pixels that haven't changed another
  - It's at this point that `calculate_percentage_colour` is called to calculate how much of the image has changed: the diff image is passed in and the changed pixels are counted
  - If more than 2% (by default) has changed, then we continue on to generate the output images
    - The first output image consists of the new screenshot with the diff image overlaid. This is generated with some ImageMagick wizardry: `-compose over -composite`
    - The second is an animated PNG composed of the old and new screenshots. This is generated with `ffmpeg`, which supports animated PNGs
  - Finally, the old screenshot that we have stored away is replaced with the new one
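To make the layout above a little more concrete, here's a stripped-down skeleton of the help text plus `case`-based subcommand dispatch. It's a sketch rather than the real script: `show_help`, `main` and the placeholder `check` body are hypothetical, and the actual screenshot and comparison logic is left out.

```bash
#!/usr/bin/env bash
# Skeleton sketch of a subcommand-based script (hypothetical names).

show_help() {
	cat <<'HELP'
Usage: url-diff.sh <subcommand> [args]

Subcommands:
    check URL DIFF_OUTPUT APNG_OUTPUT    Compare URL against the stored screenshot
HELP
}

main() {
	# Show the help text when no subcommand is given or help is requested
	if [[ "$#" -lt 1 || "$1" == "--help" || "$1" == "-h" ]]; then
		show_help
		return 0
	fi

	local subcommand="$1"
	shift

	case "$subcommand" in
		check)
			if [[ "$#" -ne 3 ]]; then
				echo "Error: check needs URL DIFF_OUTPUT APNG_OUTPUT" >&2
				return 1
			fi
			# Placeholder: the real check takes, compares and stores screenshots here
			echo "Would check $1"
			;;
		*)
			echo "Error: unknown subcommand '$subcommand'" >&2
			return 1
			;;
	esac
}

# Only run main when executed directly, not when sourced
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
	main "$@"
fi
```

Adding another subcommand later is then just a matter of adding another branch to the `case` statement.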
It sounds more complicated than it is - hopefully my above explanation makes sense (post a comment below if you're confused about something!).
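For the curious, here's one way the percentage calculation and the 2% threshold check could be done. This isn't necessarily how url-diff does it: the script uses apcalc for the arithmetic, whereas this sketch swaps in `awk`, and it counts pixels by parsing ImageMagick's `txt:` pixel enumeration output. `percentage_colour_from_txt` and `exceeds_threshold` are hypothetical names, and the `calculate_percentage_colour` wrapper here is only my approximation of the real function.

```bash
#!/usr/bin/env bash
# Sketch only: uses awk in place of apcalc. Function names are hypothetical.

# percentage_colour_from_txt COLOUR
# Reads ImageMagick "txt:" output on stdin (lines like "0,0: (255,0,0)  #FF0000  red")
# and prints the percentage of pixels whose hex value starts with COLOUR.
percentage_colour_from_txt() {
	awk -v colour="$1" '
		/^#/ { next }    # skip the pixel enumeration header line
		NF == 0 { next } # skip blank lines
		{
			total++
			for (i = 1; i <= NF; i++) {
				# hex column is #RRGGBB, or #RRGGBBAA when there is an alpha channel
				if (index(toupper($i), toupper(colour)) == 1) { hits++; break }
			}
		}
		END {
			if (total == 0) print "0.00"
			else printf "%.2f\n", 100 * hits / total
		}
	'
}

# calculate_percentage_colour IMAGE COLOUR (approximation of the real function)
calculate_percentage_colour() {
	# -depth 8 keeps the hex values in the 8-bit #RRGGBB form
	convert "$1" -depth 8 txt:- | percentage_colour_from_txt "$2"
}

# exceeds_threshold PERCENT THRESHOLD: succeeds if PERCENT > THRESHOLD
# (bash arithmetic is integer-only, hence awk)
exceeds_threshold() {
	awk -v p="$1" -v t="$2" 'BEGIN { exit !(p + 0 > t + 0) }'
}
```

With those in place, the change check boils down to something like `exceeds_threshold "$(calculate_percentage_colour diff.png '#FF0000')" 2`.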
You can call the script like so:
```bash
git clone https://git.starbeamrainbowlabs.com/sbrl/url-diff.git
cd url-diff
./url-diff.sh check URL_HERE path/to/output_diff.png path/to/output.apng
```
...replacing `URL_HERE` with the URL to check, and the paths with the places you'd like to write the output images to.