Finding Favicons with PHP

There hasn't been a post here for a little while because I have been ill. I am back now though :)

While writing more Bloworm, I needed a function that would automatically detect the url of the favicon that is associated with a given url. I wrote a quick function to do this a while ago - and have been improving it little by little.

I now have it at a point where it finds the correct url 99% of the time, so I thought that I would share it with you.

 * @summary Given a url, this function will attempt to find it's correspending favicon.
 * @returns The url of the corresponding favicon.
function auto_find_favicon_url($url)
        senderror(new api_error(400, 520, "The url you specified for the favicon was invalid."));

    // todo protect against downloading large files
    // todo send HEAD request instead of GET request
    try {
        $headers = get_headers($url, true);
    } catch (Exception $e) {
        senderror(new api_error(502, 710, "Failed to fetch the headers from url: $url"));
    $headers = array_change_key_case($headers);

    $urlparts = [];
    preg_match("/^([a-z]+)\:(?:\/\/)?([^\/?#]+)(.*)/i", $url, $urlparts);

    $content_type = $headers["content-type"];
    if(!is_string($content_type)) // account for arrays of content types
        $content_type = $content_type[0];

    $faviconurl = "images/favicon-default.png";
    if(strpos($content_type, "text/html") !== false)
        try {
            $html = file_get_contents($url);
        } catch (Exception $e) {
            senderror(new api_error(502, 711, "Failed to fetch url: $url"));
        $matches = [];
        if(preg_match("/rel=\"shortcut(?: icon)?\" (?:href=[\'\"]([^\'\"]+)[\'\"])/i", $html, $matches) === 1)
            $faviconurl = $matches[1];
            // make sure that the favicon url is absolute
            if(preg_match("/^[a-z]+\:(?:\/\/)?/i", $faviconurl) === 0)
                // the url is not absolute, make it absolute
                $basepath = dirname($urlparts[3]);

                // the path should not include the basepath if the favicon url begins with a slash
                if(substr($faviconurl, 0, 1) === "/")
                    $faviconurl = "$urlparts[1]://$urlparts[2]$faviconurl";
                    $faviconurl = "$urlparts[1]://$urlparts[2]$basepath/$faviconurl";

    if($faviconurl == "images/favicon-default.png")
        // we have not found the url of the favicon yet, parse the url
        // todo guard against invalid urls

        $faviconurl = "$urlparts[1]://$urlparts[2]/favicon.ico";
        $faviconurl = follow_redirects($faviconurl);
        $favheaders = get_headers($faviconurl, true);
        $favheaders = array_change_key_case($favheaders);

        if(preg_match("/2\d{3}/i", $favheaders[0]) === 0)
            return $faviconurl;

    return $faviconurl;

This code is pulled directly from the Bloworm source code - so you will need to edit it slightly to suit your needs. It is not perfect, and will probably will be updated from time to time.

Tag Cloud

3d 3d printing account algorithms android announcement architecture archives arduino artificial intelligence artix assembly async audio automation backups bash batch blog bookmarklet booting bug hunting c sharp c++ challenge chrome os cluster code codepen coding conundrums coding conundrums evolved command line compilers compiling compression containerisation css dailyprogrammer data analysis debugging demystification distributed computing documentation downtime electronics email embedded systems encryption es6 features ethics event experiment external first impressions future game github github gist gitlab graphics hardware hardware meetup holiday holidays html html5 html5 canvas infrastructure interfaces internet interoperability io.js jabber jam javascript js bin labs learning library linux lora low level lua maintenance manjaro network networking nibriboard node.js operating systems own your code pepperminty wiki performance phd photos php pixelbot portable privacy problem solving programming problems projects prolog protocol protocols pseudo 3d python reddit redis reference releases rendering resource review rust searching secrets security series list server software sorting source code control statistics storage svg talks technical terminal textures thoughts three thing game three.js tool tutorial twitter ubuntu university update updates upgrade version control virtual reality virtualisation visual web website windows windows 10 xmpp xslt


Art by Mythdael