Probably the world's most advanced string splitting function

When I was setting up this website, I foolishly picked a custom log file format that is rather hard for computers to parse. Because of this I haven't found a server log analysis tool that is intelligent enough to parse my logs (if you know of one please let me know in the comments!).

Here is an example of a typical log file entry:

[19/May/2015:00:47:52 +0100] "" HTTP/1.1 GET 200 0s :443 /blog/article.php article=posts/013-Terminal-Reference.html "" "Mozilla/5.0 (compatible; spbot/4.4.2; + )"

It looks strange, doesn't it? Since I want to have some idea of how many people are visiting my site, I have finally gotten around to writing my own custom log parser. To do this, I needed a way to split each line into an array of terms, keeping bracketed and quoted sections together. None of the answers on Stack Overflow seemed to cut it, so I wrote my own:

function explode_adv($openers, $closers, $togglers, $delimiters, $str)
{
    $chars = str_split($str);
    $parts = [];
    $nextpart = "";
    $toggle_states = array_fill_keys($togglers, false); // true = now inside, false = now outside
    $depth = 0;
    foreach($chars as $char)
    {
        if(in_array($char, $openers))
            $depth++; // entering a block
        elseif(in_array($char, $closers))
            $depth--; // leaving a block
        elseif(in_array($char, $togglers))
        {
            if($toggle_states[$char])
                $depth--; // we are inside a toggle block, leave it and decrease the depth
            else
                $depth++; // we are outside a toggle block, enter it and increase the depth

            // invert the toggle block state
            $toggle_states[$char] = !$toggle_states[$char];
        }
        else
            $nextpart .= $char; // ordinary character (delimiters included), keep it

        if($depth < 0) $depth = 0;

        // split on a delimiter, but only when we are outside every block
        if(in_array($char, $delimiters) &&
           $depth == 0 &&
           !in_array($char, $closers))
        {
            $parts[] = substr($nextpart, 0, -1); // drop the trailing delimiter
            $nextpart = "";
        }
    }
    // keep whatever is left over after the last delimiter
    if(strlen($nextpart) > 0)
        $parts[] = $nextpart;

    return $parts;
}

I have also posted this on Stack Overflow. The function takes 5 parameters:

  1. An array of characters that open a block - e.g. [, (, etc.
  2. An array of characters that close a block - e.g. ], ), etc.
  3. An array of characters that toggle a block - e.g. ", ', etc.
  4. An array of characters that should cause a split into the next part.
  5. The string to work on.
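
As a rough sketch of how the five parameters fit together, the snippet below runs the example log entry from earlier through the function and compares the result with a plain explode() on spaces. The function body is repeated (in a slightly condensed form) only so that the snippet runs on its own. The particular arguments, [ and ] as opener and closer, " as the toggler and a space as the delimiter, are my assumption about a sensible call for this log format, not necessarily the exact call used by the parser.

```php
<?php
// explode_adv() repeated here so the snippet is self-contained.
function explode_adv($openers, $closers, $togglers, $delimiters, $str)
{
    $parts = [];
    $nextpart = "";
    $toggle_states = array_fill_keys($togglers, false);
    $depth = 0;
    foreach (str_split($str) as $char) {
        if (in_array($char, $openers)) {
            $depth++;
        } elseif (in_array($char, $closers)) {
            $depth--;
        } elseif (in_array($char, $togglers)) {
            $depth += $toggle_states[$char] ? -1 : 1;
            $toggle_states[$char] = !$toggle_states[$char];
        } else {
            $nextpart .= $char;
        }
        if ($depth < 0) $depth = 0;
        if (in_array($char, $delimiters) && $depth == 0 && !in_array($char, $closers)) {
            $parts[] = substr($nextpart, 0, -1);
            $nextpart = "";
        }
    }
    if (strlen($nextpart) > 0) $parts[] = $nextpart;
    return $parts;
}

// The example log entry from above, split over two source lines for readability.
$line = '[19/May/2015:00:47:52 +0100] "" HTTP/1.1 GET 200 0s :443 /blog/article.php '
      . 'article=posts/013-Terminal-Reference.html "" "Mozilla/5.0 (compatible; spbot/4.4.2; + )"';

// Naive approach: split on every space, which breaks up the timestamp and user agent.
$naive = explode(" ", $line);

// Assumed arguments: [ ] delimit a block, " toggles a block, space separates terms.
$parts = explode_adv(["["], ["]"], ['"'], [" "], $line);

echo count($naive), " pieces from explode()\n";      // 16
echo count($parts), " pieces from explode_adv()\n";  // 11
echo $parts[0], "\n";   // 19/May/2015:00:47:52 +0100
echo $parts[10], "\n";  // Mozilla/5.0 (compatible; spbot/4.4.2; + )
```

Note that the opener, closer and toggler characters themselves never make it into the output, and the empty "" fields come back as empty strings. With these arguments the parentheses in the user agent survive only because ( and ) were left out of the opener and closer lists; adding them would strip them from the result too.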

This function probably has flaws, but it works well enough for me.

You can also find this function on GitHub Gist. As always, suggestions and contributions are welcome :)


