
Quest Get: Search large amounts of code!


(Above: A map of the Linux Kernel source code. Source: this post on medium.)

Recently I was working on a little project of mine (nope, not this for once! :P), and I needed a C♯ class I'd written a while ago. Being forgetful as I am, I had no idea which of my projects I'd written it for. And so the quest began to find it! I did in the end, but it left me wondering whether there was a better way to search all my code quickly. This post is the culmination of everything I've discovered so far about the process of searching one's code.

Before I started, I already knew about grep, which is built into almost every Linux system around. It's even available for Windows via the MSYS Tools. Unfortunately though, despite its prevalence, it's not particularly good at searching large numbers of git repositories, as it keeps descending into the .git folder and displaying a whole load of useless results.

Something had to change. After asking reddit, I was introduced to OpenGrok. Written in Java, it indexes all of your code, and provides a web interface through which you can search it. Very nice. Unfortunately, I had trouble figuring out the logistics of actually getting it to run - and discovered that it takes multiple hours to set up correctly.

Moving on, I was re-introduced to ack. Written in plain old Perl, it apparently runs on practically any system that Perl does - though it's not installed by default like grep is. Looking into it, I found it to be much like grep - only smarter. It ignores version control directories (like the .git folder) and common package folders (like node_modules) by default, and even has a system by which results can be filtered by language (with support for hash-bangs too!). The results themselves are coloured by default - making it easy to skim through quickly. Coupled with its configuration file system, ack makes for a wonderfully flexible way to search through large amounts of code quickly.

Though ack looks good, I still didn't have a way to search through all my code that's scattered across multiple devices at once, so I kept looking. The next project I found (through AlternativeTo, actually) was Text Sherlock. It positions itself as an alternative to OpenGrok that's much simpler to configure.

True to its word, I managed to get a test instance set up running from my /tmp directory in 15 minutes - though it did take a while to index the code I had locally. It also took several seconds to consult its index when I entered a query. I suspect I could alleviate both of these issues by installing Xapian (an open-source high-performance search library), which it appears to have support for.

While the interface was cool, it didn't appear to allow me to tell it which directories not to index, so it ended up trawling through all my .git directories - just like grep did. It also doesn't appear to be multi-threaded - so it took much longer to index my code than it really needed to (I've got a solid-state drive and enough RAM for a few GBs of cache, so the indexing operation was CPU-bound, not I/O-bound).

In the end, I've rediscovered the awesome search tool ack, and taken a look at the current state of code search tools today. While I haven't yet found precisely what I'm looking for, I'm further forward than when I started.

Other honourable mentions include GNU Global (which apparently needs several GiBs per ~300MiB of source code for its generated static HTML web interface), (an IDE-like freemium cloud product that 'understands your code'), CodeQuery (only supports C, C++, Java, Python, Ruby, JavaScript, and Go), and ripgrep (a Rust-based program, similar to ack and grep; feature comparison). The official ack website has a good page that lists more tools that are worth a look, too.

Got a cool way to search through all your code? Did this help you out? Comment below!

Finding the distance to a (finite) line from a point in Javascript

A screenshot of the library I've written. Explanation below.

For a project of mine (which I might post about once it's more stable), I'm going to need a way to find the distance to a line from the mouse cursor to implement an eraser. I've attempted this problem before - but it didn't exactly go to plan. To that end, I decided to implement the algorithm on its own to start with - so that I could debug it properly without all the (numerous) moving parts of the project I'm writing it for getting in the way.

As you may have guessed since you're reading this post, it actually went rather well! Using the C++ implementation on this page as a reference, it didn't take more than an hour or two to get a reasonable implementation working - and it didn't take a huge amount of time to tidy it up into an npm package for everyone to use!

My implementation uses ES6 Modules - so you may need to enable them in about:config or chrome://flags if you haven't already (don't believe the pages online that say you need Firefox / Chrome nightly - it's available in stable, just disabled by default) before taking a look at the demo, which you can find here:

Line Distance Calculator

(Click and drag to draw a line - your distance from it is shown in the top left)

The code behind it is actually quite simple - just rather full of nasty maths that will give you a headache if you try and understand it all at once (I broke it down, which helped). The library exposes multiple methods to detect a point's distance from different kinds of line - one for multi-segmented lines (which I needed in the first place), one for a single (finite) line (which the multi-segmented line employs), and one for a single infinite line - which I implemented first, using this Wikipedia article - before finding that it was buggy because it was for an infinite line (even though the article's name is apparently correct)!
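To make the difference between the three kinds of line a little more concrete, here's a rough sketch of the general idea. It's the standard project-and-clamp approach rather than the actual code from my library, and the function names and the { x, y } point format are made up for illustration - so don't treat it as the package's API.

```javascript
// Illustrative sketch only: these function names and the { x, y } point
// format are assumptions, not the API of the npm package.

// Distance from point p to the INFINITE line passing through a and b.
function distanceToInfiniteLine(p, a, b) {
	const dx = b.x - a.x;
	const dy = b.y - a.y;
	const length = Math.hypot(dx, dy);
	// a and b are the same point, so fall back to a point-to-point distance
	if (length === 0) return Math.hypot(p.x - a.x, p.y - a.y);
	// |cross product| / length gives the perpendicular distance
	return Math.abs(dx * (p.y - a.y) - dy * (p.x - a.x)) / length;
}

// Distance from point p to the FINITE line segment between a and b.
function distanceToSegment(p, a, b) {
	const dx = b.x - a.x;
	const dy = b.y - a.y;
	const lengthSquared = dx * dx + dy * dy;
	if (lengthSquared === 0) return Math.hypot(p.x - a.x, p.y - a.y);
	// Project p onto the line: t = 0 is at a, t = 1 is at b
	let t = ((p.x - a.x) * dx + (p.y - a.y) * dy) / lengthSquared;
	// Clamp so that the closest point can't fall off either end of the segment
	t = Math.max(0, Math.min(1, t));
	return Math.hypot(p.x - (a.x + t * dx), p.y - (a.y + t * dy));
}

// Distance from point p to a multi-segmented line (an array of 2+ points):
// simply the smallest distance to any of its segments.
function distanceToMultiLine(p, points) {
	let best = Infinity;
	for (let i = 0; i < points.length - 1; i++) {
		best = Math.min(best, distanceToSegment(p, points[i], points[i + 1]));
	}
	return best;
}
```

The clamp on t is the bit the infinite-line version lacks: without it, a cursor sitting well beyond the end of a line still counts as 'close' to it, which is exactly what you don't want in an eraser.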

I've written up a usage guide if you're interested in playing around with it yourself.

I've also got another library that I've released recently (also for Nibriboard) that simplifies multi-segmented lines instead of finding the distance to them, which I may post about soon too!

Update: Looks like I forgot that I've already posted about the other library! You can read about it here: Line Simplification: Visvalingam's Algorithm

Got a question? Wondering why I've gone to the trouble of implementing such an algorithm? Comment below - I'd love to hear your thoughts!
