February 3, 2016
14:03 -0500
Hubert Chathi: The fastest query is the one you don't do. Sped up the response time of some @sbscalculus.com pages from about 3s to about 0.3s by not fetching some irrelevant data. #
0 Comments
January 27, 2016

Automating browser-side unit tests with nodeunit and PhantomJS

11:24 -0500

I love unit tests, but they're only useful if they get run. For one of my projects at work, I have a set of server-side unit tests and a set of browser-side unit tests. The server-side tests get run automatically on "git push" via Buildbot, but the browser-side tests haven't been run for a long time: they don't work in Firefox, which is my primary browser, due to differences in the way it iterates through object keys.

Of course, automation would help, in the same way that automating the server-side tests ensured that they were run regularly. Enter PhantomJS, which is a scriptable headless WebKit environment. Unfortunately, even though PhantomJS can support many different testing frameworks, there is no existing support for nodeunit, which is the testing framework that I'm using in this particular project. Fortunately, it isn't hard to script support for nodeunit.

nodeunit's built-in browser support just dynamically builds a web page with the test results and a test summary. If we just ran it as-is in PhantomJS, it would happily run the tests for us, but we wouldn't be able to see the results, and it would just sit there doing nothing when it was done. What we want is for the test results to be output to the console, and for the script to exit when the tests are done (with an error code if any tests failed). To do this, we will create a custom nodeunit reporter that will communicate with PhantomJS.

First, let's deal with the PhantomJS side. Our custom nodeunit reporter will use console.log to print the test results, so we will pass through console messages in PhantomJS.

// create the page object that will load the test suite
var page = require('webpage').create();

page.onConsoleMessage = function (msg) {
    console.log(msg);
};

We will use PhantomJS's callback functionality to signal the end of the tests. The callback data will just be an object containing the total number of assertions, the number of failed assertions, and the time taken.

page.onCallback = function (data) {
    if (data.failures)
    {
        console.log("FAILURES: " + data.failures + "/" + data.length + " assertions failed (" + data.duration + "ms)");
    }
    else
    {
        console.log("OK: " + data.length + " assertions (" + data.duration + "ms)");
    }
    // the exit status is the number of failed assertions, so 0 means success
    phantom.exit(data.failures);
};

(Warning: the callback API is marked as experimental, so it may be subject to change.)

If the test page fails to load for whatever reason, PhantomJS will just sit there doing nothing, which is not desirable behaviour, so we will exit with an error if something fails.

phantom.onError = function (msg, trace) {
    console.log("ERROR: " + msg);
    if (trace && trace.length)
    {
        for (var i = 0; i < trace.length; i++)
        {
            var t = trace[i];
            console.log(i + " " + (t.file || t.sourceURL) + ": " + t.line + (t.function ? " in " + t.function : ""));
        }
    }
    phantom.exit(1);
};
page.onError = function (msg, trace) {
    console.log("ERROR: " + msg);
    if (trace && trace.length)
    {
        for (var i = 0; i < trace.length; i++)
        {
            var t = trace[i];
            console.log(i + " " + (t.file || t.sourceURL) + ": " + t.line + (t.function ? " in " + t.function : ""));
        }
    }
    phantom.exit(1);
};
page.onLoadFinished = function (status) {
    if (status !== "success")
    {
        console.log("ERROR: page failed to load");
        phantom.exit(1);
    }
};
page.onResourceError = function (resourceError) {
    console.log("ERROR: failed to load " + resourceError.url + ": " + resourceError.errorString + " (" + resourceError.errorCode + ")");
    phantom.exit(1);
};

Now for the nodeunit side. The normal test page looks like this:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>ML Editor Test Suite</title>
    <link rel="stylesheet" href="stylesheets/nodeunit.css" type="text/css" />
    <script src="javascripts/module-requirejs.js" type="text/javascript"></script>
    <script src="javascripts/requirejs-config.js" type="text/javascript"></script>
    <script data-main="test" src="javascripts/require.js" type="text/javascript"></script>
  </head>
  <body>
    <h1 id="nodeunit-header">ML Editor Test Suite</h1>
  </body>
</html>

If you're not familiar with RequireJS pages, the <script data-main="test" src="javascripts/require.js" type="text/javascript"></script> line means that the main JavaScript file is called "test.js". We want to use the same script file for both a normal browser test and the PhantomJS-based test, so in PhantomJS, we will set window.nodeunit_reporter to our custom reporter. In "test.js", then, we will check for window.nodeunit_reporter, and if it is present, we will replace nodeunit's default reporter. Although there's no documented way of changing the reporter in the browser version of nodeunit, looking at the code, it's pretty easy to do.

if (window.nodeunit_reporter) {
    nodeunit.reporter = nodeunit_reporter;
    nodeunit.run = nodeunit_reporter.run;
}

(Disclaimer: since this uses an undocumented interface, it may break some time in the future.)

So what does a nodeunit reporter look like? It's just an object with two items: info (a textual description) and run, a function that calls the nodeunit runner with a set of callbacks. I based the reporter on a combination of nodeunit's default console reporter and its browser reporter.

window.nodeunit_reporter = {
    info: "PhantomJS-based test reporter",
    run: function (modules, options) {
        var opts = {
            moduleStart: function (name) {
                console.log("\n" + name);
            },
            testDone: function (name, assertions) {
                if (!assertions.failures())
                {
                    console.log("✔ " + name);
                }
                else
                {
                    console.log("✖ " + name);
                    assertions.forEach(function (a) {
                        if (a.failed()) {
                            console.log(a.message || a.method || "no message");
                            console.log(a.error.stack || a.error);
                        }
                    });
                }
            },
            done: function (assertions) {
                window.callPhantom({failures: assertions.failures(), duration: assertions.duration, length: assertions.length});
            }
        };
        nodeunit.runModules(modules, opts);
    }
};

Now in PhantomJS, I just need to get it to load a modified test page that sets window.nodeunit_reporter before loading "test.js", and voilà, I have browser tests running on the console. All that I need to do now is to add it to my Buildbot configuration, and then I will be alerted whenever I break a browser test.
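
To give an idea of how everything is tied together, here is a rough sketch of the end of the driver script. The file name "test-phantom.html" is made up for illustration; it stands for the modified copy of the test page that defines window.nodeunit_reporter before require.js loads "test.js".

// Assumes the onConsoleMessage, onCallback and error handlers shown above
// have already been installed on "page".
page.open("test-phantom.html");
// From here on, onConsoleMessage prints the test output and onCallback exits
// with the number of failed assertions once the tests are done.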

The script may or may not work in SlimerJS, which would allow the tests to be run in a Gecko-based rendering engine, but I have not tried it because, as I said before, my tests don't work in Firefox. One main difference, though, is that SlimerJS doesn't honour the exit code, so Buildbot would need to parse the output to determine whether the tests passed or failed.

0 Comments
January 18, 2016

When native code is slower than interpreted code

16:56 -0500

At work, I'm working on a document editor, and it needs to be able to read in HTML data. Well, that's simple, right? We're in a browser, which obviously is able to parse HTML, so just offload the HTML parsing to the browser, and then traverse the DOM tree that it creates.

var container = document.createElement("div");
container.innerHTML = html;

The browser's parser is native code, built to be robust and well tested. What could go wrong?

Unfortunately, going this route, parsing a not-very-big document ended up taking about 70 seconds on my 4-year-old laptop. 70 seconds. Not good.

Switching to a JavaScript-based HTML parser saw the parsing time drop down to about 9 seconds. Further code optimizations in other places brought it down to about 3 seconds. Not too bad.
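
To give a rough idea of what that switch looks like, here is a small sketch using htmlparser2. Note that htmlparser2 is only an illustrative example, not necessarily the parser that was actually used, and it's assumed to be bundled for the browser (e.g. via Browserify or a RequireJS shim).

// Hypothetical sketch: parse the HTML string into a plain JavaScript tree
// without involving the browser's rendering pipeline at all.
var htmlparser2 = require("htmlparser2");

var dom = htmlparser2.parseDOM(html);

// Traverse the resulting tree instead of a real DOM.
function walk(nodes) {
    nodes.forEach(function (node) {
        if (node.type === "text") {
            // node.data holds the text content
        } else if (node.children && node.children.length) {
            // node.name and node.attribs describe the element
            walk(node.children);
        }
    });
}
walk(dom);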

So why is the JavaScript parser faster than the browser's native parser? Without digging into what the browser is actually doing, my best guess is that the browser isn't just parsing the HTML, but is also calculating styles, layouts, etc. This guess seems to be supported by the fact that not all HTML is parsed slowly; some other HTML of similar size is parsed very quickly (faster than using the JavaScript-based parser). But it can't be the whole story, because the browser is able to display that same HTML fairly quickly.

I may have to do some further investigations, but I guess the moral of the story is to not assume that offloading work is the fastest solution.

0 Comments
December 30, 2015
22:08 -0500
Hubert Chathi: RIP Ian Murdock, # founder
0 Comments
December 29, 2015
18:05 -0500
Hubert Chathi: Springer is providing free downloads of books older than 10 years, which includes my conference paper. http://www.springerlink.com/content/pbxglecx113c6axl/?p=4c7461a8f89141dda0a4664bc0d413fc&pi=1
0 Comments
December 22, 2015
19:13 -0500
Hubert Chathi: Congratulations to @spacex.com. I never thought that there could be so much excitement over seeing something stand still.
0 Comments
December 18, 2015
11:35 -0500
Hubert Chathi: Out of context comment of the day: "do nothing, since we don't care about users"
0 Comments
November 19, 2015

My development-to-production workflow so far

12:04 -0500

As a follow-up to my previous post, I've fleshed out my CI/(almost-)CD workflow a bit more. I write "(almost-)CD", because I've decided that I don't really want deployment to production to be completely automatic; I want to be able to manually mark a build as ready for production, at least for now.

When we last left off, I had Buildbot watching our git repository, and when it detected a change, it ran unit tests, and if the tests passed, updated the code on our VPS and triggered a reload. Since then, I've added email notifications (for all failures, and for success on the final step), and we've switched over to a Docker-based deployment. Here's what the process looks like now:

Buildbot still watches our git repository. When it detects a change, it checks out the code, and runs a docker build on a remote Docker instance to build a new image of the application; the buildbot slave is still running on our VPS, which does not support Docker, so we need to run Docker on a separate host. I could have also created a new buildbot slave on a Docker-capable host, but that seems like it would have been more work.

The new image is tagged with the git commit hash, as well as the "staging" tag. Next, Buildbot runs the unit tests on the image itself. If the tests all pass, it pushes the image to our Docker repository on Tutum with the "staging" tag. Tutum watches for changes in the Docker repository, and when a new "staging" image is pushed, it redeploys our staging server. In the meantime, Buildbot sends me an email telling me that the build has passed.

Up to this point, everything since the git push has been automatic. After I get the email from Buildbot, I do a quick sanity check on our staging server, just in case the unit tests missed anything, and if all goes well, I re-push the Docker image, but this time with the "latest" tag. Again, Tutum will notice a new "latest" image, and this time redeploys our production server.

There are a couple of things I really like about this setup. First of all, the number of manual steps involved is minimal; the only things I do after pushing the code are checking the staging site and re-pushing the image. Everything else is done automatically, which means that there's less chance of me forgetting to do something. Secondly, by using Docker images, I'm sure that the staging and production environments are exactly the same (or at least as close as possible).

One downside is that redeploying an image means there's a slight amount of downtime. This can be solved by using Blue/Green deployment instead of production/staging, but it's a bit more complicated to set up. This will probably be the next thing for me to look into.

0 Comments
11:28 -0500
Hubert Chathi: Happy World Toilet Day!
0 Comments
November 12, 2015
16:50 -0500
Hubert Chathi: Thanks to @cloudflare.com and their new Universal DNSSEC feature, my domain name is now secured using #
0 Comments
November 11, 2015
11:05 -0500
Hubert Chathi: My # server is now secured with a @letsencrypt.org certificate
0 Comments
November 5, 2015
10:28 -0500
Hubert Chathi: And in other news, # is now crawling.
0 Comments
08:06 -0500
Hubert Chathi: I think it's awesome that our new Minister of Transport is an astronaut. Also acceptable would have been Minister of Science.
0 Comments
October 19, 2015
09:00 -0400
Hubert Chathi: Canada, vote today.
0 Comments
September 24, 2015
09:27 -0400
Hubert Chathi: Happy birthday to you / Happy birthday to you / Happy birthday, dear xkcd / Happy birthday to you
0 Comments
September 23, 2015
10:32 -0400
Hubert Chathi: # yesterday to Amanda and I: "I wrote something! What does P-O-O-P spell?"
0 Comments
September 17, 2015

simple process respawning

16:29 -0400

File this under "I can't believe how long it took me to figure it out".

I'm Dockerizing some of our services at work. One of them (by design) kills itself after handling a number of requests. Of course, I want it to restart after it kills itself. Most solutions seem like overkill. Using a service supervisor like runit is great, but requires too much setup for monitoring just a single process. Forever is probably a good option, but I don't want to have to install Node.js in the image just to monitor it. Not to mention, Node.js has a non-trivial memory footprint.

Basically, I want something small and simple. No extra dependencies, minimal extra setup, minimal extra resource usage. After too much time looking for a solution, I came up with a 5-line shell script:

#!/bin/sh
while :
do
    "$@"
done

Name it forever.sh and put it in your PATH, and use it as: forever.sh <command> (e.g. forever.sh server -p 8080). It's just an infinite loop that executes its arguments until it gets killed.

0 Comments
September 15, 2015
11:39 -0400
Hubert Chathi: Just used Up Goer Five to show # how they landed on the moon
0 Comments
August 28, 2015
15:50 -0400
Hubert Chathi: congratulations to @phoenixframework.org on releasing 1.0. Looking forward to trying it out, someday... # #
0 Comments
August 26, 2015

Limiting concurrency in Node.js with promises

21:41 -0400

The nice thing about Node.js is its asynchronous execution model, which means that it can handle many requests very quickly. The flip side of this is that it can also generate many requests very quickly, which is fine if they can then be handled quickly, and not so good when they can't. For one application that I'm working on, some of the work gets offloaded to an external process; a new process is created for each request. (Unfortunately, that's the architecture that I'm stuck with for now.) And when doing a batch operation, Node.js will happily spawn hundreds of processes at once, without caring that doing so will cause everything on my workstation to slow to a crawl.

Limiting concurrency in Node.js has been written about elsewhere, but I'd like to share my promise-based version of this solution. In particular, this was built for the bluebird flavour of promises.

Suppose that we have a function f that performs some task, and returns a promise that is fulfilled when that task is completed. We want to ensure that we don't have too many instances of f running at the same time.

We need to keep track of how many instances are currently running, we need a queue of pending instances for when we've reached our limit, and of course we need to define what our limit is.

var queue = [];
var numRunning = 0;
var max = 10;

Our queue will just contain functions that, when called, will call f with the appropriate arguments, as well as perform the record-keeping necessary for calling f. So to process the queue, we just check whether we are below our run limit and whether the queue is non-empty, and if so, run the function at the front of the queue.

function runnext()
{
    numRunning--;
    if (numRunning < max && queue.length)
    {
        queue.shift()();
    }
}

Now we create a wrapper function f1 that will limit the concurrency of f. We will call f with the same arguments that f1 is called with. If we have already reached our limit, we queue the request; otherwise, we run f immediately. When we run f, whether immediately or in the future, we must first increment our counter. After f is done, we process the next element in the queue. We must process the queue whether f succeeds or not, and we don't want to change the resolution of f's promise, so we tack a finally onto the promise returned by f.

function f1 ()
{
    var args = Array.prototype.slice.call(arguments);
    return new Promise(function (resolve, reject) {
        function run() {
            numRunning++;
            resolve(f.apply(undefined, args)
                    .finally(runnext));
        }
        if (numRunning >= max)
        {
            queue.push(run);
        }
        else
        {
            run();
        }
    });
}

Of course, if you need to do this a lot, you may want to wrap this all up in a higher-order function. For example:

function limit(f, max)
{
    var queue = [];
    var numRunning = 0;

    function runnext()
    {
        numRunning--;
        if (numRunning < max && queue.length)
        {
            queue.shift()();
        }
    }

    return function ()
    {
        var args = Array.prototype.slice.call(arguments);
        return new Promise(function (resolve, reject) {
            function run() {
                numRunning++;
                resolve(f.apply(undefined, args)
                        .finally(runnext));
            }
            if (numRunning >= max)
            {
                queue.push(run);
            }
            else
            {
                run();
            }
        });
    };
}

This would be used as:

f = limit(f, 10);

which replaces f with a new function that is equivalent to f, except that at most 10 instances will be running at a time.
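
For example (with made-up names, since the calling code isn't shown here), a batch operation can then just map over its inputs and let the wrapper take care of the throttling:

// "convert" returns a bluebird promise (e.g. wrapping the external process
// mentioned above) and "files" is the batch of inputs. At most 10 conversions
// run at once; the rest wait in the queue.
var convertLimited = limit(convert, 10);

Promise.all(files.map(function (file) {
    return convertLimited(file);
})).then(function (results) {
    console.log("processed " + results.length + " files");
});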

0 Comments