June 22, 2016

Load balancing Kubernetes pods

10:10 -0400

At work, we recently switched from Tutum (now Docker Cloud) to Kubernetes. Part of that work was building up a load balancer. Kubernetes has built-in load balancing capabilities, but they only work with Google Compute Engine or AWS, which we are not using. They also require a public IP address for each service, which usually means extra (unnecessary) costs.

Having previously worked with Tutum's HAProxy image, I figured I could do the same thing with Kubernetes. A quick web search didn't turn up any existing project, so I quickly wrote my own. Basically, we have HAProxy handling all incoming HTTP(S) connections and passing them off to different services based on the Host header. A watcher monitors Kubernetes for relevant changes, such as new or deleted pods, and updates the HAProxy configuration so that requests always go to the right place. I also improved the setup by adding a Varnish cache in front of some of our services. Here's how it all works.

We have two sets of pods: a set of HAProxy pods and a set of Varnish pods. Each pod has a Python process that watches etcd for Kubernetes changes, updates the appropriate (HAProxy or Varnish) configuration, and tells HAProxy/Varnish about the new configuration. Why watch etcd instead of using the Kubernetes API directly? Because, as far as I can tell, the Kubernetes API only lets you watch one type of object (pods, configmaps, secrets, etc.) at a time, whereas we need to watch multiple types at once; using the Kubernetes API would mean making multiple simultaneous API requests, which would just make things more complicated.
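
The core of each watcher is a loop like the following (a minimal sketch using the python-etcd client, not our actual code; the etcd address, the key filtering, and the regenerate_and_reload helper are all assumptions):

import etcd

def regenerate_and_reload():
    # placeholder: re-render the configuration template and tell
    # HAProxy/Varnish to pick up the new configuration
    pass

client = etcd.Client(host="127.0.0.1", port=4001)

# Kubernetes (via the etcd v2 API) keeps its state under /registry, so
# watching that prefix recursively covers pods, configmaps, secrets, etc.
for event in client.eternal_watch("/registry", recursive=True):
    if "/pods/" in event.key or "/configmaps/" in event.key:
        regenerate_and_reload()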

Unlike Tutum's HAProxy image, which only allows you to change certain settings using environment variables, our entire configuration is generated from Jinja2 templates, and the templates themselves are configurable. This gives us a lot more flexibility, including being able to plug in Varnish fairly easily without making any code changes to the HAProxy configurator. Also, configuration variables for services are stored in their own ConfigMap, rather than as environment variables on the target pods, which allows us to make configuration changes without restarting the pods.
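
For illustration, here's a minimal sketch of the templating step (the template contents and variable names are made up, not our actual template):

import jinja2

# A tiny stand-in for the real HAProxy template: one backend per service,
# with server lines filled in from the watcher's view of the running pods.
TEMPLATE = """\
{% for name, svc in services.items() %}backend {{ name }}
{% for ip in svc.ips %}    server {{ name }}-{{ loop.index }} {{ ip }}:{{ svc.port }}
{% endfor %}{% endfor %}"""

services = {"web": {"ips": ["10.2.0.5", "10.2.1.7"], "port": 8080}}
print(jinja2.Template(TEMPLATE).render(services=services))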

When combining HAProxy and Varnish, one question to ask is how to arrange them: HAProxy in front of Varnish, or Varnish in front of HAProxy? We are using a setup similar to the one recommended on the HAProxy blog. In that setup, HAProxy handles all requests and passes non-cacheable requests directly to the backend servers. Cacheable requests are, of course, passed to Varnish. If Varnish has a cache miss, it passes the request back to HAProxy, which then hands it off to a backend server. As the article points out, a cache miss involves a lot of hops, but cache misses should be very infrequent since Varnish only sees cacheable content. One main difference between our setup and the one in the article is that in the article, HAProxy listens on two IP addresses: one for requests coming from the public, and one for requests coming from Varnish. In our setup, we don't have two IP addresses for HAProxy to use. Instead, Varnish adds a request header indicating that the request came from it, and HAProxy checks for that header.
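
In HAProxy configuration terms, the header trick looks something like this (a sketch only; the header name, the ACL for cacheable content, and the backend definitions are assumptions, not our exact config):

frontend http-in
    bind *:80
    # Varnish marks the requests it passes back to us with a header
    acl from_varnish hdr(X-From-Varnish) -m found
    acl cacheable path_beg /static
    use_backend varnish if cacheable !from_varnish
    default_backend app

backend varnish
    server varnish1 127.0.0.1:6081

backend app
    server app1 10.2.0.5:8080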

At first, I set the Python process as the pod's command (the pod's PID 1), but ran into a slight issue. HAProxy reloads its configuration by, well, not reloading its configuration; it starts a new set of processes with the new configuration. Since our Python process, as PID 1, never reaped the old processes when they exited, we ended up with a lot of zombie processes. To fix this, I could have changed the Python process to reap the zombies, but it was easier to just use Yelp's dumb-init instead.
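
With dumb-init as PID 1, zombie reaping is handled for us; in the Dockerfile, that's just something like this (the paths are illustrative):

ENTRYPOINT ["/usr/bin/dumb-init", "--"]
CMD ["python", "/app/watcher.py"]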

We have the HAProxy pods managed as a DaemonSet, so HAProxy runs on every node, and the pods are set to use host networking for better performance. HAProxy itself is small enough that, at least with our current traffic, it doesn't affect the nodes much, so running it on every node isn't a problem for us right now. If we get enough traffic that it does make a difference, we can dedicate a node to it without much trouble. One nice thing about this setup is that, even though it uses Kubernetes, HAProxy and Varnish don't need to be managed by Kubernetes; they just need to be able to talk to etcd. So if we ever need a dedicated load balancer, we can spin up a node (or nodes) that just runs HAProxy and/or Varnish, say, using a DaemonSet and a nodeSelector. Varnish is managed as a normal Kubernetes deployment and uses the normal container networking, so there's a bit of overhead there, but that's fine for now. Again, if we have more concerns about performance, we can change our configuration easily enough.
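
For reference, the shape of the HAProxy DaemonSet is roughly this (a sketch; the names, labels, and image are hypothetical, and the apiVersion is the one current at the time):

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: haproxy
spec:
  template:
    metadata:
      labels:
        app: haproxy
    spec:
      hostNetwork: true           # bind directly to the node's interfaces
      # nodeSelector:             # uncomment to pin to dedicated nodes
      #   role: loadbalancer
      containers:
      - name: haproxy
        image: example/haproxy-configurator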

It all seems to be working fairly well so far. There are some configuration tweaks that I'll have to go make, and there's one strange issue where Varnish doesn't like one of our services and just returns an empty response. But other than that, Varnish and HAProxy are just doing what they're supposed to do.

All the code is available on GitHub (HAProxy, Varnish).

0 Comments
June 21, 2016
16:03 -0400
Hubert Chathi: At 14 months, # had 14 teeth. He's now 15 months and has 15 teeth, and is on track to have 16 teeth by the time he's 16 months. If current trends continue, by the time he's 4 years old, he'll have 48 teeth. #
0 Comments
June 9, 2016

Kubernetes vs Docker Cloud

09:42 -0400

Note: this is not a comprehensive comparison of Kubernetes and Docker Cloud. It is just based on my own experiences. I am also using Tutum and Docker Cloud more or less interchangeably, since Tutum became Docker Cloud.

At work, we used to use Tutum for orchestrating the Docker containers for our Calculus practice problems site. While it was in beta, Tutum was free, but it has now become Docker Cloud and costs about $15 per managed node per month, on top of server costs. Although we got three free nodes since we were Tutum beta testers, we still felt the pricing was a bit steep, since the management costs would be more than the hosting costs, especially since we would have needed more private Docker repositories than the plan included.

So I started looking for self-hosted alternatives. The one I settled on was Kubernetes, which originated from Google. Obviously, if you go self-hosted, you need to have enough system administration knowledge to do it, whereas with Docker Cloud, you don't need to know anything about system administration. It's also more time-consuming to set up: it took me about a week to set up Kubernetes (though most of that time was spent scripting the process so that we could do it again more quickly next time), whereas with Tutum, it took less than a day to get up and running.

Kubernetes will require at least one server for itself, and if you want to ensure high availability, you'll want to run multiple masters. We're running on top of CoreOS, and a 512MB node seems a bit tight for the master in our setup. A 1GB node was big enough that, although it's recommended not to, I allowed the master to schedule other pods as well.

Kubernetes seems to have a large-ish overhead on the worker nodes (a.k.a. minions). Running top shows the system processes taking up at least 200MB, which means that on a 512MB node, you'd only have about 300MB left to run your own pods unless you have swap space. I have no idea what the overhead on a Tutum/Docker Cloud node was, since I didn't have access to check.

Previously, under Tutum, we were running on 5*512MB nodes, each of which had 512MB of swap space. Currently, we're running on 3*1GB worker nodes plus 1*1GB master node (which also serves as a worker), with no swap. We'll probably need to add another worker (or maybe another combined master/worker) in the near future, though under Tutum we would probably have needed another node anyway, given the changes I'm planning. Since we also moved from DigitalOcean to DreamHost (affiliate link) and their new DreamCompute service (which came out of beta just as we were looking into self-hosting), our new setup ended up costing $1 less per month.

Under Tutum, the only way to pass in configuration (other than baking it into your Docker image, or running your own configuration server) is through environment variables. With Kubernetes, you have more options, such as ConfigMaps and Secrets. That gives you more flexibility and (depending on your setup) allows changing configuration on the fly. For example, I created an auto-updating HAProxy configuration that lets you specify a configuration template via a ConfigMap. When you update the ConfigMap, HAProxy gets reconfigured immediately with almost no downtime. This is in contrast to the Tutum equivalent, in which a configuration change (via environment variables) requires a restart and hence more downtime.
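
For example (a sketch with hypothetical names), updating a template stored in a ConfigMap is a one-liner, and the watcher picks up the change without a restart:

kubectl create configmap haproxy-template --from-file=haproxy.cfg.tmpl
# after editing the template, push the new version in place:
kubectl create configmap haproxy-template --from-file=haproxy.cfg.tmpl \
    --dry-run -o yaml | kubectl apply -f -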

The other configuration methods also allow the configuration to be more decoupled. For example, with Tutum's HAProxy, the configuration for a service, such as its virtual host names, is specified using the target container's environment variables, which means that if you want to change the set of virtual hosts or the SSL certificate, you need to restart your application containers. Since our application server takes a little while to restart, we want to avoid having to do that. On the other hand, if the configuration were set in HAProxy's environment, it would be lost to other services that might want to use it (such as monitoring software that might use the HTTP_CHECK variable). With a ConfigMap, however, the configuration does not need to belong to one side or the other; it stands on its own, so it doesn't interfere with the application container and can be accessed by other pods.

Kubernetes can be configured entirely using YAML (or JSON) files, which means that everything can be version controlled. Under Tutum, things are primarily configured via the web interface, though there is a command-line tool that you could use as well. However, the command-line tool uses a different syntax for creating versus updating, whereas with Kubernetes, you can just "kubectl apply -f", so even if you use the Tutum CLI and keep a script under version control for creating your services, it's easy to forget to update the script after you've changed a service.

There are a few things that Tutum does that Kubernetes doesn't do. For example, Tutum has built-in node management (if you use AWS, DigitalOcean, or one of the other providers that it is made to work with), whereas with Kubernetes, you're responsible for setting up your own nodes. There are apparently tools built on top of Kubernetes that do similar things, but I never really looked into them, since we currently don't need to bring nodes up and down very frequently. Tutum also has more deployment strategies (such as "emptiest node" and "high availability"); those were not that important for us, but might be more important for others.

Based on my experience so far, Kubernetes seems to be a better fit for us. For people who are unable/unwilling to administer their own servers, Docker Cloud would definitely be the better choice, and starting with Tutum definitely gave me time to look around in the Docker ecosystem before diving into a self-hosted solution.

0 Comments
June 8, 2016
09:32 -0400
Hubert Chathi: We went for an evening walk yesterday, and by the end of it, both # and # learned how to ride a bike.
0 Comments
May 26, 2016
13:49 -0400
Hubert Chathi: Looking forward to @mec.ca opening in Kitchener
0 Comments
May 25, 2016
14:53 -0400
Hubert Chathi: migrated our @sbscalculus.com servers from @tutum.co to @kubernetes.io. #
0 Comments
May 24, 2016
11:55 -0400
Hubert Chathi: It's interesting that @kubernetes.io is a # project, but they do not use Google+.
0 Comments
May 13, 2016
09:19 -0400
Hubert Chathi: My whalebuilder package building tool is now in # #
0 Comments
May 6, 2016
09:51 -0400
Hubert Chathi: Wow. The Fort McMurray wildfire compared in area to various cities.
Via: reddit
0 Comments
April 29, 2016

Let's Encrypt errata

10:06 -0400

Back in February, I posted about Automatic Let's Encrypt certificates on nginx. One of the scripts had a problem: it downloaded the Let's Encrypt X1 intermediate certificate. Let's Encrypt recently switched to their X3 intermediate, which meant that Firefox was unable to reach sites using the generated certificates, and Chrome/IE/Safari needed to make an extra download to verify the certificate.

Of course, instead of just changing the script to download the X3 certificate, it's better to automatically download the right certificate. So I whipped up a quick Python script, cert-chain-resolver-py (inspired by the Go version), which checks a certificate and downloads the other certificates in the chain.
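
The idea is straightforward: follow the CA Issuers URL in each certificate's Authority Information Access extension until there is nothing left to fetch. Here's a simplified sketch of the approach (shown with the cryptography library for brevity; the actual script uses pyOpenSSL):

import urllib2

from cryptography import x509
from cryptography.hazmat.backends import default_backend
from cryptography.x509.oid import AuthorityInformationAccessOID, ExtensionOID

def resolve_chain(pem_data):
    # return the certificate plus any intermediates it points to
    cert = x509.load_pem_x509_certificate(pem_data, default_backend())
    chain = [cert]
    while True:
        try:
            aia = cert.extensions.get_extension_for_oid(
                ExtensionOID.AUTHORITY_INFORMATION_ACCESS).value
        except x509.ExtensionNotFound:
            break  # no pointer to the issuer, so we're done
        urls = [d.access_location.value for d in aia
                if d.access_method == AuthorityInformationAccessOID.CA_ISSUERS]
        if not urls:
            break
        cert = x509.load_der_x509_certificate(
            urllib2.urlopen(urls[0]).read(), default_backend())
        chain.append(cert)
    return chain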

I've updated my original blog post. The changed script is /usr/local/sbin/letsencrypt-renew, and of course you'll need to install cert-chain-resolver-py (the script expects it to be in /opt/cert-chain-resolver-py).

0 Comments
April 8, 2016

Antagonistic Co-operation

08:56 -0400

This article was originally written for our housing co-operative's newsletter. Even though it was written in the context of a housing co-operative, I think the idea is useful in other contexts as well.

-

The word "antagonistic" and its relatives generally have negative connotations. Nobody likes to be antagonized. In literature, the antagonist in a story works against the protagonist or main character, so we do not like to see the antagonist succeed. However, antagonism can be essential in some cases. Many of our muscles come in what are called "antagonistic pairs," without which you would not be able to move. Muscles can only pull (by contracting) and relax; muscles cannot push. If you only had biceps, you would only be able to bend your arm; you also need your triceps in order to be able to straighten your arm. Your basic movements rely on muscles that oppose each other, yet work together to allow you to walk, lift, or write.

But sometimes our muscles do not work as they should. If you have ever experienced a cramp, you know how painful this can be sometimes. A cramp happens when a muscle suddenly tightens and will not loosen. Many cases of back pain are also due to muscles that fail to relax as they should. Some people require regular massage therapy due to pain caused by tight muscles.

As a co-operative, we should strive to operate like a well functioning body. As members of our co-operative, we all have different opinions and priorities, and we pull our co-operative in different directions. Some people may be more focused on providing activities for our children, and some are more focused on helping our elders adapt to new challenges. Some people prefer to be frugal, while others may wish to spend money to improve the quality of life here. Some people value a strict adherence to our bylaws, while others adopt a more "live and let live" attitude. Each of these views is welcome in our co-operative, and we should celebrate our differences. Indeed, without different opinions pulling us in different directions, our co-operative would be as lifeless as a skeleton with no muscles.

But in order for our co-operative to get anywhere, we must be willing, not just to pull in the direction that we want to go, but also to sometimes let go when others are pulling in a different direction. Sometimes we must allow other members to go ahead with their opinions and priorities without getting in their way.

Unlike our bodies, however, our co-operative does not have a central "brain" coordinating our actions, telling us when to pull and when to let go. Instead we must, as a co-operative, come to an agreement among ourselves. We must communicate with each other, and come to understand the perspectives of other members. Then we can decide when each member should have an opportunity to pull so that we do not prevent our co-op from moving forward by pulling in opposite directions at the same time.

We often see people with opposing viewpoints as adversaries. But while we may be antagonistic, we can still be co-operative.

-

This article may be copied under the terms of the Creative Commons Attribution-ShareAlike license.

0 Comments
April 4, 2016
10:57 -0400
Hubert Chathi: With @cloudflare.com blocking @torproject.org users, I've disabled CloudFlare on most of my sites
0 Comments
March 28, 2016
12:59 -0400
Hubert Chathi: # learned how to climb stairs last night
0 Comments
March 23, 2016

leftpad improved

13:26 -0400

Improved versions of left-pad

Shorter version (removed an unneeded variable and changed to a for loop: 9 lines of code instead of 11):

module.exports = leftpad;

function leftpad (str, len, ch) {
  str = String(str);

  if (!ch && ch !== 0) ch = ' ';

  for (len -= str.length; len > 0; len--) {
    str = ch + str;
  }

  return str;
}

Faster version (only performs O(log n) concatenations, where n is the number of characters needed to pad to the right length):

module.exports = leftpad;

function leftpad (str, len, ch) {
  str = String(str);

  if (!ch && ch !== 0) ch = ' ';
  ch = String(ch);

  len -= str.length;

  while (len > 0) {
    // if the low bit of len is set, prepend the current chunk
    if (len & 1) {
      str = ch + str;
    }
    // halve len and double the chunk: ch, chch, chchchch, ...
    len >>>= 1;
    ch += ch;
  }

  return str;
}

ES6 version (which may be faster if String.prototype.repeat is implemented natively):

module.exports = leftpad;

function leftpad (str, len, ch) {
  str = String(str);

  if (!ch && ch !== 0) ch = ' ';
  ch = String(ch);

  len -= str.length;

  if (len > 0) str = ch.repeat(len) + str;

  return str;
}

Of course, you could combine the last two by detecting whether String.prototype.repeat is defined:

module.exports = String.prototype.repeat ?
function leftpad (str, len, ch) {
  str = String(str);

  if (!ch && ch !== 0) ch = ' ';
  ch = String(ch);

  len -= str.length;

  if (len > 0) str = ch.repeat(len) + str;

  return str;
}
:
function leftpad (str, len, ch) {
  str = String(str);

  if (!ch && ch !== 0) ch = ' ';
  ch = String(ch);

  len -= str.length;

  while (len > 0) {
    if (len & 1) {
      str = ch + str;
    }
    len >>>= 1;
    ch += ch;
  }

  return str;
}

As with the original left-pad, this code is released under the WTFPL.

(See also pad-left, which uses another dependency (by the same author) for repeating strings)

0 Comments
March 16, 2016
17:02 -0400
Hubert Chathi: "When we started Apple, Steve Jobs and I talked about how we wanted to make blind people as equal and capable as sighted people, and you'd have to say we succeeded when you look at all the people walking down the sidewalk looking down at something in their hands and totally oblivious to everything around them!" - Steve Wozniak
0 Comments
February 25, 2016
11:11 -0500
Hubert Chathi: New ways to contact me: tox: 50A6B251E7B42A0B2FC8852D4B1E89BC2D71797F64B53E36A3F9365E381B8E6A9FFE72FC8756, Friendika/GNUSocial/Diaspora: https://social.uhoreg.ca/profile/hubert
0 Comments
February 18, 2016

Automating Let's Encrypt certificates on nginx

17:57 -0500

Let's Encrypt is a new Certificate Authority that provides free SSL certificates. It is intended to be automated, so that certificates are renewed automatically. We're using Let's Encrypt certificates for our set of free Calculus practice problems. Our front end is currently served by an Ubuntu server running nginx, and here's how we have it scripted on that machine. In a future post, I'll describe how it's automated on our Docker setup with HAProxy.

First of all, we're using acme-tiny instead of the official Let's Encrypt client, since it's much smaller and, IMHO, easier to use. It takes a bit more work to set up, but works well once it's going.

We installed acme-tiny in /opt/acme-tiny, and created a new letsencrypt user. The letsencrypt user is only used to run the acme-tiny client with reduced privileges. In theory, you could run the entire renewal process as a reduced-privilege user, but the rest of the process is just basic shell commands, and my paranoia level is not that high.

We then installed cert-chain-resolver-py into /opt/cert-chain-resolver-py. This script requires the pyOpenSSL library (on Debian/Ubuntu systems, the python-openssl package), so make sure that it's installed.

We created an /opt/acme-tiny/challenge directory, owned by the letsencrypt user, and we created /etc/acme-tiny with the following contents:

  • account.key: the account key created in step 1 from the acme-tiny README. This file should be readable only by the letsencrypt user.
  • certs: a directory containing a subdirectory for each certificate that we want. Each subdirectory should have a domain.csr file, which is the certificate signing request created in step 2 from the acme-tiny README. The certs directory should be publicly readable, and the subdirectories should be writable by the user that the cron job will run as (which does not have to be the letsencrypt user).
  • private: a directory containing a subdirectory for each certificate that we want, like we had with the certs directory. Each subdirectory has a file named privkey.pem, which will be the private key associated with the certificate. To coincide with the common setup on Debian systems, the private directory should be readable only by the ssl-cert group.

Instead of creating the CSR files as described in the acme-tiny README, I created a script called gen_csr.sh:

#!/bin/bash
openssl req -new -sha256 -key /etc/acme-tiny/private/"$1"/privkey.pem -subj "/" -reqexts SAN -config <(cat /etc/ssl/openssl.cnf <(printf "[SAN]\nsubjectAltName=DNS:") <(cat /etc/acme-tiny/certs/"$1"/domains | sed "s/\\s*,\\s*/,DNS:/g")) > /etc/acme-tiny/certs/"$1"/domain.csr

The script is invoked as gen_csr.sh <name>. It reads a file named /etc/acme-tiny/certs/<name>/domains, which is a text file containing a comma-separated list of domains, and it writes the /etc/acme-tiny/certs/<name>/domain.csr file.
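
For example (the certificate name and the www subdomain here are just for illustration):

$ cat /etc/acme-tiny/certs/sbscalculus.com/domains
sbscalculus.com, www.sbscalculus.com
$ gen_csr.sh sbscalculus.com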

Now we need to configure nginx to serve the challenge files. We created a /etc/nginx/snippets/acme-tiny.conf file with the following contents:

location /.well-known/acme-challenge/ {
    auth_basic off;
    alias /opt/acme-tiny/challenge/;
}

(The "auth_basic off;" line is needed because some of our virtual hosts on that server use basic HTTP authentication.) We then modify the sites in /etc/nginx/sites-enabled that we want to use Let's Encrypt certificates to include the line "include snippets/acme-tiny.conf;".

After this is set up, we created a /usr/local/sbin/letsencrypt-renew script that will be used to request a new certificate:

#!/bin/sh
set +e

# only renew if certificate will expire within 20 days (=1728000 seconds)
openssl x509 -checkend 1728000 -in /etc/acme-tiny/certs/"$1"/cert.pem && exit 255

set -e
DATE=`date +%FT%R`
su letsencrypt -s /bin/sh -c "python /opt/acme-tiny/acme_tiny.py --account-key /etc/acme-tiny/account.key --csr /etc/acme-tiny/certs/\"$1\"/domain.csr --acme-dir /opt/acme-tiny/challenge/" > /etc/acme-tiny/certs/"$1"/cert-"$DATE".pem
ln -sf cert-"$DATE".pem /etc/acme-tiny/certs/"$1"/cert.pem
python /opt/cert-chain-resolver-py/cert-chain-resolver.py -o /etc/acme-tiny/certs/"$1"/chain-"$DATE".pem -i /etc/acme-tiny/certs/"$1"/cert.pem -n 1
ln -sf chain-"$DATE".pem /etc/acme-tiny/certs/"$1"/chain.pem
cat /etc/acme-tiny/certs/"$1"/cert-"$DATE".pem /etc/acme-tiny/certs/"$1"/chain-"$DATE".pem > /etc/acme-tiny/certs/"$1"/fullchain-"$DATE".pem
ln -sf fullchain-"$DATE".pem /etc/acme-tiny/certs/"$1"/fullchain.pem

The script will only request a new certificate if the current certificate will expire within 20 days. The certificates are stored in /etc/acme-tiny/certs/<name>/cert-<date>.pem (symlinked to /etc/acme-tiny/certs/<name>/cert.pem), the intermediate chain in /etc/acme-tiny/certs/<name>/chain-<date>.pem (symlinked to chain.pem), and the full chain (the certificate plus the intermediate CA certificate) in /etc/acme-tiny/certs/<name>/fullchain-<date>.pem (symlinked to /etc/acme-tiny/certs/<name>/fullchain.pem).

If you have pyOpenSSL version 0.15 or greater, you can replace the -n 1 option for cert-chain-resolver.py with something like -t /etc/ssl/certs/ca-certificates.crt, where /etc/ssl/certs/ca-certificates.crt should be set to the location of a set of trusted CA certificates in PEM format.

As-is, the script must be run as root, since it does a su to the letsencrypt user. It should be trivial to modify it to use sudo instead, so that it can be run by any user that has the appropriate permissions on /etc/acme-tiny.

The letsencrypt-renew script is run by another script that restarts the necessary servers if needed. For us, the script looks like this:

#!/bin/sh

letsencrypt-renew sbscalculus.com

RV=$?

set -e

if [ $RV -eq 255 ] ; then
  # renewal not needed
  exit 0
elif [ $RV -eq 0 ] ; then
  # restart servers
  service nginx reload;
else
  exit $RV;
fi

This is then called by a cron script of the form chronic /usr/local/sbin/letsencrypt-renew-and-restart. Chronic is a script from the moreutils package that runs a command and only passes through its output if it fails. Since the renewal script checks whether the certificate will expire, we run the cron task daily.
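
For example, the cron entry might look like this (the path and schedule are hypothetical):

# /etc/cron.d/letsencrypt: check the certificates once a day
30 4 * * * root chronic /usr/local/sbin/letsencrypt-renew-and-restart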

Of course, once you have the certificate, you want to tell nginx to use it. We have another file in /etc/nginx/snippets that, aside from setting various SSL parameters, includes

ssl_certificate /etc/acme-tiny/certs/sbscalculus.com/fullchain.pem;
ssl_certificate_key /etc/acme-tiny/private/sbscalculus.com/privkey.pem;

This is the setup we use for one of our servers. I tried to make it fairly general, and it should be fairly easy to modify for other setups.

Update (Apr. 29, 2016): Let's Encrypt changed their intermediate certificate, so the old instructions for downloading the intermediate certificate are incorrect. Instead of using a static location for the intermediate certificate, it's best to use a tool such as https://github.com/muchlearning/cert-chain-resolver-py to fetch the correct intermediate certificate. The instructions have been updated accordingly.

0 Comments
February 3, 2016
14:03 -0500
Hubert Chathi: The fastest query is the one you don't do. Sped up the response time of some @sbscalculus.com pages from about 3s to about 0.3s by not fetching some irrelevant data. #
0 Comments
January 27, 2016

Automating browser-side unit tests with nodeunit and PhantomJS

11:24 -0500

I love unit tests, but they're only useful if they get run. For one of my projects at work, I have a set of server-side unit tests, and a set of browser-side unit tests. The server-side unit tests get run automatically on "git push" via Buildbot, but the browser-side tests haven't been run for a long time because they don't work in Firefox, which is my primary browser, due to differences in the way it iterates through object keys.

Of course, automation would help, in the same way that automating the server-side tests ensured that they were run regularly. Enter PhantomJS, which is a scriptable headless WebKit environment. Unfortunately, even though PhantomJS can support many different testing frameworks, there is no existing support for nodeunit, which is the testing framework that I'm using in this particular project. Fortunately, it isn't hard to script support for nodeunit.

nodeunit's built-in browser support just dynamically builds a web page with the test results and a test summary. If we just ran it as-is in PhantomJS, it would happily run the tests for us, but we wouldn't be able to see the results, and it would just sit there doing nothing when it was done. What we want is for the test results to be output to the console, and for PhantomJS to exit when the tests are done (and exit with an error code if tests failed). To do this, we will create a custom nodeunit reporter that will communicate with PhantomJS.

First, let's deal with the PhantomJS side. Our custom nodeunit reporter will use console.log to print the test results, so we will pass through console messages in PhantomJS.

// create the page that will load the test suite
var page = require('webpage').create();

page.onConsoleMessage = function (msg) {
    console.log(msg);
};

We will use PhantomJS's callback functionality to signal the end of the tests. The callback data will just be an object containing the total number of assertions, the number of failed assertions, and the time taken.

page.onCallback = function (data) {
    if (data.failures)
    {
        console.log("FAILURES: " + data.failures + "/" + data.length + " assertions failed (" + data.duration + "ms)")
    }
    else
    {
        console.log("OK: " + data.length + " assertions (" + data.duration + "ms)");
    }
    phantom.exit(data.failures);
};

(Warning: the callback API is marked as experimental, so may be subject to change.)

If the test page fails to load for whatever reason, PhantomJS will just sit there doing nothing, which is not desirable behaviour, so we will exit with an error if something fails.

phantom.onError = function (msg, trace) {
    console.log("ERROR:", msg);
    for (var i = 0; i < trace.length; i++)
    {
        var t = trace[i];
        console.log(i, (t.file || t.sourceURL) + ': ' + t.line + (t.function ? ' in ' + t.function : ""));
    }
    phantom.exit(1);
};
page.onError = function (msg, trace) {
    console.log("ERROR:", msg);
    for (var i = 0; i < trace.length; i++)
    {
        var t = trace[i];
        console.log(i, (t.file || t.sourceURL) + ': ' + t.line + (t.function ? ' in ' + t.function : ""));
    }
    phantom.exit(1);
};
page.onLoadFinished = function (status) {
    if (status !== "success")
    {
        console.log("ERROR: page failed to load");
        phantom.exit(1);
    }
};
page.onResourceError = function (resourceError) {
    console.log("ERROR: failed to load " + resourceError.url + ": " + resourceError.errorString + " (" + resourceError.errorCode + ")");
    phantom.exit(1);
};

Now for the nodeunit side. The normal test page looks like this:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>ML Editor Test Suite</title>
    <link rel="stylesheet" href="stylesheets/nodeunit.css" type="text/css" />
    <script src="javascripts/module-requirejs.js" type="text/javascript"></script>
    <script src="javascripts/requirejs-config.js" type="text/javascript"></script>
    <script data-main="test" src="javascripts/require.js" type="text/javascript"></script>
  </head>
  <body>
    <h1 id="nodeunit-header">ML Editor Test Suite</h1>
  </body>
</html>

If you're not familiar with RequireJS pages, the <script data-main="test" src="javascripts/require.js" type="text/javascript"></script> line means that the main JavaScript file is called "test.js". We want to use the same script file for both a normal browser test and the PhantomJS-based test, so in PhantomJS, we will set window.nodeunit_reporter to our custom reporter. In "test.js", then, we will check for window.nodeunit_reporter, and if it is present, we will replace nodeunit's default reporter. Although there's no documented way of changing the reporter in the browser version of nodeunit, looking at the code, it's pretty easy to do.

if (window.nodeunit_reporter) {
    nodeunit.reporter = nodeunit_reporter;
    nodeunit.run = nodeunit_reporter.run;
}

(Disclaimer: since this uses an undocumented interface, it may break some time in the future.)

So what does a nodeunit reporter look like? It's just an object with two items: info (which is just a textual description) and run. run is a function that calls the nodeunit runner with a set of callbacks. I based the reporter off of a combination of nodeunit's default console reporter and its browser reporter.

window.nodeunit_reporter = {
    info: "PhantomJS-based test reporter",
    run: function (modules, options) {
        var opts = {
            moduleStart: function (name) {
                console.log("\n" + name);
            },
            testDone: function (name, assertions) {
                if (!assertions.failures())
                {
                    console.log("✔ " + name);
                }
                else
                {
                    console.log("✖ " + name);
                    assertions.forEach(function (a) {
                        if (a.failed()) {
                            console.log(a.message || a.method || "no message");
                            console.log(a.error.stack || a.error);
                        }
                    });
                }
            },
            done: function (assertions) {
                window.callPhantom({failures: assertions.failures(), duration: assertions.duration, length: assertions.length});
            }
        };
        nodeunit.runModules(modules, opts);
    }
};

Now in PhantomJS, I just need to get it to load a modified test page that sets window.nodeunit_reporter before loading "test.js", and voilà, I have browser tests running on the console. All that I need to do now is to add it to my Buildbot configuration, and then I will be alerted whenever I break a browser test.
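
Hooking it into Buildbot is then just another shell step; here's a sketch (for a Buildbot 0.8-style configuration; the step name and script path are hypothetical):

from buildbot.process.factory import BuildFactory
from buildbot.steps.shell import ShellCommand

factory = BuildFactory()
# PhantomJS exits non-zero on failure, so haltOnFailure marks the build red
factory.addStep(ShellCommand(
    name="browser tests",
    command=["phantomjs", "tests/phantom-runner.js"],
    haltOnFailure=True))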

The script may or may not work in SlimerJS, allowing the tests to be run in a Gecko-based rendering engine, but I have not tried it since, as I said before, my tests don't work in Firefox. One main difference, though, is that SlimerJS doesn't honour the exit code, so Buildbot would need to parse the output to determine whether the tests passed or failed.

0 Comments
January 18, 2016

When native code is slower than interpreted code

16:56 -0500

At work, I'm working on a document editor, and it needs to be able to read in HTML data. Well, that's simple, right? We're in a browser, which obviously is able to parse HTML, so just offload the HTML parsing to the browser, and then traverse the DOM tree that it creates.

var container = document.createElement("div");
container.innerHTML = html;

The browser's parser is native code, built to be robust, well tested. What could go wrong?

Unfortunately, going this route, it ended up taking about 70 seconds to parse a not-very-big document on my four-year-old laptop. 70 seconds. Not good.

Switching to a JavaScript-based HTML parser saw the parsing time drop down to about 9 seconds. Further code optimizations in other places brought it down to about 3 seconds. Not too bad.

So why is the JavaScript parser faster than the browser's native parser? Without digging into what the browser is actually doing, my best guess is that the browser isn't just parsing the HTML, but is also calculating styles, layouts, etc. This guess seems to be supported by the fact that not all HTML is parsed slowly; some other HTML of similar size is parsed very quickly (faster than using the JavaScript-based parser). But it can't be the whole story, because the browser is able to display that same HTML fairly quickly.

I may have to do some further investigations, but I guess the moral of the story is to not assume that offloading work is the fastest solution.

0 Comments