December 30, 2013

Random testing

17:55 -0500

My current project at work requires implementing non-trivial data structures and algorithms, and despite my best efforts (a unit test suite with over 600 assertions), I don't have everything covered. To find bugs in my code, I've created a randomized tester.

First of all, the code is structured so that all operations are decoupled from the interface, which means that it can be scripted; anything that a user can do from the interface can also be done programmatically. Of course, this is a requirement for any testable code.

I want to make sure that the code is tested in a variety of scenarios, but without having to create the tests manually. So I let the computer generate them (pseudo)randomly. Basically, my test starts with a document (which, for now, is hard-coded). The program then creates a series of random operations to apply to the document: it randomly selects a type of operation, and then randomly generates an operation of that type. It then runs some tests on the resulting document, and checks for errors.
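The loop above can be sketched roughly as follows. This is purely an illustration: the toy array "document", the two operation types, and the invariant check are my inventions, not the actual project code.

```javascript
// Illustrative sketch of a randomized tester's main loop.
// The array "doc", the operation types, and the invariant are
// hypothetical stand-ins for the real project's operations.
function runRandomOps(doc, steps, random) {
  let expectedLength = doc.length;
  for (let i = 0; i < steps; i++) {
    // Randomly select a type of operation...
    if (doc.length === 0 || random() < 0.5) {
      // ...then randomly generate an operation of that type.
      const pos = Math.floor(random() * (doc.length + 1));
      doc.splice(pos, 0, "x"); // insert
      expectedLength++;
    } else {
      const pos = Math.floor(random() * doc.length);
      doc.splice(pos, 1); // delete
      expectedLength--;
    }
    // Run some tests on the resulting document.
    if (doc.length !== expectedLength) {
      throw new Error("invariant violated after step " + i);
    }
  }
  return doc;
}
```

Taking the random source as a function parameter, rather than calling Math.random directly, is what makes it possible to plug in a seedable generator for repeatable runs.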

Most of the time, when doing random things, you don't want things to be repeatable; if you write a program to generate coin flips, you don't want the same results every time you run the program. In this case, however, I need to be able to re-run the exact same tests over and over; if the tests find a bug, I need to be able to recreate the conditions leading to the bug, so that I can find the cause. Unfortunately, JavaScript's default random number generator, unlike those of many other programming languages, is automatically seeded and provides no way of setting the seed. That isn't a major problem, though — we just need to use an alternate random number generator. In this case, I used an implementation of the Mersenne Twister. Now, I just hard-code the seed, and every time I run the tester, I get the same results. And if I want a different test, I just need to change the seed.
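To show what a seedable generator looks like, here is mulberry32, a tiny seedable PRNG. (The project uses a Mersenne Twister implementation; mulberry32 is just a compact stand-in to illustrate the idea of seed-controlled reproducibility.)

```javascript
// mulberry32: a tiny seedable PRNG, standing in here for the
// Mersenne Twister mentioned above. Same seed, same sequence.
function mulberry32(seed) {
  let state = seed >>> 0;
  return function () {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // in [0, 1)
  };
}

// Two generators with the same seed produce identical streams,
// so a failing test run can be replayed exactly.
const a = mulberry32(12345);
const b = mulberry32(12345);
console.log(a() === b()); // true
console.log(a() === b()); // true
```

Hard-coding the seed makes every run identical; changing the seed gives a fresh (but equally reproducible) stream of tests.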

It seems to be working well so far. I've managed to squish some bugs, uncover some logic errors, and, of course, some silly errors too. Of course, the main downside is that I can't be sure that the random tests cover all possible scenarios, but the sheer number of tests that are generated far exceeds what I would be able to reasonably do by hand, and my hand-written tests weren't covering all possible scenarios anyways.

Addendum: I should add that when the randomized tester finds a bug, I try to distill it to a minimal test case and create a new unit test based on the randomized tester result.

December 12, 2013

LinkedIn RSS feed retirement

14:06 -0500

Dear LinkedIn,

Since you are retiring the LinkedIn Network RSS Feed, as of December 19, I will be visiting LinkedIn even less. Removal of the RSS feed makes it less convenient for me to follow my network activity, and as a result, I will not be using LinkedIn as much as I used to.

December 3, 2013

Jungle Gym

10:23 -0500

The jungle gym that we built for the kids is a hit with our little fire fighter.

September 29, 2013

Switching to RamNode

00:13 -0400

After many years at UltraHosting, I've switched my VPS hosting to RamNode (affiliate link). The switch should have been fairly transparent, except that my Jabber server would have been briefly unreachable while DNS records propagated.

Why did I switch?

First of all, RamNode is much less expensive. For all that technology has been getting cheaper and better, I had been paying the same for my VPS as when I had first ordered it, with no improvements. With RamNode, I'm paying less, and getting a better system (twice the RAM, twice the CPU, and supposedly faster due to SSD caching).

Secondly, RamNode supports IPv6. Each VPS gets 16 IPv6 addresses (which IMHO is overkill, but I'm not complaining). Native, not tunnelled. I find it surprising that most providers still don't support IPv6. UltraHosting didn't even support TUN/TAP, so I couldn't even get a tunnelled IPv6 connection.

Finally, from the reviews that I've read, RamNode has really great support, and they are very open about maintenance issues. I would rank my experiences with UltraHosting's support as "mediocre". They dealt with my support issues fine, but the experience was still lacking: I had ongoing time synchronization issues, and I wasn't provided with information regarding their outgoing SMTP setup (necessary for proper SPF records). I haven't had to deal with RamNode support yet, but the reviews indicate that their support is very responsive, and the fact that they publicize maintenance issues is promising.

But aren't you worried about the NSA?

RamNode is a US-based company, and the servers are located in the US or Netherlands, whereas UltraHosting is a Canadian company, and the servers are located in Canada. With all the noise about the NSA recently, it might seem risky to move my data to the US.

However, my server doesn't store any private information — aside from my SSL key (which the NSA can already spoof, due to the nature of SSL), and a limited SSH key to back up my data to my home server. Everything else that's stored on my server, the NSA already has access to. (Pretty much all the email that I send and receive passes through the US already.)

I am, however, taking some extra precautions, and avoiding transmitting any sensitive data to or through my server unprotected. But that's just good security practice anyways.

So, overall, the NSA and surveillance in general are a concern, but they do not affect my VPS server.

September 6, 2013

The chief virtues of permaculture: a tongue-in-cheek introduction to permaculture

15:57 -0400

Note: This article was originally written for the Beaver Creek Dam News.

Larry Wall, the creator of the Perl computer programming language, identifies the chief virtues of a programmer as laziness, impatience, and hubris (and I, as a programmer, have plenty of all three). I would say that laziness is also one of the chief virtues of permaculture. The term "permaculture" comes from shortening "permanent agriculture," or "permanent culture," with the aim of creating an ecosystem that will outlast the designer. (I can think of no greater state of laziness than being dead.) The ideal permaculture design is one in which the only work that needs to be done is harvesting. While the reality is that no design will completely eliminate human work, even if only to guide the progression of the ecosystem, the goal is to avoid as much work as possible. Most of the heavy lifting, once the design is established, is done by nature. For example, rather than spraying insecticides to kill crop-eating bugs, permaculture lures in beneficial organisms to control pests. Rather than heavily fertilizing and tilling soil, permaculture uses the plants themselves, along with bacteria, fungi, worms, and bugs to build soil fertility. Permaculturists also often enlist the aid of animals in order to till and fertilize the soil, to control weeds, and sometimes even to help harvest. While traditional gardening is dominated by annuals, which must be planted (and in some cases transplanted) every year, permaculture places a greater emphasis on perennial and self-sowing plants.

Although the end goal is to avoid doing work, the necessity is that a lot of work must be done in planning and developing a permaculture design. One must determine which plants are needed in order to fulfil the required functions. The soil may need a jump start (especially if the soil had previously been abused), often through a technique known as sheet mulching, which can be very labour intensive. However, the ultimate payoff is a garden that mostly cares for itself and that requires much less labour than conventional gardening (and maybe even less labour than going to the grocery store).

Another virtue of permaculture is greediness. Permaculture tries to squeeze as much productivity out of the land as it can. Conventional gardens grow individual crops in each location. Meanwhile, forest gardening, one of the keystones of permaculture, aims to have seven layers of plants, all growing together (or possibly eight layers, if you take Stamets’ advice of growing mushrooms). Forest gardening attempts to mimic the way that forests grow in nature. Forests are highly productive areas, requiring no input from humans to achieve such a high level of productivity. While most people will think about the trees when thinking about a forest, the forest would not be as productive or as healthy without the shrubs, ground cover, flowers, and vines. Permaculture also aims to reduce "wasted" space, such as paths for accessing the plants, using patterns such as keyhole beds. Keyhole beds also encourage laziness: you can sit in the middle of a keyhole bed and harvest an array of crops around you.

Not satisfied with just demanding much from the land, permaculture expects much from plants as well. Most gardeners will grow a plant for a single purpose, such as ornament or food. However, a single plant can perform multiple functions, such as providing food, attracting or sustaining beneficial organisms, providing shade during hot summer days, providing shelter from cold winds, improving soil, providing beautiful flowers, providing fragrance, or providing wood for fuel or for crafts. Permaculturists try to use plants for as many functions as they can provide.

A third virtue of permaculture is attention deficit. Modern farming's monoculture is a permaculturist's nightmare. Boooooring! For permaculture, variety is king. I think that Amanda has lost count of the number of times that I've exclaimed, "Hey, we should grow this plant too!" It is not uncommon for permaculturists to cultivate over a hundred species of plants in a single garden. Diversity improves the chances of survival. While a monoculture can be wiped out by a single type of pest or by unusual weather, a diverse ecosystem is more resilient. If one crop fails one year, other crops can take its place. A diverse ecosystem is also less likely to attract devastating quantities of pests in the first place — a monoculture looks like an all-you-can-eat buffet, while a garden with interplanted crops requires more work for the pests to travel between their favourite meals. Furthermore, pest eaters may be lying in wait, having been initially attracted by their favourite snacks. Having a diversity of plants can ensure that the pest eaters are around year-round: when one plant's flowering season is over, another plant can take over.

Similar to how hard work is required before laziness is allowed, a permaculture design must start with careful observation before the mind is allowed to wander. Permaculture design starts with observing different aspects of the site for factors such as soil composition, sun and wind patterns, wildlife, water, et cetera, at various times of the day, and throughout the year. Some permaculture experts suggest observing for a full year before doing any planting. Observation informs the design, indicating, for example, where different plants need to go in order to take advantage of sun and shade, where barriers are needed, what soil alterations are needed, or what plant functions should be sought out.

I believe, then, that the chief virtues of permaculture are: laziness, greediness, and attention deficit. Amanda suggested rephrasing them as: good time management, good stewardship of the land, and diversity, but I don't mind being called a lazy, greedy guy with a short attenti... Hey, we should grow this plant too!

Further reading

If you want to learn more about permaculture, I would recommend two books as a good starting point. Toby Hemenway's Gaia's Garden is a very down-to-earth (pardon the pun), easy to read book with many helpful drawings, tables, and examples. It outlines all the basic permaculture principles, and explains how to create a permaculture design.

For those who want to read a story rather than a manual, Eric Toensmeier and Jonathan Bates’ Paradise Lot is a book about how two friends developed a thriving permaculture garden in a tenth-of-an-acre lot in the middle of the city. With humour and romance (who knew that silkworm caterpillars would make such a wonderful gift for your sweetheart?), the book gives a flavour for the process of designing and establishing a permaculture garden. Although it is primarily a story of a single permaculture garden, and does not go into as much detail about different techniques as Gaia's Garden, Paradise Lot still contains a lot of helpful information.

September 3, 2013

St. Jacobs Farmers Market fire roundup

16:38 -0400

We went to see the St. Jacobs Farmers Market yesterday evening. They had the whole site taped off, but it's the busiest that I've seen the Market area on a non-Market day.

#SJFMFire was trending in Canada yesterday.

Waterloo Region Fire has their album of the fire online.

The un-burned parts of the market (outdoor vendors and Peddlar's Village) will be open on Thursday and Saturday.

There is an (unofficial) listing of vendors and alternate locations.

There's already talk of rebuilding, but nothing concrete yet.

May 21, 2013

Google's walled garden

15:12 -0400

If you have me as a contact in Google Talk, you may no longer be able to chat with me, because Google seems to be dropping support for chatting with non-Google accounts.

Google is dropping XMPP (Jabber) federation from their new chat system, which is more or less the same as preventing you from emailing non-GMail users from your GMail account, except with instant messaging. The instant messaging space is already a fragmented mess, and Jabber was the only possibility for unification.

Google was the first major company to provide public, federated Jabber accounts (though they didn't support federation at the beginning). They even contributed to the XMPP standards through their Jingle protocol (though Google had their own incompatible version and AFAIK never fully supported the final official Jingle protocol).

But it looks like Google is taking steps toward becoming its own walled garden. It dropped CalDAV support recently (except for certain whitelisted applications). Google+ (and before that, Google Buzz) doesn't interoperate with any other system, nor does it seem to be built with interoperation in mind. They completely dropped Reader. They killed Wave before they released it. Now, with XMPP federation gone, Google's only interoperable products are GMail, Groups, and Blogger.

May 10, 2013


14:51 -0400

I've started trying CloudFlare as a CDN for my web site. They have a nice free plan that works well for personal and smaller sites. So far it seems to be going well. Not that I was having any problems with my hosting, but it's a fun thing to try. And it's nice to have some security in knowing that my site won't blow up if I accidentally post something wildly popular and get Slashdotted. And in the case of a server failure, as happened yesterday, my site is still (somewhat) accessible.

As a brief description of how CloudFlare works, you host your site as normal, and visitors will connect to CloudFlare's servers rather than your own. If CloudFlare has a cached version of your page (that is still current), then the visitor gets sent the cached copy; otherwise it queries your server for the contents.
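That lookup logic can be sketched in a few lines. Everything here (the function names, the fixed TTL, the cache policy) is my simplification for illustration; the real service honours HTTP caching headers, cache purges, and much more.

```javascript
// Toy sketch of a caching reverse proxy in the spirit of CloudFlare.
// All names and the fixed TTL are invented for illustration.
function makeCachingProxy(fetchFromOrigin, ttlMs, now = Date.now) {
  const cache = new Map();
  return function handle(url) {
    const entry = cache.get(url);
    if (entry && now() - entry.time < ttlMs) {
      return entry.body; // cache hit: the origin never sees the request
    }
    // Cache miss (or stale entry): query the origin and cache the result.
    const body = fetchFromOrigin(url);
    cache.set(url, { body, time: now() });
    return body;
  };
}
```

The upshot is that repeated requests for the same page within the cache lifetime reach the origin server only once, which is why a cached site stays (somewhat) available even when the origin is down.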

The Good

One nice thing about CloudFlare is that it includes IPv6 support, even in the free plan, which is great for when your host doesn't have IPv6 support. Of course, I still don't have IPv6 SSH, email, or Jabber, but it's a start.

CloudFlare also doesn't charge for bandwidth. If your site gets Slashdotted, you don't have to worry about increased charges (at least, not from CloudFlare — you might still exceed your host's bandwidth allowance). You just pay your plan's rates (or not pay, in the case of the free plan).

Redundancy. I'll say it again: redundancy. If my host should ever become unavailable for whatever reason (yesterday, it was hardware failure), my site (at least whatever CloudFlare has cached) is still accessible. If CloudFlare ever goes down for an extended period of time, I can change the DNS records so that visitors go directly to my server. (Of course, due to propagation delays, that would have to be quite an extended CloudFlare outage.)

CloudFlare is also a very transparent company. They had a network failure not too long ago, and they were very upfront about what happened. No matter what you do, something will always go wrong at some point. It's how you react to things going wrong that differentiates companies. Even when things don't go wrong, CloudFlare shares a lot of their technical details on their blog.

The Bad

Unfortunately, there are some limitations. CloudFlare takes over your DNS service. This is necessary for how CloudFlare operates: they must be able to return different DNS responses depending on the visitor's location. But this means that you must use CloudFlare's DNS editing interface, which isn't as flexible as editing a zone file by hand. I had to give up my CSA and RP records. Given that almost nobody uses CSA, and my contact information is easily found on my site, it isn't a great loss.

Another limitation is that the free plan doesn't include SSL support. This is perfectly reasonable, given that it's a free plan — you need to give people a reason to pay. But it's something to be aware of. I only really needed SSL for my OpenID service, so I just put it on a different host name, and set it to not be handled by CloudFlare.

The Questionable

CloudFlare is a very popular CDN, which means that if it should ever go down, it would take down a lot of sites. It would also be a popular target for attack. As more sites use CloudFlare, it's becoming an Internet monoculture. So if anyone knows of a similar service, let me know and I'll give it a try.

All CloudFlare plans include analytics. Unfortunately, their analytics is very basic; it only indicates how many visitors (regular, potential threat, web crawler) you got over certain periods of time. Threats are broken down by country, but regular visitors are not. The statistics are also not broken down by URL, browser, etc. Of course, some analytics is better than the no analytics that I had before. However, since not all web requests reach your server, and you don't have access to the raw logs on CloudFlare's servers (unless you have a Business plan), you can't run your own (reliable) server-side analytics, not that I ever had the time to set that up. Of course, you can use your own (or a 3rd-party) JavaScript-based analytics system, but it isn't as accurate.

All in all, CloudFlare seems like a good service, and the price is right. I'll keep it on trial for a bit longer, but it seems like I'll be keeping it.

April 22, 2013

Thoughts on literate programming

12:50 -0400

At work, I've been implementing a data structure to make our collaborative editor run quickly. As part of that work, I've had to write a couple of complex functions (a couple of 200+ line functions), which got me thinking about comments, readability, and presentation.

If you've never heard of literate programming, it's an idea introduced by Knuth (surely you've heard of him) that combines programming with documentation intended for human consumption. The program is presented in a document written for people to read, and transformed by a program into something a computer can execute. (The Wikipedia article on literate programming gives a decent description.)

I've dabbled a bit with literate programming in the past. In fact, I'm the maintainer for the noweb package in Debian. One of my (very) long-term projects is to build a free data structure library written for people to learn how the data structures work, and I've started implementing a couple simple data structures in literate programming style. However, looking at literate programming again, it seems to me that it has a few deep limitations.

First of all, if you want to describe something in depth, you're forcing everyone to read it, even if they aren't interested. For example, in the wc example, “#include <stdio.h>” takes 3 lines, even though anyone who has read an introductory C programming book will know immediately why that's there. On the other hand, you might want to include that for beginner programmers. One of the frustrating things I found when writing research papers was that I often had to go into too much detail, to make sure that every single step was covered, which I felt sometimes turned a short, simple proof into something unwieldy. What I would have liked to do was something like Leslie Lamport's (of LaTeX fame) hierarchical proofs (though it doesn't translate well to printed text, and needs a more dynamic medium like a web page).

This limitation is partially due to the time that literate programming was conceived. With printed text, either you write something and everyone sees it (even if they just skim it, it's still there for them to see), or you omit it and nobody sees it. With something like a web page, however, you don't have this limitation. You can write “#include <stdio.h>”, and hide the descriptive text unless the reader wants to learn more.

Another limitation that I find with literate programming is that one of its underlying implications is that code is a lesser way of communicating between people, and that people communicate best using natural language. Each code chunk is intended to be described in words. While natural language is the best tool for general human communication, a small chunk of well-written code, like well-written mathematical notation, can be very effective in communicating certain ideas. Literate programming would encourage you to write the chunk twice, once in code and once in natural language, even if the code is a sufficient (or even sometimes better) way of communicating the idea. Going back to the stdio.h example, just writing “#include <stdio.h> // we send formatted output to stdout and stderr” would be a sufficient description for most programmers.

Related to this, literate programming pulls code chunks out of context, which sometimes is an important part in understanding how the code works. Seeing the code in context gives clues about what state the computer is in before it is executed, and what is expected after it executes. Of course you can always describe that in text, but seeing the code in context sometimes gives experienced programmers a more intuitive feel for how the code works.

One thing that I like about literate programming, though, is that it emphasizes understanding over a line-by-line presentation. For example, if you have two chunks of code that operate on the same data (say, one reads and the other writes), or if you have two chunks that operate similarly, then you would write those together, instead of having them spread out according to how the computer would execute them. It also allows you to deal with more important or interesting parts first, and leave the more mundane parts for later (I would have put “#include <stdio.h>” near the end of the document).

It is also useful to have at your disposal some of the document-writing tools, such as sectioning, lists, mathematical equations, and beautifully formatted text (and not having to make sure that your lines are wrapped properly).

While I think that literate programming is a great idea for presenting code in an understandable manner, I think that it has a lot of room for improvement, especially if we can take advantage of some of the features of the web. I'm doing some experimentation, and I hope to have some positive results.

April 1, 2013

Useless metrics

15:47 -0400

Just for fun, I decided to run David A. Wheeler's SLOCCount on my current work project. Here is the output (with the default options, slightly cleaned up):

SLOC	Directory	SLOC-by-Language (Sorted)
10656   mleditor        js=10656
2299    util            js=2299

Totals grouped by language (dominant language first):
js:           12955 (100.00%)

Total Physical Source Lines of Code (SLOC)                = 12,955
Development Effort Estimate, Person-Years (Person-Months) = 2.95 (35.34)
 (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months)                         = 0.81 (9.69)
 (Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule)  = 3.65
Total Estimated Cost to Develop                           = $ 397,833
 (average salary = $56,286/year, overhead = 2.40).
SLOCCount, Copyright (C) 2001-2004 David A. Wheeler
SLOCCount is Open Source Software/Free Software, licensed under the GNU GPL.
SLOCCount comes with ABSOLUTELY NO WARRANTY, and you are welcome to
redistribute it under certain conditions as specified by the GNU GPL license;
see the documentation for details.
Please credit this data as "generated using David A. Wheeler's 'SLOCCount'."

Note: This includes some, but not all, unit tests. I had to modify SLOCCount to support JavaScript — I just used the C parser.

I started working on the project in October, so I've spent 6 months on it. So according to the COCOMO model, I've produced almost $400,000 worth of work (at 2004 wages) in 6 months.
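The figures in SLOCCount's output are easy to reproduce by plugging the reported line count into the Basic COCOMO formulas it prints:

```javascript
// Reproducing SLOCCount's Basic COCOMO estimates from the output above.
const ksloc = 12.955; // 12,955 SLOC

const personMonths = 2.4 * Math.pow(ksloc, 1.05);          // ≈ 35.34
const scheduleMonths = 2.5 * Math.pow(personMonths, 0.38); // ≈ 9.69
const developers = personMonths / scheduleMonths;          // ≈ 3.65

// Cost = person-years × average salary × overhead factor.
const cost = (personMonths / 12) * 56286 * 2.4;            // ≈ $397,800
```

Which is to say: the $397,833 figure is nothing more than a power law applied to a line count, with a salary and overhead multiplier on top. Hence the title of this post.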

I think I need a raise. ;-)

(P.S. If you're lucky enough, you'll get the Bill Gates quote in the random quote section on the right-hand side of this page.)