CloudFlare
I've started trying CloudFlare as a CDN for my web site. They have a nice free plan that works well for personal and smaller sites. So far it seems to be going well. Not that I was having any problems with my hosting, but it's a fun thing to try. And it's nice to have some security in knowing that my site won't blow up if I accidentally post something wildly popular and get Slashdotted. And in the case of a server failure, as happened yesterday, my site is still (somewhat) accessible.
As a brief description of how CloudFlare works, you host your site as normal, and visitors will connect to CloudFlare's servers rather than your own. If CloudFlare has a cached version of your page (that is still current), then the visitor gets sent the cached copy; otherwise it queries your server for the contents.
The Good
One nice thing about CloudFlare is that it includes IPv6 support, even in the free plan, which is great for when your host doesn't have IPv6 support. Of course, I still don't have IPv6 ssh, email or Jabber, but it's a start.
CloudFlare also doesn't charge for bandwidth. If your site gets Slashdotted, you don't have to worry about increased charges (at least, not from CloudFlare — you might still exceed your host's bandwidth allowance). You just pay your plan's rates (or not pay, in the case of the free plan).
Redundancy. I'll say it again: redundancy. If my host should ever become unavailable for whatever reason (yesterday, it was hardware failure), my site (at least whatever CloudFlare has cached) is still accessible. If CloudFlare ever goes down for an extended period of time, I can change the DNS records so that visitors go directly to my server. (Of course, due to propagation delays, that would have to be quite an extended CloudFlare outage.)
CloudFlare is also a very transparent company. They had a network failure not too long ago, and they were very upfront about what happened. No matter what you do, something will always go wrong at some point. It's how you react to things going wrong that differentiate companies. Even when things don't go wrong, CloudFlare shares a lot of their technical details on their blog.
The Bad
Unfortunately, there are some limitations. CloudFlare takes over your DNS service. This is necessary for how CloudFlare operates: they must be able to return different DNS responses depending on the visitor's location. But this means that you must use CloudFlare's DNS editing interface, which isn't as flexible as editing a zone file by hand. I had to give up my CSA and RP records. Given that almost nobody uses CSA, and my contact information is easily found on my site, it isn't a great loss.
Another limitation is that the free plans doesn't include SSL support. This is perfectly reasonable, given that it's a free plan — you need to give people a reason to pay. But it's something to be aware of. I only really needed SSL for my OpenID service, so I just put it on a different host name, and set it to not be handled by CloudFlare.
The Questionable
CloudFlare is a very popular CDN, which means that if it should ever go down, it would take down a lot of sites. It would also be a popular target for attack. As more sites use CloudFlare, it's becoming an Internet monoculture. So if anyone knows of a similar service, let me know and I'll give it a try.
All CloudFlare plans include analytics. Unfortunately, their analytics is very basic; it only indicates how many visitors (regular, potential threat, web crawler) you got over certain periods of time. Threats are broken down by country, but regular visitors are not. The statistics are also not broken down by URL, browser, etc. Of course, some analytics is better than the no analytics that I had before. However, since not all web requests reach your server, and you don't have access to the raw logs on their server (unless you have a Business plan), this prevents you from running your own (reliable) analytics, should I ever have had the time to set it up. If course, you can use your own (or a 3rd-party) JavaScript-based analytics system, but it isn't as accurate.
All in all, CloudFlare seems like a good service, and the price is right. I'll keep it on trial for a bit longer, but it seems like I'll be keeping it.
Thoughts on literate programming
At work, I've been implementing a data structure to make our collaborative editor run quickly. As part of that work, I've had to write a couple of complex functions (a couple 200+ line functions), which got me thinking about comments, readability, and presentation.
If you've never heard of literate programming, it's an idea introduced by Knuth (surely you've heard of him) that combines programming with documentation intended for human consumption. The program is presented in a document written for people to read, and transformed by a program into something a computer can execute. (The Wikipedia article on literate programming gives a decent description.)
I've dabbled a bit with literate programming in the past. In fact, I'm the maintainer for the noweb package in Debian. One of my (very) long-term projects is to build a free data structure library written for people to learn how the data structures work, and I've started implementing a couple simple data structures in literate programming style. However, looking at literate programming again, it seems to me that it has a few deep limitations.
First of all, if you want to describe something in depth, you're forcing everyone to read it, even if they aren't interested. For example, in the wc example, “#include <stdio.h>” takes 3 lines, even though anyone who has read an introductory C programming book will know immediately why that's there. On the other hand, you might want to include that for beginner programmers. One of the frustrating things I found when writing research papers was that I often had to go into too much detail, to make sure that every single step was covered, which I felt sometimes turned a short, simple proof into something unwieldy. What I would have liked to do was something like Leslie Lamport's (of LATEX fame) hierarchical proofs (though it doesn't translate well to printed text, and needs a more dynamic medium like a web page).
This limitation is partially due to the time that literate programming was conceived. With printed text, either you write something and everyone sees it (even if they just skim it, it's still there for them to see), or you omit it and nobody sees it. With something like a web page, however, you don't have this limitation. You can write “#include <stdio.h>”, and hide the descriptive text unless the reader wants to learn more.
Another limitation that I find with literate programming is that one of its underlying implications is that code is a lesser way of communicating between people, and that people communicate best using natural language. Each code chunk is intended to be described in words. While natural language is the best tool for general human communication, a small chunk of well-written code, like well-written mathematical notation, can be very effective in communicating certain ideas. Literate programming would encourage you to write the chunk twice, once is code and once in natural language, even if the code is a sufficient (or even sometimes better) way of communicating the idea. Going back to the stdio.h example, just writing “#include <stdio.h> // we send formatted output to stdout and stderr” would be a sufficient description for most programmers.
Related to this, literate programming pulls code chunks out of context, which sometimes is an important part in understanding how the code works. Seeing the code in context gives clues about what state the computer is in before it is executed, and what is expected after it executes. Of course you can always describe that in text, but seeing the code in context sometimes gives experienced programmers a more intuitive feel for how the code works.
One thing that I like about literate programming, though, is that emphasizes understanding over a line-by-line presentation. For example if you have two chunks of code that operate on the same data (say one reads and the other writes), or if you have two chunks that have operate similarly, then you would write those together, instead of having them spread out according to how the computer would execute them. It also allows you to deal with more important or interesting parts first, and leave the more mundane parts for later (I would have put “#include <stdio.h>” near the end of the document).
It is also useful to have at your disposal some of the document-writing tools, such as sectioning, lists, mathematical equations, and beautifully formatted text (and not having to make sure that your lines are wrapped properly).
While I think that literate programming is a great idea for presenting code in an understandable manner, I think that it has a lot of room for improvement, especially if we can take advantage of some of the features of the web. I'm doing some experimentation, and I hope to have some positive results.
Useless metrics
Just for fun, I decided to run David A. Wheeler's SLOCCount on my current work project. Here is the output (with the default options, slightly cleaned up):
SLOC Directory SLOC-by-Language (Sorted) 10656 mleditor js=10656 2299 util js=2299 Totals grouped by language (dominant language first): js: 12955 (100.00%) Total Physical Source Lines of Code (SLOC) = 12,955 Development Effort Estimate, Person-Years (Person-Months) = 2.95 (35.34) (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05)) Schedule Estimate, Years (Months) = 0.81 (9.69) (Basic COCOMO model, Months = 2.5 * (person-months**0.38)) Estimated Average Number of Developers (Effort/Schedule) = 3.65 Total Estimated Cost to Develop = $ 397,833 (average salary = $56,286/year, overhead = 2.40). SLOCCount, Copyright (C) 2001-2004 David A. Wheeler SLOCCount is Open Source Software/Free Software, licensed under the GNU GPL. SLOCCount comes with ABSOLUTELY NO WARRANTY, and you are welcome to redistribute it under certain conditions as specified by the GNU GPL license; see the documentation for details. Please credit this data as "generated using David A. Wheeler's 'SLOCCount'."
Note: This includes some, but not all, unit tests. I had to modify SLOCCount to support JavaScript — I just used the C parser.
I started working on the project in October, so I've spent 6 months on it. So according to the COCOMO model, I've produced almost $400,000 worth of work (at 2004 wages) in 6 months.
I think I need a raise. ![]()
(P.S. If you're lucky enough, you'll get the Bill Gates quote in the random quote section on the right-hand side of this page.)
Wave, drawing, and what not to do
A few months ago, I wrote a blog post about Wave, in which I said that Wave wouldn't be my first choice as a protocol for collaborative vector graphics. Here is an expansion on that statement.
Obviously, when designing something, you want to avoid reinventing things that you don't need to. The Wave protocol operates on documents that have a similar model to XML, or at least the most common parts of XML. SVG is an XML-based format for vector graphics. So a temptation would be to slap Wave on top of SVG to do collaborative drawing. Here are some reasons why that wouldn't be the best idea.
Note that we will be taking a simplified view of SVG, so some of my statements may not be completely accurate, if you want to nit pick. However, the ideas behind the statements should still be valid.
Locking
First of all, I've always thought that a collaborative drawing protocol should include some sort of locking; it would probably be confusing if two users tried to drag the same object at the same time. So that eliminates the possibility of using a stock Wave implementation, but it shouldn't be too hard to add locking on top of the Wave protocol.
Rendering order
In SVG, objects are rendered based (more or less) on their order in the document tree. That is, objects that appear earlier in the document are rendered first. Now consider what happens when someone tries to change the order of the objects (for example, moving an object to the back or to the front). The only way to do this with the Wave protocol is to delete the object from its current position in the document, and re-insert it in its new position.
However, if another user is modifying the object at the same time, since the Wave server has no way of knowing that the deletion and insertion represent the same object, when the server resolves the conflict, the unmodified object will be re-inserted, losing the second user's changes. In addition, if two users try to change the rendering order of the same object at the same time, the object could get re-inserted twice.
We have a similar issue with object grouping, but we will only look at object ordering.
How do we fix this? One way might be to change the document model: we could use an attribute to store the object ordering, rather than using the document ordering. If we try to just use a simple integer sequence (that is, 0, 1, 2,…) for the object ordering attribute, then changing object ordering may result in most of these attributes changing, and so if multiple users try to change ordering at the same time, the server (if it is naive) may not be able to resolve the conflicts, while still maintaining the property that the attributes are a simple integer sequence. We could try using decimal numbers (e.g. to move an object between the objects ordered 3 and 4, we give the object an order of 3.5), but then we may get the ugliness of extremely long decimal numbers. This could be solved by having a watcher that periodically renumbers the objects. As long as it doesn't try to renumber the objects while a user is also reordering the objects.
Another way is to change the protocol: add an operation for reordering the objects. Of course, adding operations means that we need to do more work figuring out how it fits in with the others. So the fewer operations that we need to add, the better. For object ordering, we can probably get by with just one operation, specifying an object, and a delta in its rendering order.
Another option is to just say that these types of conflicts should happen rarely enough that we don't care about them, and just use Wave and SVG unchanged. This is certainly a valid option, as long as the users are prepared for this (or as long as you are prepared to deal with the users). It can be argued that textual documents have a similar issue when users move text from one area to another, but it is probably less of an issue with textual documents since not all editors include a "move" operation, and even if it is included, it not commonly used. Instead, users usually "copy-and-paste", which arguably makes this type of conflict less confusing.
Object nodes
Now consider the actual description of an object. Let's just look at the <path> element. The nodes of a path are represented in its d attribute, which consists of a number of commands, indicating how the cursor moves. If the nodes of a path are changed, then the corresponding Wave operation is to replace the entire contents of the d attribute. If multiple users try to change the same object at the same time, then the server has no way of resolving the conflict, unless it runs a diff on the attribute value, and even then, it might not be reliable.
One way to fix this is to change the document model: instead of using a single attribute to store the path, we could use sub-elements to represent the nodes. This would allow individual nodes to be modified independently, as well as inserting and deleting nodes without conflict.
Another issue is that in SVG, the path data for an object is relative to the document. That means that if a user moves an object, then every node gets changed, and so if another user is modifying an individual node, then the modifications will conflict.
This can be fixed by disallowing users from modifying an object's nodes while the object is being moved (and vice versa); this is probably a rare-enough occurrence that users would not notice it. Another option is to specify a "position" for the path, and have the node positions relative to the position. (In fact, SVG does allow for transformations, to change the coordinate system for objects, so we could enforce that each object gets its own coordinate system.)
Summary
Now I should clarify something: if you slapped Wave on top of SVG, then you would still get a system where every user's copy of the document is synchronized, and all editing conflicts would be resolved. However, the conflicts might not be resolved in a way that makes sense for the users.
In general, there are two options for resolving these issues: change the document model, or change the operations. One option may be more appropriate than the other in different circumstances.
I should also add that even if you don't use SVG within the Wave document, doesn't mean that you can't base your editor on SVG — depending on how you have modified the document model, it should be possible to translate between the two formats.
So how would I do collaborative drawing? Well, maybe that will be a topic for a future blog post.
