April 22, 2013

Thoughts on literate programming

12:50 -0400

At work, I've been implementing a data structure to make our collaborative editor run quickly. As part of that work, I've had to write a couple of complex functions (a couple 200+ line functions), which got me thinking about comments, readability, and presentation.

If you've never heard of literate programming, it's an idea introduced by Knuth (surely you've heard of him) that combines programming with documentation intended for human consumption. The program is presented in a document written for people to read, and transformed by a program into something a computer can execute. (The Wikipedia article on literate programming gives a decent description.)

I've dabbled a bit with literate programming in the past. In fact, I'm the maintainer for the noweb package in Debian. One of my (very) long-term projects is to build a free data structure library written for people to learn how the data structures work, and I've started implementing a couple simple data structures in literate programming style. However, looking at literate programming again, it seems to me that it has a few deep limitations.

First of all, if you want to describe something in depth, you're forcing everyone to read it, even if they aren't interested. For example, in the wc example, “#include <stdio.h>” takes 3 lines, even though anyone who has read an introductory C programming book will know immediately why that's there. On the other hand, you might want to include that for beginner programmers. One of the frustrating things I found when writing research papers was that I often had to go into too much detail, to make sure that every single step was covered, which I felt sometimes turned a short, simple proof into something unwieldy. What I would have liked to do was something like Leslie Lamport's (of LATEX fame) hierarchical proofs (though it doesn't translate well to printed text, and needs a more dynamic medium like a web page).

This limitation is partially due to the time that literate programming was conceived. With printed text, either you write something and everyone sees it (even if they just skim it, it's still there for them to see), or you omit it and nobody sees it. With something like a web page, however, you don't have this limitation. You can write “#include <stdio.h>”, and hide the descriptive text unless the reader wants to learn more.

Another limitation that I find with literate programming is that one of its underlying implications is that code is a lesser way of communicating between people, and that people communicate best using natural language. Each code chunk is intended to be described in words. While natural language is the best tool for general human communication, a small chunk of well-written code, like well-written mathematical notation, can be very effective in communicating certain ideas. Literate programming would encourage you to write the chunk twice, once is code and once in natural language, even if the code is a sufficient (or even sometimes better) way of communicating the idea. Going back to the stdio.h example, just writing “#include <stdio.h> // we send formatted output to stdout and stderr” would be a sufficient description for most programmers.

Related to this, literate programming pulls code chunks out of context, which sometimes is an important part in understanding how the code works. Seeing the code in context gives clues about what state the computer is in before it is executed, and what is expected after it executes. Of course you can always describe that in text, but seeing the code in context sometimes gives experienced programmers a more intuitive feel for how the code works.

One thing that I like about literate programming, though, is that emphasizes understanding over a line-by-line presentation. For example if you have two chunks of code that operate on the same data (say one reads and the other writes), or if you have two chunks that have operate similarly, then you would write those together, instead of having them spread out according to how the computer would execute them. It also allows you to deal with more important or interesting parts first, and leave the more mundane parts for later (I would have put “#include <stdio.h>” near the end of the document).

It is also useful to have at your disposal some of the document-writing tools, such as sectioning, lists, mathematical equations, and beautifully formatted text (and not having to make sure that your lines are wrapped properly).

While I think that literate programming is a great idea for presenting code in an understandable manner, I think that it has a lot of room for improvement, especially if we can take advantage of some of the features of the web. I'm doing some experimentation, and I hope to have some positive results.

0 Comments
April 1, 2013

Useless metrics

15:47 -0400

Just for fun, I decided to run David A. Wheeler's SLOCCount on my current work project. Here is the output (with the default options, slightly cleaned up):

SLOC	Directory	SLOC-by-Language (Sorted)
10656   mleditor        js=10656
2299    util            js=2299

Totals grouped by language (dominant language first):
js:           12955 (100.00%)

Total Physical Source Lines of Code (SLOC)                = 12,955
Development Effort Estimate, Person-Years (Person-Months) = 2.95 (35.34)
 (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months)                         = 0.81 (9.69)
 (Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule)  = 3.65
Total Estimated Cost to Develop                           = $ 397,833
 (average salary = $56,286/year, overhead = 2.40).
SLOCCount, Copyright (C) 2001-2004 David A. Wheeler
SLOCCount is Open Source Software/Free Software, licensed under the GNU GPL.
SLOCCount comes with ABSOLUTELY NO WARRANTY, and you are welcome to
redistribute it under certain conditions as specified by the GNU GPL license;
see the documentation for details.
Please credit this data as "generated using David A. Wheeler's 'SLOCCount'."

Note: This includes some, but not all, unit tests. I had to modify SLOCCount to support JavaScript — I just used the C parser.

I started working on the project in October, so I've spent 6 months on it. So according to the COCOMO model, I've produced almost $400,000 worth of work (at 2004 wages) in 6 months.

I think I need a raise. wink emoticon

(P.S. If you're lucky enough, you'll get the Bill Gates quote in the random quote section on the right-hand side of this page.)

0 Comments
February 16, 2013

Wave, drawing, and what not to do

22:36 -0500

A few months ago, I wrote a blog post about Wave, in which I said that Wave wouldn't be my first choice as a protocol for collaborative vector graphics. Here is an expansion on that statement.

Obviously, when designing something, you want to avoid reinventing things that you don't need to. The Wave protocol operates on documents that have a similar model to XML, or at least the most common parts of XML. SVG is an XML-based format for vector graphics. So a temptation would be to slap Wave on top of SVG to do collaborative drawing. Here are some reasons why that wouldn't be the best idea.

Note that we will be taking a simplified view of SVG, so some of my statements may not be completely accurate, if you want to nit pick. However, the ideas behind the statements should still be valid.

Locking

First of all, I've always thought that a collaborative drawing protocol should include some sort of locking; it would probably be confusing if two users tried to drag the same object at the same time. So that eliminates the possibility of using a stock Wave implementation, but it shouldn't be too hard to add locking on top of the Wave protocol.

Rendering order

In SVG, objects are rendered based (more or less) on their order in the document tree. That is, objects that appear earlier in the document are rendered first. Now consider what happens when someone tries to change the order of the objects (for example, moving an object to the back or to the front). The only way to do this with the Wave protocol is to delete the object from its current position in the document, and re-insert it in its new position.

However, if another user is modifying the object at the same time, since the Wave server has no way of knowing that the deletion and insertion represent the same object, when the server resolves the conflict, the unmodified object will be re-inserted, losing the second user's changes. In addition, if two users try to change the rendering order of the same object at the same time, the object could get re-inserted twice.

We have a similar issue with object grouping, but we will only look at object ordering.

How do we fix this? One way might be to change the document model: we could use an attribute to store the object ordering, rather than using the document ordering. If we try to just use a simple integer sequence (that is, 0, 1, 2,…) for the object ordering attribute, then changing object ordering may result in most of these attributes changing, and so if multiple users try to change ordering at the same time, the server (if it is naive) may not be able to resolve the conflicts, while still maintaining the property that the attributes are a simple integer sequence. We could try using decimal numbers (e.g. to move an object between the objects ordered 3 and 4, we give the object an order of 3.5), but then we may get the ugliness of extremely long decimal numbers. This could be solved by having a watcher that periodically renumbers the objects. As long as it doesn't try to renumber the objects while a user is also reordering the objects.

Another way is to change the protocol: add an operation for reordering the objects. Of course, adding operations means that we need to do more work figuring out how it fits in with the others. So the fewer operations that we need to add, the better. For object ordering, we can probably get by with just one operation, specifying an object, and a delta in its rendering order.

Another option is to just say that these types of conflicts should happen rarely enough that we don't care about them, and just use Wave and SVG unchanged. This is certainly a valid option, as long as the users are prepared for this (or as long as you are prepared to deal with the users). It can be argued that textual documents have a similar issue when users move text from one area to another, but it is probably less of an issue with textual documents since not all editors include a "move" operation, and even if it is included, it not commonly used. Instead, users usually "copy-and-paste", which arguably makes this type of conflict less confusing.

Object nodes

Now consider the actual description of an object. Let's just look at the <path> element. The nodes of a path are represented in its d attribute, which consists of a number of commands, indicating how the cursor moves. If the nodes of a path are changed, then the corresponding Wave operation is to replace the entire contents of the d attribute. If multiple users try to change the same object at the same time, then the server has no way of resolving the conflict, unless it runs a diff on the attribute value, and even then, it might not be reliable.

One way to fix this is to change the document model: instead of using a single attribute to store the path, we could use sub-elements to represent the nodes. This would allow individual nodes to be modified independently, as well as inserting and deleting nodes without conflict.

Another issue is that in SVG, the path data for an object is relative to the document. That means that if a user moves an object, then every node gets changed, and so if another user is modifying an individual node, then the modifications will conflict.

This can be fixed by disallowing users from modifying an object's nodes while the object is being moved (and vice versa); this is probably a rare-enough occurrence that users would not notice it. Another option is to specify a "position" for the path, and have the node positions relative to the position. (In fact, SVG does allow for transformations, to change the coordinate system for objects, so we could enforce that each object gets its own coordinate system.)

Summary

Now I should clarify something: if you slapped Wave on top of SVG, then you would still get a system where every user's copy of the document is synchronized, and all editing conflicts would be resolved. However, the conflicts might not be resolved in a way that makes sense for the users.

In general, there are two options for resolving these issues: change the document model, or change the operations. One option may be more appropriate than the other in different circumstances.

I should also add that even if you don't use SVG within the Wave document, doesn't mean that you can't base your editor on SVG — depending on how you have modified the document model, it should be possible to translate between the two formats.

So how would I do collaborative drawing? Well, maybe that will be a topic for a future blog post.

0 Comments
February 13, 2013
19:55 -0500
Hubert Chathi: # Development Best Practices slide deck from @ca.linkedin.com/in/jfilip/
0 Comments
February 11, 2013
10:58 -0500
Hubert Chathi: My # identity provider plugin for # has been approved
0 Comments
November 22, 2012

Collaboration, Operational Transformation, Wave

15:37 -0500

At work, we are building a real-time collaborative editor. If you've ever used Google Docs with multiple people working on the same document at the same time, that's the sort of thing we're trying to do. I don't think I'm being too bold in saying that real-time communication and collaboration will soon go from "killer feature" to "feature that people will assume that you have and are frustrated if you don't". Like WYSIWYG for word processors.

The major issue with collaborative editing is synchronization: making sure that everyone sees the same thing. In thinking about synchronization, it is important to not just consider whether everyone's copy of the document is the same, but also that the document makes sense. For example, a text-based protocol is not suitable for XML-like data, and XML is a bad way of storing text formatting. Consider two users editing: "The quick brown fox jumped over the lazy dog". One user makes "quick brown" bold, and another user makes "brown fox" italics. Using a naive XML method, you would get "The <b>quick <i>brown</b> fox</i> ...", which is invalid XML.

For collaborating on textual documents, the Wave Protocol is certainly appropriate, but it isn't appropriate for all things. For example, it wouldn't be my first choice to use for vector graphics (consider: how do you move an object forward or backwards in the drawing stack?). Even tables can cause problems unless your server understands them, has some way of cleaning them up, or you come up with a clever way of representing them. Say we have a 2x2 table:

<table>
  <tr>
    <td></td><td></td>
  </tr>
  <tr>
    <td></td><td></td>
  </tr>
</table>

One user adds a row (which adds another <tr><td></td><td></td></tr> at the end), while another user adds a column (which adds a <td></td> to each <tr>). If both edits happen at the same time, the result will be:

<table>
  <tr>
    <td></td><td></td><td></td>
  </tr>
  <tr>
    <td></td><td></td><td></td>
  </tr>
  <tr>
    <td></td><td></td>
  </tr>
</table>

That is, the first two rows will have three columns, and the third row will have two columns. There are several ways of dealing with this, but you need to know about the potential problem before you can address it.

The moral is: make sure that your synchronization method is appropriate for your document types, and/or make sure that your server understands your document type enough to make sane conflict resolution decisions.

0 Comments
August 8, 2012
16:23 -0400
Hubert Chathi: Sometimes I wonder if # developers purposely invent nonsensical ways of doing things
0 Comments
July 11, 2012
11:12 -0400
Hubert Chathi: # Mobile moving to HTML5+PhoneGap
0 Comments
July 5, 2012
14:53 -0400
Hubert Chathi: Interesting new question type for # in #
0 Comments
June 13, 2012
23:28 -0400
Hubert Chathi: # is a great framework, except for its braindead handling of POST data. #
0 Comments
June 1, 2012
13:04 -0400
Hubert Chathi: It looks like my # # identity provider plugin made it onto @remote-learner.net 's approved plugins page
0 Comments
May 30, 2012
18:48 -0400
Hubert Chathi: just watched the recording for @tomazlasic.net 's # keynote. Great stuff. # #
0 Comments

iMoot presentation slides online

14:52 -0400

I've put up my slides for my iMoot talk.

For the iMoot 2012 Moodle conference in May, I gave a talk entitled Integrating Moodle With an External Tool. It was primarily a technical presentation, in which I present various issues to consider when creating an integration, and different options for addressing those issues. It was mostly based on my most recent experience integrating Moodle with MuchLearning, as well as some of the work that I did at Remote-Learner.

0 Comments
May 29, 2012
21:37 -0400
Hubert Chathi: # over. Thanks to @twitter.com/ikawhero and the rest of the team at @Pukunui.com. # #
0 Comments
May 26, 2012
12:25 -0400
Hubert Chathi: Just finished my first session of my # talk on "Integrating # With an External Tool". Repeat session on Tuesday, 3:30 EDT. (-0400) #
0 Comments
May 9, 2012
12:36 -0400
Hubert Chathi: will be preseting at this year's # # about integrating with external tools #
0 Comments
May 2, 2012
15:44 -0400
Hubert Chathi: parsing # is complicated
0 Comments