At work, I've been implementing a data structure to make our collaborative editor run quickly. As part of that work, I've had to write a couple of complex functions (a couple 200+ line functions), which got me thinking about comments, readability, and presentation.
If you've never heard of literate programming, it's an idea introduced by Knuth (surely you've heard of him) that combines programming with documentation intended for human consumption. The program is presented in a document written for people to read, and transformed by a program into something a computer can execute. (The Wikipedia article on literate programming gives a decent description.)
I've dabbled a bit with literate programming in the past. In fact, I'm the maintainer for the noweb package in Debian. One of my (very) long-term projects is to build a free data structure library written for people to learn how the data structures work, and I've started implementing a couple simple data structures in literate programming style. However, looking at literate programming again, it seems to me that it has a few deep limitations.
First of all, if you want to describe something in depth, you're forcing everyone to read it, even if they aren't interested. For example, in the wc example, “#include <stdio.h>” takes 3 lines, even though anyone who has read an introductory C programming book will know immediately why that's there. On the other hand, you might want to include that for beginner programmers. One of the frustrating things I found when writing research papers was that I often had to go into too much detail, to make sure that every single step was covered, which I felt sometimes turned a short, simple proof into something unwieldy. What I would have liked to do was something like Leslie Lamport's (of LATEX fame) hierarchical proofs (though it doesn't translate well to printed text, and needs a more dynamic medium like a web page).
This limitation is partially due to the time that literate programming was conceived. With printed text, either you write something and everyone sees it (even if they just skim it, it's still there for them to see), or you omit it and nobody sees it. With something like a web page, however, you don't have this limitation. You can write “#include <stdio.h>”, and hide the descriptive text unless the reader wants to learn more.
Another limitation that I find with literate programming is that one of its underlying implications is that code is a lesser way of communicating between people, and that people communicate best using natural language. Each code chunk is intended to be described in words. While natural language is the best tool for general human communication, a small chunk of well-written code, like well-written mathematical notation, can be very effective in communicating certain ideas. Literate programming would encourage you to write the chunk twice, once is code and once in natural language, even if the code is a sufficient (or even sometimes better) way of communicating the idea. Going back to the stdio.h example, just writing “#include <stdio.h> // we send formatted output to stdout and stderr” would be a sufficient description for most programmers.
Related to this, literate programming pulls code chunks out of context, which sometimes is an important part in understanding how the code works. Seeing the code in context gives clues about what state the computer is in before it is executed, and what is expected after it executes. Of course you can always describe that in text, but seeing the code in context sometimes gives experienced programmers a more intuitive feel for how the code works.
One thing that I like about literate programming, though, is that emphasizes understanding over a line-by-line presentation. For example if you have two chunks of code that operate on the same data (say one reads and the other writes), or if you have two chunks that have operate similarly, then you would write those together, instead of having them spread out according to how the computer would execute them. It also allows you to deal with more important or interesting parts first, and leave the more mundane parts for later (I would have put “#include <stdio.h>” near the end of the document).
It is also useful to have at your disposal some of the document-writing tools, such as sectioning, lists, mathematical equations, and beautifully formatted text (and not having to make sure that your lines are wrapped properly).
While I think that literate programming is a great idea for presenting code in an understandable manner, I think that it has a lot of room for improvement, especially if we can take advantage of some of the features of the web. I'm doing some experimentation, and I hope to have some positive results.
