A significant portion of my day job is the publishing side of Onyx Neon. We have invested in a toolchain which takes manuscripts written in PseudoPod and produces XHTML, PDF, ePub, and print-ready documents. (We're happy to build on work which has come before, such as LaTeX and POD and parsers and so forth.)
The existing PseudoPod formatters had their flaws, though (because we hadn't pushed them hard enough to admit to ourselves that everything is a compiler). In a small business like ours, the best thing to do now is often the simple and easy fix—if you're careful that you don't delay doing the right thing for too long.
Doing things the right way is much easier now that I've improved our tools to build a real document object model that can be traversed correctly.
The secret is twofold:
- Think really hard about the problem you're trying to solve, especially the edge cases which are neither obvious nor easy.
- Reuse existing tests as much as possible.
The latter point is far subtler than it seems. Many, many Internet discussions debate endlessly the pros and cons of test-driven design. Many, many people make the point that unit tests can be fragile and cumbersome and may not provide the most practical benefits we'd all like to get. (Every debate degenerates to this, as if anyone seriously argued that highly specific unit tests were the prime goal of test-driven design.)
I am fortunate that writing a document formatter and transliterator has a well-defined input and a well-defined output. (I suspect that many programs have such mappings.) I have an input document exercising all of the features the translator should support, and I know what output each of those features should produce.
Reimplementing this formatter was a matter of making each test file pass, a few assertions at a time.
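That style of test is essentially a golden-file comparison: pair a known input with the exact output the formatter must emit, and any reimplementation that passes is correct by construction. Here is a minimal sketch of the idea in Python; the toy `to_xhtml` formatter and its `=head1` handling are illustrative stand-ins, not the actual PseudoPod toolchain:

```python
# A minimal golden-file test sketch: each case pairs a well-defined
# input document with the exact output the formatter must produce.
# to_xhtml() is a toy stand-in for a real POD-style formatter.

def to_xhtml(source: str) -> str:
    """Toy formatter: =head1 lines become <h1>, other lines <p>."""
    out = []
    for line in source.splitlines():
        if line.startswith("=head1 "):
            out.append("<h1>" + line[len("=head1 "):] + "</h1>")
        elif line:
            out.append("<p>" + line + "</p>")
    return "\n".join(out)

def check_golden(source: str, expected: str) -> bool:
    """Compare formatter output against the known-good result."""
    return to_xhtml(source) == expected

source   = "=head1 Testing\nReuse the same input for every implementation."
expected = ("<h1>Testing</h1>\n"
            "<p>Reuse the same input for every implementation.</p>")
assert check_golden(source, expected)
```

Because the tests assert only on the final output, nothing in them cares how `to_xhtml` is implemented; you can rewrite its internals and rerun the same suite, a few assertions at a time.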
With that said, testing a few hundred assertions in less than a dozen files is a relatively small job—perhaps a few hundred lines of code. Yet I believe the principle applies, especially if you have well-factored and well-tested components.
You can see the same principle at work in Ward Cunningham's Fit project. Making tests reusable and retargetable allows the possibility of reimplementation with a baseline of correctness.
I don't have specific suggestions about how to write tests that are so useful, but I've noticed that these tests, like the tests of web interfaces on other projects, have tended to converge on a model of producing careful input and examining output for very specific results. Loose coupling isn't just for code components.
(It's interesting to read Things You Should Never Do, Part I and The CADT Model in this context. Also, in spite of how much I grumble about continual rewrites of various portions of the Rakudo Perl 6 stack, I do respect that the Perl 6 testing infrastructure is of enormous benefit.)