Victims of the Success of CPAN Documentation


C. S. Lewis wrote something in the Screwtape Letters to the effect that every vice is a virtue misapplied. Doesn't that describe software development?

CPAN is valuable because of the semi-formal guidelines which have grown up around it:

  • a culture of sharing
  • a first-come, first-served namespace policy
  • a culture of testing and quality
  • good tools to produce, bundle, install, and verify distributions
  • optional but free bug reporting
  • a standard documentation format and structure

The common Perl 5 module documentation structure (a name, a synopsis with code examples, a description, and then an API listing) is good for describing a single, standalone module. You can extend it to explain the philosophy behind your code and its use, as something like Test::More does, or you can provide a very simple explanation of a very simple API, as strict does.
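
As a minimal sketch of that structure, the POD for a single module often looks something like the following. The module name My::Widget and its methods are hypothetical, made up purely for illustration, and the heading METHODS is only one common choice for the API listing.

```pod
=head1 NAME

My::Widget - one-line summary of what this (hypothetical) module does

=head1 SYNOPSIS

    use My::Widget;

    my $widget = My::Widget->new( name => 'example' );
    $widget->frobnicate;

=head1 DESCRIPTION

A few paragraphs explaining what the module is for and when to use it.

=head1 METHODS

=head2 new( %args )

Creates and returns a new widget.

=head2 frobnicate

Does the one thing this widget does.

=cut
```

Every heading in the skeleton describes this one module. Nothing in it offers an obvious home for explaining how My::Widget relates to the other modules in its distribution.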

Like many of the good features of Perl 5, the CPAN documentation format borrows liberally from a feature of Unix: specifically, the format of man pages.

Like many of the not-quite-perfect features of Perl 5, features borrowed heavily from Unix show their limitations. In particular, the biggest flaw of the API-heavy documentation structure is that it's good only at documenting APIs and passable at best at describing the important relationships between the components of a distribution which provide those APIs.

For example, if you want to understand DBIC (DBIx::Class) and use it effectively, you must understand the relationship between result objects and resultsets as well as the separation of responsibilities between them. (DBIC deserves credit for recognizing and addressing this documentation gap by providing multiple pieces of documentation beyond API explanations, but the problem is difficult to solve.)

You can see the problem more dramatically by exploring something like HTTP::Response, which is part of a well-factored and robust distribution but which has the relevant information spread through several separate pages of documentation because the structure of the documentation reflects the inheritance structure of the code.

Again, you can't fault Gisle for factoring his code well and documenting it effectively in the dominant style of the time (and Gisle deserves much credit for helping to spread that style). Still, isn't it a little ironic that, where objects should be well-encapsulated black boxes, their internal design decisions are so apparent in the documentation?

Some will suggest that we need better ways to explore the documentation, but I believe there's a bigger question we need to answer. When I'm flipping through all of the documentation for DBIC to figure out the best place to put validation code to ensure that my objects always get created in a known-valid state, I need something far different than a list of methods used to search for objects that already exist. I know what I want to do, but I don't know which module's documentation holds the right answer.

I don't know what the solution is.

11 Comments

I don't know what the answer is either, but that's been a minor itch that has been nagging me for years.

My (alas, terminally tuit-starved) attempt to partially address the problem is Pod::Manual (http://babyl.dyndns.org/techblog/entry/pod-manual-returns), which is meant to gather the POD of a distribution into a logical unit. I don't think it would be a perfect solution (heck, I'm not even convinced that it would be a good one), but it could help put the different PODs in a logical reading order and remove duplication that makes sense in independent manual pages but not so much when they are put together.

In any case, with Dist::Zilla and Pod::Weaver, we suddenly have much more freedom and many more possibilities when the time comes to munge documentation prior to a release. All the tools are there to create a good solution. We just have to find out what it might be. :-)

The solution is to have both reference (or API) documentation AND a separate Manual or Tutorial (or both). Mixing them in the same pieces of POD just makes things confusing.

See for example Moose's growing set of Moose::Manual pages, or my DBIx::Class::Tutorial (the CPAN one is a bit old; someone smack me to get the update into a dist, though it is available online). These should try to show the user how to use the module (or dist) for certain complete tasks, or walk through a particular extended task that covers as much of the usage as possible, with many links to the correct pieces of reference doc.

Ideally these should also be self-contained enough to not send the reader hunting through the reference doc every time they want to try something out. DRY is not a useful principle where docs are concerned; instead, each task should contain enough info to solve it right there on the page.

The tricky part, I think, is how to help the user find this non-API doc, and whether to include it in the same dist as the module or ship it separately so it can be updated as often as needed without new code releases.

Jess

I was just getting ready to respond to this when I saw that castaway had already said pretty much exactly what I was going to. (:

One thing that I'd like to add though is that the tricky part isn't helping the user find the manual, it's getting the manual written in the first place. The skills required for writing a halfway-decent API reference are not hard, and most people who are writing a module can manage to come up with a written description of the module to tell other people what exactly it does. Writing a good user's guide is a very different sort of task, one that typically requires a decent amount of experience in technical writing to come up with something that's actually useful to new users.

I think an important goal along the way to improving the state of documentation on CPAN is to encourage participation from people with that skillset a lot more, even if they rarely contribute code.

DRY is definitely counterproductive for the documentation. Thanks for the succinct explanation of what I tried to say.

This is a problem I've struggled with many times but, for obvious reasons, most of all with Dist::Zilla. It is a huge pile of little pieces that have lots of possible emergent behaviors. Even were I to document each piece thoroughly, the system would not be clear.

This is what led to having both the per-plugin documentation and the http://dzil.org tutorial. Even that isn't entirely satisfying, but better than nothing.

It's a frustrating situation for everyone. New users of things like Dist::Zilla or Email::Sender complain that there are "no docs," but writing the right kind of documentation is exceedingly difficult, especially because it's not clear how to most effectively package it for consumption by the end user. If "perldoc" were more like (gasp!) a texinfo browser, it would be easier, because steps could be cross-referenced more easily. Instead, we must treat most L<> tags as optional, or at least as troublesome-to-follow links if we assume that users are likely reading in their console.

Tough problem.

Might I suggest adding more emphasis on "Cookbook" style POD contents (I think Moose went that route)? "How to do X with your distribution" is what users most often care about.

Cookbooks can be nice, but I get quite frustrated with ours (the DBIC one), because it's a very varied pile of bits and swallows info that would be better off in a more structured doc. I guess it's the non-tech-doc-writer's solution. Better than none, at least.

An idea that occurred to me is that it would be nice to have something Pod::Weaver-based that isn't restricted by the current 1-Pod-Page == 1-Perl-Module connection. That very trait alone is pretty much one of the ways that internal design decisions are exposed in the documentation, and it's not very helpful.

What would be more helpful is if entire distributions could have their documentation scraped from the various .pm files and aggregated into a single, well-indexed, hefty documentation reference for the whole dist.

This would eliminate my current habit of clicking around the documentation on CPAN trying to find the right page, and/or opening multiple tabs just so I can flip between two parts of the documentation. Instead, everything would be just a scroll away, and I would open tabs only when I chose to, not because I had to.

Something else I've been kicking around as a "nice to have" would be a system for packages to advertise that they are part of another (say, for example, a Dist::Zilla plugin could advertise that it's part of Dist::Zilla) and that they have some metadata the parent can consume.

This could then be parsed, aggregated, and so forth to form a proper documentation source for the distribution, showing in the content the other modules one can use to do the work, instead of requiring people to know they're looking for something, type the Foo::Bar:: prefix into CPAN, and then scroll down the list of results to see if there's something they like.

That way, there is a more logical path for people to find modules related to what they're already doing by sheer accident, and they're more likely to stumble into not-yet-recognised awesomeness, and use it, and that would be a GoodThing™ in my book.

This is a problem for all API developers, not just CPAN developers. You don't hear from the customers who are happy with just the API docs, because they found what they needed. Users have MANY diverse needs, and the kinds of documentation they look for are different depending on how that user learns. For the APIs I develop, I end up doing all of the following:

* Document the API at a module and subroutine level.
* Create a "Usage" document, showing general uses and how it relates to other modules.
* Create a "Tutorial" document, showing basic uses with longer explanations.
* Create a "Cheat sheet" document, with one or more example calls to each subroutine/method in the API and a short comment on each.
* Create an "Overview" document, explaining the high level purpose of the API and its uses.
* Create a presentation/class that goes into excruciating detail of how each part of the API works.

Yeah. It's a lot. It's the same thing in different ways, and some might consider it redundant. However, that's how the human brain works: it needs to see the same information in several different ways in order to build a model of it. What users search for and what users want to look at *first* differs between developers. To make it easiest for people to learn, the more different ways of presenting the same information you can make, the better.

CPAN has been really good at standardizing a lot of things. Maybe in a future version of the meta protocol we could add a "Tutorial" doc and an "Overview" doc as part of the standard documents available and, instead of having the API docs be the first thing that is brought up, have the Overview doc be the one presented, with immediate links to the API.

Although POD is not perfect, in my view the Perl community is among the ones with better documentation standards.

There are a number of reasons for this. One is CPAN itself: as you say, you publish in a context that has a culture, like testing, and your module is expected to follow some common practices. Also, the interpreter itself comes with tons of documentation, which is maintained by perl5-porters themselves. So there's inertia coming from the very core developers. Compared with the documentation you get in almost any other open source project, perl's docs are awesome.

From my perspective another reason is POD. It is simple, it works, and allows you to write introductory stuff, like motivations, descriptions, etc. Compare that with what Javadoc or RDoc encourage: they are so targeted to APIs that what you get is typically something very DRY that works as a reference (if it has enough coverage), but it is rare that it serves for anything else, like learning how to use the darn thing (except for the simplest libraries).

The tools themselves could produce something else, but in practice, generally speaking, you get references. POD style in practice strikes a balance between explaining to users how to use the stuff and providing them an API reference.

To scale POD you need to give link points here and there; your main page may be kind of a guide to the rest of the docs. I learnt Moose perfectly from its documentation, which I think is very good.

POD could surely be improved somehow, but it is in my opinion an excellent balance between simplicity and usefulness.

Many good ideas in the other comments. One thing not mentioned is transclusion. This would solve all or most of the "what docs go where" problem that HTTP::Response suffers from. Extend the POD syntax with something like T<$r-E<gt>header|HTTP::Headers/header> to pull the relevant part of another POD in.

About this Entry

This page contains a single entry by chromatic published on February 16, 2011 12:01 PM.
