May 2009 Archives

Why (Regular) Releases Matter

If you read the Rakudo #17 release announcement carefully, you'll see a curious note:

Due to the continued rapid pace of Rakudo development and the frequent addition of new Perl 6 features and bugfixes, we continue to recommend that people wanting to use or work with Rakudo obtain the latest source directly from the main repository at github.

Though Rakudo produces monthly releases (as does Parrot), the pace of development is so fast that Rakudo #17 (released a week ago) doesn't represent Rakudo very well as it exists today. When someone like Jonathan can close ten bugs in an eight-hour day, or a simultaneous change to Parrot and Rakudo can make 1000 spectests suddenly pass for the first time, or a two-character change to Parrot provides a 6.5% runtime performance improvement, a month-old Rakudo release seems ancient in comparison.

Adam Kennedy asked a valid question, however. "Given the recommendation that interested parties track your repository, why bother making releases?"

There are several reasons.

Helping Users

Users benefit from regular releases.

If they want to use the software -- take advantage of new features or bugfixes -- they can do so without having to manage checkouts. Though the project may keep its trunk as stable as possible, it's not always perfect. Trunk represents a work in progress. A release is a stable point the developers believe will work well for regular users.

An official release is much easier for distributions and packagers. If users can get binary releases on a regular basis, they don't even have to know how to configure and build the software to take advantage of new features or bugfixes.

A well-managed official release is easier to upgrade to if the delta between releases is small. I recently helped migrate a server from an ancient Red Hat 7 installation to a modern Ubuntu version. We had to recreate almost everything; little besides user data copied over directly. Now we can keep the server up to date with two simple commands once a week. Sometimes the changes are greater than others, but for the most part, it's transparent and easy.

Helping Developers

A release gets wider distribution and testing than a snapshot or trunk (in the same way that trunk gets more testing than branches). Bug reports, IRC and mailing list traffic, and even commits increase in the days leading up to and following a release.

Regular releases provide an incentive to keep quality high. Packaging, testing on exotic platforms, and documentation tend to suffer first when there's too much to do. If your release process requires you to perform some routine work, you'll do it when the release is near. (Of course if you have irregular releases, this busywork can seem overwhelming -- which is why some projects slip their releases continually.)

Regular releases require you to keep your trunk stable and releasable. Not only does this help you produce the next release on time, but it prevents some of the madcap scrambling that can occur if you need to make an emergency release. Hopefully you never have to do this, but the confidence that comes from knowing that a dozen people could produce a release with an hour or two of notice is electrifying.

Regular releases also require you to work in small pieces. Large branch merges can destabilize the trunk for days or weeks. If you can't avoid them altogether, there's a natural time for them to merge: just after a release. Branches that haven't seen activity in a couple of release cycles seem old and crufty. (There's a reason they haven't merged.)

A regular release cycle produces a measurable cadence. You can graph the commit rate to various parts of Parrot on a calendar. Larger or deeper changes occur just after a release -- there's pent-up demand, and much of Parrot's major progress has occurred in this timeframe. Smaller features merge in the next couple of weeks. The test suite and documentation get reviewed a few days before a release.

Regular releases expose problems in the project. Perhaps your test server is inadequate and you can't get results when you need them. Perhaps one section of the code depends on the attention of someone who has no free time lately. Perhaps only one person has access to your web site to update the notice. You will discover these problems -- and have a chance to fix them before they're catastrophic.

Helping The Project

A project that never releases software looks dead. That may not be true, but perception is important.

Regular releases -- especially on a predictable schedule -- demonstrate that your project takes its reputation seriously. (Please don't assume that I'm implying the converse.) You give your users the option to use the best code you have produced at regular intervals. They don't have to upgrade, but they have the option.

Regular releases demonstrate that you care about your project. (Again, please read a disclaimer about the converse here.)

Regular releases keep your project in the news. As boring as releases should be, they're still newsworthy. What's changed? What new feature is available? What's better?

Regular releases attract new users and developers. Our regular committer count has jumped since we started performing regular releases. Our development pace has quickened. We turned around a floundering project and injected new life into it.

People still have doubts that we can achieve our audacious goals, but every month that we release a new stable version of our software that's better than every previous month, we demonstrate that we know how to produce modern software.

A recent discussion on the p5p mailing list seemed like an opportunity for me to suggest what could help produce regular Perl 5 releases. (Nicholas Clark nearly dared me. He's a bad man.)

As you might expect, the resulting discussion went off in several directions. A few people see little value in regular releases (which means that I should write about that soon). Perhaps the most important point is a question that a couple of pumpkings alluded to. David Golden, ever perceptive, picked up on the real question.

How do you know that Perl 5 is "stable" and ready for release?

Double Lives Take Half as Long

(I addressed this issue from the other side in February in Hanging the Core out to DRY. This is a followup from the "Why does so much time elapse between stable Perl 5 releases?" department.)

One of the biggest time sinks to releasing a new stable version of Perl is chasing down dual-lived core modules. A dual-lived core module is a module in Perl's core library that someone other than p5p maintains outside of the Perl 5 core repository. At various times, someone notices that there's a new release of a core module (usually the maintainer, but...) and suggests that a pumpking merge the new version into the core.

Why are there dual-lived modules?

The most important reason is so that users can upgrade to newer versions of core modules without upgrading the core. Why would they want to do that? Perhaps it's difficult to upgrade core Perl and they don't have permission. Most likely, it's because Perl 5 releases are infrequent. Where a bugfix in a core module may require only a little bit of testing and represent only a little bit of change, releasing a new version of Perl requires a lot of testing and coordination.

Another reason for dual-lived modules is to spread the maintenance burden. While only a few people had commit access to the Perl 5 repository in the past, every maintainer of a dual-lived module can have his or her own repository just for that module. This is less of an issue now that Perl 5 uses git, but old design decisions tend to persist.

A final reason for dual-lived modules is that the modules themselves may be usable on several versions of Perl, even if they're not binary compatible. (The modules may be pure Perl, or their XS may be source compatible with multiple Perl major versions.) If code works correctly and unmodified on Perl 5.6.2 and Perl 5.8.6 - 5.8.9 and Perl 5.10, is there a reason to limit users to waiting for Perl 5.10.1 or 5.12 to use the new version?

Of course, coordinating separate releases from all of these separate repositories and separate authors on a well-known and well-understood time frame and testing them to make sure they work with the core language at that point as well as with each other -- and remember there are dozens (if not more) dual-lived modules spread out among dozens of authors -- is difficult.

As I'm sure you realize by now, this is a tangle.

Cutting the Knot

These separate goals of dual-lived modules are incompatible.

The point of having a standard library full of modules is so that Perl is useful to users by itself, with nothing else installed. This theory suggests that users shouldn't have to install CPAN modules to do useful things.

The point of releasing modules on the CPAN is so that users can upgrade them independently of the core. The CPAN is an integral part of the modern Perl programming experience. (When users complain that they have to install "half of CPAN" to get a modern Perl application to run, part of that complaint is that Perl 5 the language is too minimalistic and flexible, and not opinionated enough in several places, to support writing modern Perl applications on its own. Another part of that complaint is that we don't have CPAN quite right yet.)

I can think of three solutions. They can overlap, but they're also independent.

First, improve the core's automated testing. This helps everyone; it can identify changes in the core code that affect the stability and behavior of the standard library. It can also identify changes in standard library modules which do not work on important platforms.

The sooner you can discover a failure, the easier it is to identify the offending change.

Second, do not automatically merge in all upstream changes to the core. A stable and well-tested version of a dual-lived module is better than the most recent version, at least if you haven't verified it on all interesting platforms. If users want a newer version, they can upgrade themselves. (That's one of the reasons dual-lived modules exist!)

I realize this seems to conflict with my belief that users should upgrade frequently. I still believe they should. I still believe strongly in frequent stable releases -- but users do not have to upgrade. They should have the option to upgrade frequently (and they should know the support and deprecation implications of not upgrading), but they have the choice.

Third, remove dual-lived modules from the core. Maintain and distribute only those modules necessary to install other modules.

This will expose several other problems, however: encouraging Perl distributions to produce stable bundles which represent a usable core, improving configuration and installation support, and overcoming the social inertia toward an unchanging Perl core.

This is probably a historical inevitability, however; the maintenance burden of the Perl 5 core cannot increase indefinitely. Perl 6 has chosen option three to avoid this maintenance burden. (Many design and management decisions in Perl 6 address Perl 5's drawbacks.)

Perl 5 and Binary Compatibility

One of the issues under consideration in the Perl 5 support policy is binary compatibility.

Binary compatibility is the likelihood that binaries compiled against a previous release will work with a newer release. These binaries are most likely modules with XS components, though they can also be programs which embed Perl. This is very different from source compatibility (Unix's traditional compatibility guarantee), where the syntax of a program doesn't change in incompatible ways. You may not be able to take advantage of new features, but old features continue to work as you expect.

An XS of Pain

If you've used the CPAN much, you know that XS modules can be more difficult to configure, build, and install than pure Perl modules. (If you haven't used the CPAN much, know that there's a convention of providing pure Perl versions of certain XS modules. They may be slower and less efficient, but they can be easier to install and debug.)

Why is XS so troublesome?

Windows and Mac OS X users have noticed that installing XS modules requires a working development environment, including the Perl headers, a decent compiler, and a passable make utility. (To be fair, even Unix users can have trouble, especially those on platforms with horrible C compiler support. The C99 standard is a decade old. If you're not busy, would you mind implementing some of its features? Thanks!) Strawberry Perl is a great Perl distribution which includes a preconfigured development environment suitable for building XS modules.

Part of the problem is that XS is difficult to use correctly. The Perl 5 core has far too little encapsulation; XS exposes many intimate details of its internals. This allows a lot of power, but it has other implications which I'll discuss in a future entry. The problem here relates to binary compatibility.

Duck Sequencing

The silly example of duck typing suggests that anything that looks like a duck, walks like a duck, and quacks like a duck is obviously a duck. That's fine in a lot of languages. It's not (usually) fine in C (though like most things in C, you can fake it if you're exceedingly clever and disciplined and very good at lying to yourself).

Suppose you have a Duck struct:

struct Duck {
    quack_func_t *quack;
    walk_func_t  *walk;
    unsigned int  num_ducklings;
    bool          has_feathers;
};

If you don't know C, that's fine. Think of this like a hash or a dictionary where the order of the keys really, really matters and the size of the values really, really matters. In C terms, this describes a blob of memory. On a typical 32-bit platform, the compiler carves out some 16 bytes of memory (four bytes per pointer, four bytes for an unsigned integer, and four bytes for the boolean once you count padding -- wasting plenty of bits to keep the struct aligned). Anytime in the source code you refer to a Duck, the compiler knows that the first four bytes refer to the function the duck uses to quack.

At least, the compiler believes you when you tell it that. The compiled binary doesn't check. That information isn't there. All the binary knows is that it has a chunk of memory and that to quack, it grabs the first four bytes from that chunk and uses that to look up a function to call.
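If you'd like to see the layout your own compiler picks, here is a minimal sketch. The function types are placeholders I made up (the struct above never defines them), and print_duck_layout is a hypothetical helper, not anything from a real library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical function types; the article never defines them. */
typedef void quack_func_t(void);
typedef void walk_func_t(void);

struct Duck {
    quack_func_t *quack;
    walk_func_t  *walk;
    unsigned int  num_ducklings;
    bool          has_feathers;
};

/* C guarantees that the first member sits at offset 0. */
static_assert(offsetof(struct Duck, quack) == 0, "quack must come first");

/* Print the size and member offsets this compiler chose for struct Duck. */
void print_duck_layout(void)
{
    printf("sizeof(struct Duck) = %zu\n", sizeof(struct Duck));
    printf("quack         at offset %zu\n", offsetof(struct Duck, quack));
    printf("walk          at offset %zu\n", offsetof(struct Duck, walk));
    printf("num_ducklings at offset %zu\n", offsetof(struct Duck, num_ducklings));
    printf("has_feathers  at offset %zu\n", offsetof(struct Duck, has_feathers));
}
```

Call print_duck_layout() from a main function to see the numbers. On a platform with eight-byte pointers the offsets and total size will likely be larger than the 16 bytes described above, which is rather the point: the numbers belong to the compiler and platform, not to the source code.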

All is fine and good if you've typed your program appropriately. Imagine that someone comes along and says that ducks must also swim. This means that the Duck struct also needs a swim member. There are several ways to handle this. One is to put this member where it makes the most sense:

struct Duck {
    quack_func_t *quack;
    swim_func_t  *swim;
    walk_func_t  *walk;
    unsigned int  num_ducklings;
    bool          has_feathers;
};

This version of the struct has functions at the top in alphabetical order. That's nice for maintainers; it has a well-defined structure. Unfortunately, any binary which used a Duck before is now broken until you recompile it. Remember, the binary doesn't check that the struct's layout has changed. The binary doesn't know about the struct layout. It just knows to look in a specific spot for a specific amount of memory which it can treat in a specific way.

Code that tried to walk before will now swim. If those functions have different signatures, expect a crash. Code that checked the number of ducklings will now show a very, very fertile duck when the code treats the walk pointer as an integer value.
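Here is a rough sketch of that failure, under some loud assumptions: DuckV1 and DuckV2 are my names for the two layouts, all three function types share one made-up signature so that the mistaken call stays well defined, and old_binary_walk is a hypothetical stand-in for what stale compiled code effectively does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature shared by every duck action in this sketch. */
typedef const char *quack_func_t(void);
typedef const char *swim_func_t(void);
typedef const char *walk_func_t(void);

/* Layout the old binary was compiled against. */
struct DuckV1 {
    quack_func_t *quack;
    walk_func_t  *walk;
    unsigned int  num_ducklings;
    bool          has_feathers;
};

/* New layout, with swim inserted in alphabetical order. */
struct DuckV2 {
    quack_func_t *quack;
    swim_func_t  *swim;
    walk_func_t  *walk;
    unsigned int  num_ducklings;
    bool          has_feathers;
};

/* The slot that used to hold walk now holds swim. */
static_assert(offsetof(struct DuckV1, walk) == offsetof(struct DuckV2, swim),
              "swim took over walk's old slot");

static const char *do_quack(void) { return "quack"; }
static const char *do_swim(void)  { return "swim"; }
static const char *do_walk(void)  { return "walk"; }

/* Build a duck using the new layout. */
struct DuckV2 make_new_duck(void)
{
    struct DuckV2 duck = { do_quack, do_swim, do_walk, 3, true };
    return duck;
}

/* Mimic a stale binary: fetch a function pointer from the offset where
 * walk lived in DuckV1, regardless of the object's real layout. */
const char *old_binary_walk(const void *duck)
{
    walk_func_t *fn;
    memcpy(&fn, (const unsigned char *)duck + offsetof(struct DuckV1, walk),
           sizeof fn);
    return fn();
}
```

Hand old_binary_walk a duck from make_new_duck and it calls the swim function, because nothing in the compiled code records which layout it expected. With different signatures for swim and walk, the same mistake would be undefined behavior rather than a wrong answer.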

When Ducks Cry

The right way to maintain backwards compatibility is to put the swim function at the end of the struct declaration. Code compiled against the previous Duck won't be able to swim because it doesn't know anything about that struct member, but at least it can do everything the previous Duck did...

... unless it did anything advanced with Duck structs, like constructing its own or relying on a particular layout for reflection purposes.
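A small sketch of both halves of that tradeoff, again with names of my own invention (DuckOld, DuckAppended, duck_size_growth) and placeholder function types: the old members keep their offsets, but the struct gets bigger.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical function types; only the pointer sizes matter here. */
typedef void quack_func_t(void);
typedef void swim_func_t(void);
typedef void walk_func_t(void);

/* Layout that existing binaries were compiled against. */
struct DuckOld {
    quack_func_t *quack;
    walk_func_t  *walk;
    unsigned int  num_ducklings;
    bool          has_feathers;
};

/* Same members in the same order, with swim appended at the end. */
struct DuckAppended {
    quack_func_t *quack;
    walk_func_t  *walk;
    unsigned int  num_ducklings;
    bool          has_feathers;
    swim_func_t  *swim;
};

/* Every old member keeps its offset, so old binaries still find it. */
static_assert(offsetof(struct DuckOld, walk) ==
              offsetof(struct DuckAppended, walk), "walk moved");
static_assert(offsetof(struct DuckOld, num_ducklings) ==
              offsetof(struct DuckAppended, num_ducklings), "count moved");
static_assert(offsetof(struct DuckOld, has_feathers) ==
              offsetof(struct DuckAppended, has_feathers), "flag moved");

/* Bytes missing from any duck that old code allocated for itself. */
size_t duck_size_growth(void)
{
    return sizeof(struct DuckAppended) - sizeof(struct DuckOld);
}
```

If the static assertions compile, old code can still read the members it knows about. A nonzero duck_size_growth() is the catch: code that constructed its own ducks with the old size hands out objects too small for any new code that reaches for swim.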

At this point you should be able to imagine the chaos if you want to remove a struct member.

While there are some benefits to C's compact representation of data, the drawbacks are serious. C's type system is a thin veneer over memory layout which goes away at compile time.

Another approach is to hide the details of a Duck behind an interface of functions. To create a duck, call a function. To manipulate a duck's member variable, call a function. (You can also use macros.) Adding a layer of abstraction gives you the ability to hide a duck's intimate details behind an interface that won't change as much.
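One common way to do this in C is an opaque pointer. The sketch below is only an illustration; every function name in it (duck_new, duck_free, duck_num_ducklings, duck_set_num_ducklings) is hypothetical, and in a real library the top half would live in a public header while the struct definition stayed in a private source file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* --- What a public header might expose (all names hypothetical) --- */
typedef struct Duck Duck;   /* incomplete type: callers never see the layout */

Duck        *duck_new(unsigned int num_ducklings);
void         duck_free(Duck *duck);
unsigned int duck_num_ducklings(const Duck *duck);
void         duck_set_num_ducklings(Duck *duck, unsigned int count);

/* --- Private to the library; free to reorder or grow between releases --- */
struct Duck {
    bool         has_feathers;
    unsigned int num_ducklings;
};

/* Allocate a duck; returns NULL if allocation fails. */
Duck *duck_new(unsigned int num_ducklings)
{
    Duck *duck = calloc(1, sizeof *duck);
    if (duck != NULL) {
        duck->has_feathers  = true;
        duck->num_ducklings = num_ducklings;
    }
    return duck;
}

void duck_free(Duck *duck)
{
    free(duck);
}

unsigned int duck_num_ducklings(const Duck *duck)
{
    assert(duck != NULL);
    return duck->num_ducklings;
}

void duck_set_num_ducklings(Duck *duck, unsigned int count)
{
    assert(duck != NULL);
    duck->num_ducklings = count;
}
```

Callers that only ever hold a Duck pointer and go through these functions never compile member offsets or the struct size into their own binaries, so the library can rearrange the struct without breaking them.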

This often works, but those functions can change too. As I alluded before, changing a function signature or name can make existing binaries crash too. You can keep old functions around as a compatibility layer, revising arguments and delegating to the new versions....

Less Code is Easier to Maintain

... but the more code you have, the more difficult it is to maintain.
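To make that compatibility layer concrete, here is a tiny sketch. Both functions and the one-field Duck are hypothetical; the only point is the shape of an old entry point that survives by delegating to its replacement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical duck that only tracks how far it has walked. */
struct Duck {
    unsigned int position;
};

/* Newer function: the caller chooses how many steps to take. */
unsigned int duck_walk_steps(struct Duck *duck, unsigned int steps)
{
    assert(duck != NULL);
    duck->position += steps;
    return duck->position;
}

/* Older function, kept with its original name and signature so that
 * existing callers keep working; it delegates with the old behavior
 * of a single step. */
unsigned int duck_walk(struct Duck *duck)
{
    return duck_walk_steps(duck, 1);
}
```

It's only a few lines here, but every shim like duck_walk is one more function to document, test, and keep working for as long as anyone might still call it.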

This argument applies to the users as well as the developers of Perl 5. The cost of sorting through an API and documentation is important to consider. Pawing through lists of deprecations and backwards compatibility concerns is not free. In the Internet age, when Perl 4-style tutorials have long outlived their usefulness, it's easy to find a tutorial that explains the old and broken way to do something. If that code's still around, it's not obvious that the new way works, or what that new way might be.

Sometimes the best way to implement a new feature is to remove old code; sometimes the only way to fix a bug is to remove old code -- especially when you develop iteratively toward the optimal potential solution.

If Perl 5 had a well-defined and well-enforced boundary between perl internals and Perl extensions, p5p would have an easier time rearranging the internals to support new features, remove bugs, and improve the code to make maintenance easier. The current situation allows extensions to poke at Perl's guts. As I've said, we can't even tell if or when this happens.

Improving Perl 5 is unnecessarily difficult -- but we need the courage and the freedom to break binary compatibility when the gains outweigh the costs of upgrading and changing. That's a calculation we perform too infrequently.

The Current Policy

The current binary compatibility policy is that all minor releases in a major release series maintain binary compatibility. 5.8.0 established a binary compatibility level in July 2002. The nine subsequent releases over the next six and a half years modified neither existing function signatures nor exposed struct layouts.

Of course, without a well-defined and regular release schedule, users can't predict when they'll have to recompile their XS extensions.

Without a well-defined extension API, XS developers may have to support multiple major versions of Perl. Without a formal end-of-life policy for Perl releases, XS developers have to make their own decisions about what they'll support -- and which APIs work where.

(XS itself is often unnecessary, but that's a different problem altogether.)

Don't misunderstand. Binary compatibility within releases in a major family is usually very good. Perl 5's reliance on the CPAN is wonderful, but reinstalling dozens of modules after upgrading to a newer minor version would be problematic. (This may argue for better CPAN distribution management.)

Yet drawing out binary compatibility for unknown periods of time -- like drawing out the lifespans of releases for unknown periods of time -- leads to unpredictability, not just for developers but for users. Reliable, constant improvement requires not just the will but the ability to make changes. Sometimes those changes are incompatible with what's come before. (If we could predict the future reliably, they wouldn't be -- but the best way we can learn is through experience.)

Allowing ourselves the ability to make changes -- and remove vestigial code that we'd otherwise have to support for indeterminate periods -- allows us to improve. Tying those periods to well-known calendar dates actually increases the predictability of our systems and processes.

Users don't have to upgrade, of course. Users can choose to stick with the best Perl 1999 had to offer. It's free. They get the same amount of support they would have gotten back then.

Yet if we can make improvements -- if we have the will to unshackle our future from our past -- we may be able to offer them something far better than we could have imagined in 1999. That's my goal, anyway.

What is "Support" Anyway?


I keep no secret of the fact that I'd like to see an annual major release of Perl. Several people disagree about my identification of limits in a proposed Perl 5 support policy. Some people believe that without an eight-to-ten year support policy, Perl is doomed to irrelevance.

I think that's nuts.

I also think plenty of the disagreement is because none of us knows what "support" really means.

I Made It; You Can Use It

The word "support" may have several implications. This list is not exhaustive, but it is exhausting. Perl 5 "support" may mean:

  • Configuration and installation assistance. This may include downloading instructions, an address to report odd bugs during configuration, troubleshooting assistance for compilation (and cross-compilation) problems, or even a local or remote consultant installing the Perl core on a machine.
  • Access to a bug tracker, to report problems and search for workarounds and similar discussions.
  • Feature requests, where users can ask for modifications and set a priority on their delivery.
  • Consulting services, whether maintaining existing programs or creating new programs.
  • Training services.
  • Patches and bugfixes, often delivered on a set schedule (whether measured from the date of reporting a security flaw or major bug or on a regular basis).
  • Upgrading assistance, especially when migrating between major versions which are not entirely backwards- or forward-compatible.
  • Binary compatibility, often as a consequence of upgrades, patches, and bugfixes.
  • Indemnification, if the project infringes on copyrights, patents, trademarks, or trade secrets of other parties.
  • Stability, in that things might not ever improve, but they won't change.

I may have missed some categories; please feel free to add to this list in the comments.

If it Breaks, You Get Both Pieces

How many of these categories of support does p5p provide right now?

Patches and bugfixes seem obvious, but there are limits. As everyone knows, there is no set schedule for Perl releases. If you reported a bug in Perl 5.10 on 19 December 2007 and someone fixed it that day, your only option is to maintain a patched version of Perl for the past 17 months. (It's nice that you have that option, but 17 months is a while.)

There are several venues to get configuration, installation, and usage support. The voluminous core test suite is a reasonably cheap and easy way to provide this level of support. It's not free, but it's improved p5p's ability to ensure that Perl behaves appropriately on as many platforms as possible.

Training and consulting services are outside of p5p's mission. Some developers work for companies as internal consultants. Others are independent trainers. No single entity supports Perl to this degree.

Perl 5 has a public bugtracker, and the p5p mailing list serves as a place to request (and recommend (and occasionally submit patches for)) new features. Again, the lack of a formal business entity to act as a clearinghouse for these types of support may not please some businesses, but this type of support does exist.

Indemnification is difficult, at best. The Perl Foundation may provide some degree of assistance -- it manages the copyright on Perl 5, for example -- but like most other community-developed FLOSS projects, there's little money to take on legal cases for users.

Binary compatibility exists as a convention. There's no written support policy, just a rough agreement that minor releases (the 5.8.x series, for example) maintain binary compatibility. An XS extension compiled for Perl 5.8.1 ought to behave the same way with Perl 5.8.9.

Stability is an illusion, thanks to the DarkPAN problem. John Napiorkowski's Darkpan => CPAN Service? suggests an intriguing business idea which could alleviate this problem by aggregating DarkPAN code. (There's modest income in making CPAN -- and the CPAN Testers Service -- available to businesses.)

Setting aside the DarkPAN, there's no official specification for Perl 5. There's a test suite and the core documentation and tens of thousands of modules on the CPAN -- but even fixing a bug has the potential to break existing code. No specification covers every edge case. No test suite exercises all potential paths and uses. No platform nor problem is perfectly predictable.

Thus p5p provides very few of these categories of support now. Why then is there such resistance to proposed changes?

When Support Means Never Having to Maintain Your Software

I suspect that the "p5p should support all Perl releases for at least five years or no one will ever use Perl again in business!" brigade really means that Perl is plumbing. It's a series of pipes in your walls that you don't think about until someone says "You're not drinking lead, are you?" or it freezes outside and you forgot to close one faucet and open another. You only notice when something goes wrong; you don't notice when something goes right.

After all, it doesn't matter if your uninsulated PVC water pipe runs through your uninsulated attic if it almost never drops below 33 degrees.

It doesn't matter that a handful of changes can double the speed of the regular expression engine, close a few dozen bugs, reduce its memory usage, and make some patterns that would never complete work -- at least if it's more important that some DarkPAN code may have poked into the guts of the regex engine inappropriately and no one knows how to maintain that code anymore, and it absolutely has to just work with every major version of Perl released in the next several years.

It doesn't matter that rearranging a few struct members in internal data structures reduces Perl's memory footprint dramatically and offers a modest speedup for mod_perl and SpamAssassin, because there's never been any encapsulation at the XS layer, and it would be a real shame if someone accidentally upgraded a box that contains code that relies on the old behavior and there aren't any tests or maintainers for that critical business function.

It doesn't matter that Moose makes so much boilerplate OO code go away, because someone, somewhere might have defined a custom class function with the (&) prototype and used it as a bare word somewhere.

It doesn't matter that Perl is a pretty decent cross-platform system administration language by now, because the most common way to install Perl modules is still the first idea anyone had in 1994 -- using regular expressions and string concatenation to attempt to write cross-platform shell files (invoking Perl itself when those platforms don't support Unixisms such as touch and rm -rf) which require the presence of a Make utility (often not shipped with non-Unix platforms).

It doesn't matter that foundational core modules such as Test::Builder suffer maintainability problems because they have to work around long-deprecated features such as Perl 5.005 threads (thankfully, that's recently changed) under the theory that someone writing new code in 2009 with the most modern version of Perl the year 1999 had to offer obviously needs the ability to install the most recent version of a core module released in 2009.

It doesn't matter that the two-argument form of the open builtin is insecure and difficult to use safely, because three-arg open has only been around for nine years, so it might not be stable enough to rely on, and it's very, very difficult to migrate existing code to work around well-known security flaws.

Maybe I'm a very poor businessman, but I like when software gets easier to maintain and cheaper to write and safer and simpler over time.

How to Prevent Perl 5.12


I want software development to be so predictable that it's boring. I want people to take release dates for granted. I want them to yawn at completed deprecations. I want all of the surprises in new versions of Perl and Parrot to be delight at improvements: code runs faster, new features make your projects simpler and more elegant, rough edges keep disappearing. I want the development process to become repeated cycles of ideas, design, implementation, testing, refinement, and release. I want this cycle to happen annually, if not quarterly.

I want steady, sustainable progress in small, achievable, repeatable, and verifiable steps. I believe that's the only way to save Perl and its ecosystem from slow decline and irrelevance.

That's why I write here. This is a manifesto. My transparent intent is to identify obstacles and convince the rest of the Perl community to work around them (or better yet, to remove them). Some of those obstacles are the way we teach Perl. Some of those obstacles are the way we write and distribute Perl. Some of the most persistent and pernicious obstacles are the way we develop Perl itself:

  • No one can predict when (or if) Perl 5.12 will come out.
  • No one can predict which features it will have. (You can predict that it will have at least some of the new code currently in bleadperl which will not go into Perl 5.10, but can anyone tell me what those are?)
  • No one can predict how many point releases there will be in the Perl 5.12 series (nor the Perl 5.10 series, for that matter).
  • No one can predict how long people will support Perl 5.12, or Perl 5.10 for that matter.

I've written about the DarkPAN dependency management and support problem before. It's unrealistic to expect volunteers to maintain code they can't see, if that code even exists. p5p's attempts to do so are unsustainable; there's no feedback. There are only two motivations: the desire to write high quality software, and the desire to avoid guilt and shame. (That's false guilt, by the way.)

While thinking about a documented support policy for Perl 5, I came across a comment from Adam Kennedy:

I see the appropriate (and achievable) Long Term Support period for Perl as being around 8-10 years.

There are many possible descriptions of this expectation. The most polite I can imagine is "unsustainable". The easiest argument against it is one of the most persuasive.

Ten years ago, the newest Perl release was 5.005_03. Lexical filehandles, three-arg open, and the warnings pragma did not exist. Unicode was unreliable.

There have been fifteen stable releases of Perl in the intervening decade, even with the unpredictable release schedule. Biannual releases would have produced twenty stable releases of Perl. Quarterly releases -- my preference -- would have produced forty stable releases of Perl.

I won't speak for anyone else, but you couldn't pay me enough to support forty versions of a piece of software released over a decade. Good luck convincing a sizable fraction of the other 930 or so people listed in the Perl AUTHORS file to do the same.

The proper approach is to:

  • Document a sustainable support and development policy, then follow it.
  • Establish a regular release schedule.
  • Extract all essential DarkPAN features worth supporting into tests for the core test suite.
  • Replace the feature pragma with a pragma which explicitly limits the running instance to those features present in a specific release.
  • (After fixing feature...) Change the default behavior of Perl 5 to enable modern features.
  • Encourage businesses which believe they really must use ancient versions of Perl long past their shelf lives to purchase support contracts from businesses willing to take on that burden.

Continuing to pile this support burden on volunteers who do not know if the DarkPAN exists, let alone suffers from changes in modern Perls, is a great way to ensure that Perl 5.12 will never come out.

Put more positively, my suggestions are ways to reduce the barriers to participation for people who have an investment in the present and future of Perl 5. That's the only way to make Perl and its ecosystem sustainable: to divide the work among everyone who wants or needs it to succeed.

During a weekend discussion on the Perl 5 Porters mailing list, I volunteered to write a specification for Perl 5's support policy.

I believe in being explicit (darn it!) about what a community is willing to support and what community members can provide to other community members and users. I tried to reflect that when I wrote the first drafts of Parrot's support policy.

My goals are simple. I want to remove all magic and magical thinking from the release process. I want to remove all potential ambiguity from the upgrading process. I want to manage expectations so that the only surprise in a new release of the software is how well it delights users, and not that features have changed since the previous release.

That's easier with Parrot than with Perl 5, for several reasons:

  • Parrot is new software, with very few users who aren't already part of the vocal Parrot community.
  • Parrot has a regular release schedule which we can predict years in advance.
  • Parrot's main users are compiler writers and distribution packagers. Both groups are familiar with and capable of managing change, if we communicate it effectively.
  • I have seniority on the project. (Don't underestimate that, if you want to influence a community.)

My goals for the Perl 5 support policy are to document current practices, to identify rough community consensus for future policies, and to encourage those policies toward modernization and sustainability. Boring, right? Perhaps it is, until you consider several constraints:

  • Perl 5 releases are unpredictable in date and scope.
  • There's no formal deprecation strategy.
  • Perl 5's target users are developers, but not necessarily professional programmers.
  • The maintainers of dual-lived modules have tremendous leeway as to updates and upgrades and compatibility changes.
  • I don't have commit access to the project.

I admit my motivations: I want to encourage p5p to drop support for old versions and I want to see new major versions of Perl 5 released every year. I want a one year deprecation cycle (any feature announced as deprecated in a major release may be removed by the next major release). I want support dropped for releases two years old and older (barring a major security fix someone feels strongly enough to patch).

Mostly I want Perl 5 in 2010 to be better than Perl 5 in 2009, and so on, and so forth, without always worrying about how Perl 5 did things in 1999 and 2000.

That's what I want. Those are my goals and biases. What do you want?

More Roles versus Duck Typing


I received more feedback on Perl roles versus Duck Typing than any other entry in my series on Perl roles so far. Much of this feedback asked very good questions and pointed out places where I'd assumed that theory or implications were clear. Before I compare Perl roles to any other mechanism, it seems useful to clarify more about what I meant in my duck typing entry.

If you haven't also read The Why of Perl Roles, start there. Those design goals are important to understanding the benefits and drawbacks of other approaches.

Can't You Just Design Your Hierarchy Correctly?

Ryan Funduk wrote that "in Ruby you can solve a lot of this by simply designing your class hierarchy appropriately."

That's true -- sometimes. If your data model fits nicely into a singly-rooted hierarchy, and if you can add variants at the leaves of the inheritance tree, inheritance works just fine.

Sometimes that's not possible. Ruby (and Perl and Python) make that process much easier. If your language supports multiple inheritance, you can have multiple parents (but as Perl Roles versus Inheritance describes, multiple inheritance has its own complications).

Can't You Code More Carefully?

paddy3118 pointed out, quite correctly, that there's no substitute for knowing your code. I agree!

I worry less about my code than I do about the code of other people. When I release code to the CPAN, I try to write robust code that's sufficiently generic (or polymorphic) that I don't prevent smart people from doing smart things I never anticipated. I respect my published interfaces (and theirs). I don't forbid subclassing or specialization. I respect encapsulation. Sometimes I even install tests that they can adapt and specialize if they adopt and specialize my code.

Of course I also try to write safe and robust code.

One of the mechanisms by which I try not to forbid other people from doing something I never anticipated is to avoid forcing them to inherit from my classes. I could sprinkle isa() checks throughout my code, but then they'd have to lie by overriding isa() or inherit when they really wanted to delegate or reimplement or compose.

I could throw can() checks to make sure that whatever they pass in supports a method of the appropriate name, but that leaves the code vulnerable to the false cognate problem.

I could ignore all of those possibilities and tell people that if the code breaks, they get to sweep up the pieces.

I prefer something safer and less prescriptive. If you pass my code an object which you've explicitly marked as performing a role we both understand, great! My code will do its best to work with it in a way that we both understand.

There may be bugs -- especially bugs of mutual misunderstanding of what that role implies -- but the role system will protect us against typos and incompleteness and collisions, and that's something you don't get from duck typing.

What if someone gets a role wrong?

Then your program has bugs, the same way as if someone gets inheritance wrong.

What if a role isn't a sufficient test of interoperability?

Then your program has bugs, the same way as if someone gets inheritance wrong.

What if two implementations of a role are incompatible in a given context?

I'm not sure what this means. If you compose two roles into a class and both roles supply a method of the same name, the role system will throw a compilation error. It won't try to disambiguate them; it cannot. You will have to do so.

This often means figuring out the context in which your class will perform one method or another and writing your own method which can dispatch appropriately.

Dog/Tree is a Dumb Example

Justin Donaldson wrote two longer, thoughtful responses: Duck Typing and "Roles" in Object Oriented Programming and Duck Typing and "Roles".

Justin's right; the "a dog can bark() and a tree has bark()" example is silly. That's part of the reason it's a catchy Perl cliché. It's a deliberately simple, deliberately dumb example you can explain in a sentence or two to demonstrate a very real principle that appears in code in much more subtle ways: similar words do not always mean similar things.

If you prefer an example from spoken languages, never tell a native Spanish speaker that you're embarazada when you mean that you're embarrassed.

Can't You Check Multiple Methods?

Justin pointed out that if checking for the existence of a single method on an invocant of unknown type is insufficient to determine its type equivalence, checking for multiple methods is safer. I agree -- but then you have to check for multiple methods (likely with your runtime reflection system), and you've only reduced your uncertainty of false cognates.

I'm sure you can see where this is going. Perhaps after you've checked three methods you've reduced your uncertainty sufficiently, at the cost of multiple lines of code and multiple runtime checks.
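Here's a minimal sketch of that escalation in Perl 5 (5.14 or newer, for the package block syntax). The require_methods() helper, the Tree class, and the shed() and fetch() methods are all hypothetical names I've made up for illustration.

```perl
use strict;
use warnings;
use Carp 'croak';

# Hypothetical helper: croak unless $object has a method for every name given.
sub require_methods {
    my ($object, @methods) = @_;
    my @missing = grep { !$object->can($_) } @methods;
    croak "Duck typing failure: missing @missing" if @missing;
    return $object;
}

# A Tree has bark and sheds leaves; a Dog barks and sheds fur.
package Tree {
    sub new  { return bless {}, shift }
    sub bark { return 'rough and brown' }
    sub shed { return 'leaves everywhere' }
}

package main;

# Two method names in common: the Tree still passes the "is it a Dog?" check.
my $passed = eval { require_methods( Tree->new, qw( bark shed ) ); 1 };
print $passed ? "Tree accepted\n" : "Tree rejected: $@";

# A third name finally rejects it, at the cost of a third runtime check.
my $strict = eval { require_methods( Tree->new, qw( bark shed fetch ) ); 1 };
print $strict ? "Tree accepted\n" : "Tree rejected: $@";
```

Every name you add narrows the odds of a false cognate, but the check never says anything about what those methods mean.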

Perl Roles force you to specify roles explicitly

Yes, they do. To perform a role, add does Name to your class declaration.

I joke. It's not quite that easy. For that to work, you must have identified a role in your system somehow, either by declaring it explicitly (similar to how you'd declare a class, except using the role keyword instead of the class keyword, for a savings of one character) or by declaring a class and treating that class as a role (an ability you get for free).

If your program is small and you're not worried about future extension or the cost of false cognates or duck typing problems, don't use roles. Don't use types. Don't use inheritance. That's perfectly acceptable. They're most valuable in larger systems where you do want pervasive polymorphism and extensibility without worrying about the drawbacks of ad hoc and unspecified invocations.

Checking methods is silly anyway; no one does that!

If duck typing's drawbacks weren't a problem, you wouldn't see Wikipedia on Duck Typing recommending the use of Python try/except blocks for invoking methods on unknown invocants. If the method call cannot possibly fail, there's no reason to catch the "Hey, this method call failed! Hm!" exception.
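The Perl 5 counterpart of that defensive pattern is a block eval. This is only a sketch; the Rock class is a hypothetical stand-in for "an object which turned out not to support bark()".

```perl
use strict;
use warnings;

# Hypothetical class with no bark() method at all.
package Rock {
    sub new { return bless {}, shift }
}

package main;

my $thing = Rock->new;

# Call first, and catch the failure if the method does not exist.
my $barked = eval { $thing->bark; 1 };
print "Duck typing failure: $@" unless $barked;
```

Note that this catches only a missing method. If $thing were a Tree with its own bark(), the call would succeed and the false cognate would slip through.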

Similarly, using can() to check that an invocant supports a potential method is a well-worn idiom in Perl in the same way that checking respond_to? is hoary Ruby.

I've never used Justin's haXe, so I'm completely unqualified to talk about how it solves these problems. I can only take his word for it. However....

There's no boilerplate if you predeclare a type!

Justin's followup argues that if you predeclare a type which defines an interface and use that interface pervasively throughout your system where you would normally use a type, you don't have to write boilerplate reflection code.

I agree, but I've categorized that as writing an interface (see Perl Roles versus Interfaces and ABCs): the act of explicitly declaring some combination of behavior and naming it and modifying your program to use that type instead of concrete, instantiable types.

You can do the same thing with roles. (You can also do more with roles.)

This is a fine technique, and I've used it to good effect. I don't classify it as duck typing, because it's an act of will to define this entity. I don't want to quibble over definitions, however.

Runtime reflection checks are rare in my code

Good! You don't run into the same problems I do.

Roles are rigid and top down

I'm not sure I agree with this. Most of the uses I've seen of roles are anything but rigid and top down. They're definitely more formal and structured than duck typing, but less so than subclassing inheritance. My colleague and sometimes co-author Ovid had a throwaway paragraph buried in his Flying Without Source Control rumination that using roles at the BBC has improved the flexibility of their code such that projects that would have taken multiple months now take minutes.

Roles violate Perl's loosely structured nature

"Violate" is a strong word, but a robust system is a robust system. Does Perl's test-infected culture violate its nature? Does the presence of an optional mechanism for managing code reuse and declaring the expected behaviors and relationships of entities without mandating their implementations of their behavior substantially change the nature of the language for the worse?

That's a philosophical question, and I'm not going to answer the worst case scenario. I prefer to believe that allowing the use of roles -- without mandating their use -- provides Perl with more opportunities to write great, robust, extensible, and understandable code. Certainly Ovid's writings demonstrate that his team has used them to great effect. (While he's one of the best and most reliable coders I've ever worked with, he'll be the first to disclaim any notions of rockstardom.)

I still don't get it

That's fine. Roles are subtle. I spent several years burning my fingers trying to combine code reuse with sufficient genericity and robust coding. When we saw Andrew Black present the Smalltalk Traits paper, a couple of my colleagues convinced me that what I was developing was sufficiently similar to the paper that Perl 6 could borrow their formalisms and achieve my goals.

If you really want to stretch your mind, consider this: what if your language had pervasive multi-dispatch built in, and dispatched not based on the class of all of its dispatchable invocants, but on the role of all of its dispatchable invocants? Remember, every class implies a role.

I don't intend to disparage any code that uses duck typing (or inheritance or interfaces or abstract base classes) successfully... but consider the implications in the previous paragraph. How would you even build such a thing using those techniques? Remember my design constraints for roles:

  • They must not dictate the implementation of conforming entities, allowing inheritance or delegation or composition or reimplementation.
  • They must not require editing existing entities to enable or improve polymorphic capabilities.
  • They must be full-fledged members of the type system.
  • They must provide compile-time disambiguation and refuse ambiguous composition.

If those aren't your design goals, that's fine. You can write a lot of useful, maintainable programs without them. Yet I believe we can write even more programs with them.

In this series I've explained why Perl roles exist, and discussed Perl roles versus inheritance and Perl roles versus duck typing. Comments on the latter posting have raised several good questions that I'll address in another posting. In particular, some people see the relative informalism of duck typing as a major benefit and rarely see value in other possibilities for abstraction and safety that roles provide. (Sometimes that's the right choice, too.)

Today's topic takes the opposite approach.

Subclassing Inheritance

Many people who learned object orientation through languages such as Java and C++ see inheritance as a vital component to managing large programs. I've argued before that this type of inheritance (by which I mean "a subclass extends a superclass") provides two features. First, the language's type system understands that a subclass/superclass relationship means that it's safe to substitute an instance of a subclass in any code which expects an instance of a superclass. Second, the subclass may (mostly) transparently reuse code defined in the superclass with little or no syntax required.

In other words, subclassing inheritance provides a mechanism of code reuse and a mechanism of identifying safe substitutability.

This works great when you can model all of the entities in your program in a singly-rooted hierarchy. Many simple programs do this effectively.

As the difficulty for creating sane biological taxonomies indicates, the real world does not lend itself to such artificial simplicity. (Extinction of the duck-billed platypus might have helped Linnaeus -- thanks to educated foo's suggestion for correcting this analogy -- but I'd miss the little guys. Besides that, I can't give birth to live young myself, so I'm obviously not a mammal.)

This is miles from the interesting question, however. I assert that the real and proper question for any API which wants to assert a property about the objects it affects is "Do you behave in a way consistent with my expectations?" In other words, it's much less interesting and much less general to say "My log_message() function requires an instance of a String as its argument" than to say "My log_message() function requires an instance of something which Stringifies."

If an object of one type can stand in for an object of another type, does it matter how that object does so? Hold onto that thought.

Abstract Base Classes

If you're arguing in your head right now saying "Program to an interface, not an implementation!" you're absolutely right. Well-encapsulated programs define well-understood APIs and let the internals of those APIs worry about themselves. As long as you know that an object (whatever its type) implements the proper interface, surely you can get on with your programming and let it go its own way.

One mechanism of ensuring that all object instances of a class hierarchy implement the proper interface is to specify that interface in an abstract base class from which all classes in that hierarchy inherit.

If you've programmed in C, you might recognize this as a somewhat more modern descendant of a separate header file (unless you put executable code in your public header files, in which case no advice I give will help you).

Depending on the strictness and dynamism of your language and compiler and runtime environment, you might get a compile-time check that any subclass of this ABC implements all of the required methods rather than inheriting them. In other words, you do get the enforcement of this contract without all of that pesky code reuse.

If you believe that Don't Repeat Yourself is important in software -- and it is -- you may have the unenviable task of rooting around in a singly-rooted inheritance hierarchy to push concrete method implementations to the root of the tree where you have to maintain less duplicate code. This is why concrete base classes often contain a lot of methods that may or may not apply to all of their subclasses. Copy and paste seems wronger than overinheritance from god classes.

Some languages suggest that a singly-rooted inheritance hierarchy creates more problems than it solves, and they allow any class to inherit from multiple parents. This solves the code reuse problem to an extent, but it creates other problems related to the structural layout of objects of multiple classes, potential conflicts in attribute names, method resolution and visibility ordering, and circular parent relationships. These can present debugging difficulty.


Java (probably wisely) eschewed multiple inheritance, but recognized that an instance of any given class may conform to multiple interfaces properly. Thus it provides Java interfaces. (I've glossed over the history of this feature by speculating only on its motivation. Students of programming languages should look at C++, Eiffel, Objective C, and Sather for a better view of design influences.)

A Java interface is, effectively, an abstract base class from which you do not inherit. Thus, your Java class can inherit from a parent class and implement as many interfaces as you like.

One benefit of a Java interface is that you can use the name of an interface anywhere you could use the name of a class, and then you can use any object which implements that interface in any API that expects an object which implements that interface, no matter how it implements that interface.

One drawback of the Java interface is that it offers no code reuse either.

Think of Java interfaces as slightly safer multiple inheritance without the possibility of code reuse and a slightly worse syntax, and you have them.

Are they really that bad? If you use them correctly, no. Does anyone?

Concrete Problems

Imagine you have an API written by someone else. You don't have the right (or access or source code) to change it. You have to live with it.

You have a method on a Logger object called log_message(). It takes a single argument -- a String. Any String you pass to the logger gets logged to the appropriate place.

Suppose you have an object which represents a Customer -- a name, an address, some notes. Suppose you want to log the relevant customer information. Easy, right? Just produce a String from the Customer and send that String to the log_message() method.

Except suppose that the library's version of String supports one encoding and the String produced by the Customer object uses an incompatible encoding... or this or that or you just object to the two-step boilerplate code that makes you manually stringify your Customer objects when they already know how to stringify themselves.

A better approach is to change the log_message() signature to decouple it from the concrete String class to an interface which means "Anything which implements this interface produces a String when I call its stringify() method."
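In Perl 5, one way to sketch "something which Stringifies" is the core overload pragma. The Customer class, its name and address fields, and this free-standing log_message() function are hypothetical; they stand in for the library described above rather than reproduce any real API.

```perl
use strict;
use warnings;

# Hypothetical Customer class which knows how to stringify itself.
package Customer {
    use overload '""' => \&stringify;

    sub new {
        my ($class, %args) = @_;
        return bless {%args}, $class;
    }

    sub stringify {
        my $self = shift;
        return "$self->{name} <$self->{address}>";
    }
}

package main;

# Hypothetical logger: it asks only that its argument can become a string.
sub log_message {
    my ($message) = @_;
    return "[log] $message";
}

my $customer = Customer->new( name => 'Alice', address => '1 Main St' );
print log_message($customer),        "\n";
print log_message('a plain string'), "\n";
```

The logger never asks what class it received. It relies only on the behavior it needs, which is the point of decoupling the signature from a concrete String.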

Of course, that means changing the library. You may not have access to do this.

Suppose you did, however. You could create an abstract base class from which String could inherit -- if you have permission to modify that library, and if it doesn't already inherit from a concrete base class. You could create an interface which String implements, if you have permission to modify that library.

Perhaps that's a silly example. It's a deliberately simple example. Imagine that the necessary interface has two methods, or ten. Imagine that you want to pass in an object which performs its own logging, or diagnostic testing, or remote proxying.

You will likely find that everywhere you want genericity and code reuse decoupled from a singly-rooted inheritance hierarchy you need to program to interfaces, not instantiable classes -- and you use interface types, not classes, in all of your signatures, because you don't want to prohibit people from doing useful things even if their singly-rooted inheritance hierarchy doesn't match yours.

If I'm right -- if the real question is "Do you understand these methods the same way I understand them in this context?" -- this is busywork because there's one glaring flaw in the interface and abstract base class concept: there's no way of saying "Instances of this class are semantically equivalent and substitutable for instances of that class" unless you manually extract interfaces from every class defined in your system.

Every Class Implies a Role

Here's one subtle trait of roles: every class declaration also declares a role.

If you define a class:

class MyAwesomeClass
    method foo             { ... }
    method bar             { ... }
    method be_very_awesome { ... }

You can write:

class MyCompletelyUnrelatedClass does MyAwesomeClass
    method foo             { ... }
    method bar             { ... }
    method be_very_awesome { ... }

... and even if there's no inheritance relationship between these classes (as you can see, there isn't) and no formal declaration of the MyAwesomeClass role (and there isn't), you get all of the benefits of roles (method composition, compile-time API verification, genericity, substitutability) without having to modify the declaration of MyAwesomeClass, or rejigger its inheritance hierarchy, or extract a role manually. Any API you've already written which operates on an instance of MyAwesomeClass will operate safely and correctly on an instance of MyCompletelyUnrelatedClass without modification.
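Plain Perl 5 (5.10 and newer) has a small hook for the "do you perform this role?" question: the DOES() method in UNIVERSAL, which defaults to the answer isa() would give. The sketch below overrides it by hand. That is an assumption-laden approximation: it provides the substitutability check only, with none of the method composition or compile-time verification a real role system (such as Moose roles) provides. The celebrate() function is hypothetical.

```perl
use strict;
use warnings;

package MyAwesomeClass {
    sub new             { return bless {}, shift }
    sub be_very_awesome { return 'awesome' }
}

# No inheritance here; the class declares by hand that it performs
# the role implied by MyAwesomeClass.
package MyCompletelyUnrelatedClass {
    sub new             { return bless {}, shift }
    sub be_very_awesome { return 'awesome in an unrelated way' }

    sub DOES {
        my ($self, $role) = @_;
        return 1 if $role eq 'MyAwesomeClass';
        return $self->SUPER::DOES($role);
    }
}

package main;

# Hypothetical API which asks about the role, not the inheritance tree.
sub celebrate {
    my ($thing) = @_;
    die "Not awesome enough\n" unless $thing->DOES('MyAwesomeClass');
    return $thing->be_very_awesome;
}

print celebrate( MyAwesomeClass->new ),             "\n";
print celebrate( MyCompletelyUnrelatedClass->new ), "\n";
```

Nothing in MyAwesomeClass changed, and isa() still tells the truth about MyCompletelyUnrelatedClass.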

You get code reuse too, if you want it.

Perl Roles Versus Duck Typing


The Why of Perl Roles explains some of the motivations behind the inclusion of roles in Perl 6 and their implementation in Perl 5 through Moose. Perl Roles Versus Inheritance compares the design and intent of using roles to "traditional" subclassing inheritance.

Duck Typing

A common object design strategy in dynamic languages is Duck Typing. The cliché is that if an object walks() like a Duck and quacks() like a Duck, it must be a Duck enough that the rest of the program can treat it like a Duck. In other words, the presence of a method with a name that you recognize from some context is proof enough that the object responds to that method in a way that makes sense in the context you had in mind.

That often works.

False Cognates

Sometimes it doesn't work. Suppose (and yes, this is already a cliché in Perl circles) that your program models a park. You have a Dog class. You have a Tree class. Given an object for which you don't immediately know its type -- assume you're using a dynamic language or that your genericity system performs type erasure and you're effectively using a dynamic language -- do you know which bark() method is appropriate in any given situation?

This is the false cognate problem; the name of a method is not always sufficient to determine its intended meaning.
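A minimal Perl 5 sketch of the problem (5.14 or newer, for the package block syntax). The Dog and Tree classes are hypothetical versions of the park example, with made-up return values.

```perl
use strict;
use warnings;

# Hypothetical classes for the park example; neither inherits from the other.
package Dog {
    sub new  { return bless {}, shift }
    sub bark { return 'Woof!' }              # a sound
}

package Tree {
    sub new  { return bless {}, shift }
    sub bark { return 'rough and brown' }    # a covering
}

package main;

# can() answers "is there a method with this name?" and nothing more.
for my $thing ( Dog->new, Tree->new ) {
    printf "%s can bark: %s\n", ref $thing, $thing->can('bark') ? 'yes' : 'no';
}
```

Both objects answer yes, even though the two methods have nothing to do with each other.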

Duck Typing and Type Checking

Defense-minded duck typers soon realize that blindly calling methods on objects of unknown type is a recipe for disaster. (To be fair, a well-organized program runs into this problem rarely. Even in the absence of strict typing -- or manifest types -- it's rare not to know the types of objects you expect within specific scopes in the program. Then again, I don't add error checking to my programs because I expect exceptional conditions to occur frequently.)

One approach is to use object introspection to see if the object in question really does support the required method. In Perl, this is:

croak 'Duck typing failure' unless $dog_or_tree->can( 'bark' );

Or in Ruby, something like:

raise 'Duck typing failure' unless dog_or_tree.respond_to?( 'bark' )

If you want to get stricter, and if your language supports this, you can even check the arity or types of the allowed signatures of the method -- but look at all of the boilerplate code you have to write to make this work. That's also code to check only a single method.

Suppose you want to call several methods on the object. You could check can() or respond_to? for each of them... but at this point, people often check the class inheritance of the object, in Perl with:

croak 'Duck typing failure' unless $dog_or_tree->isa( 'Dog' );

Or in Ruby, something like:

raise 'Duck typing failure' unless dog_or_tree.is_a?( Dog )

Of course, this precludes other ways in which your object can perform the correct bark() method: reimplementing it, mixing it in from elsewhere, delegating to it, or composing it -- unless you explicitly lie to the rest of the system about the identity of your object by overriding isa() in Perl or is_a? in Ruby.

Liars tend to get caught, and the results can be messy.
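Here's a sketch of such a lie and how it gets caught, in Perl 5. RoboDog and wag_tail() are hypothetical names invented for the example; no Dog class exists in this program at all.

```perl
use strict;
use warnings;

# Hypothetical class: it reimplements bark() and claims to be a Dog
# without inheriting from any Dog class.
package RoboDog {
    sub new  { return bless {}, shift }
    sub bark { return 'BEEP WOOF' }

    # The lie: answer "yes" to isa('Dog') despite the empty @ISA.
    sub isa {
        my ($self, $class) = @_;
        return 1 if $class eq 'Dog';
        return $self->SUPER::isa($class);
    }
}

package main;

my $robot = RoboDog->new;
print "isa check passed\n" if $robot->isa('Dog');
print $robot->bark, "\n";

# Code which trusts the isa() answer may call any Dog method it likes.
my $wagged = eval { $robot->wag_tail; 1 };
print "Caught: $@" unless $wagged;
```

The isa() check asked about ancestry when the caller only cared about bark(), so the object had to claim far more than it could deliver.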

Roles, False Cognates, and Identity

In the context of a Dog, it's obvious what bark() means. In the context of a Tree, it's obvious what bark() means. Without that context, you just don't know.

Roles add that context back. If you want to deal with a Dog, check that the provided object performs the Dog role. Similarly for Tree. Instead of asking "Do you provide a method with this name?", ask "Do you perform this role?" The latter question avoids false cognates and ensures that the class representing the provided object fulfills the contract required by that role at compilation time.

As well, you don't have to check the inheritance structure of a given object. It doesn't matter. The most important lesson of duck typing is that any object which provides an interface you both understand appropriately should be substitutable for any other object which provides that well-understood interface. How that object fulfills that interface is its own business.

Roles provide a way for developers to name a collection of behavior and then refer to objects -- generically -- in terms of whether they provide that collection of behavior. The ad hoc, free-form nature of duck typing is great for providing future extensibility; it doesn't lock your code into a rigid hierarchy that can prove brittle during future maintenance.

However, duck typing sometimes fails to provide enough information about necessary meaning and context, and the workarounds to make a duck typed program more robust can subvert the goals of duck typing.

Next time, I'll compare roles to (Java) interfaces.

Perl Roles Versus Inheritance

In The Why of Perl Roles I explained some of the motivations behind the inclusion of roles in Perl 6 and their implementation in Perl 5 through Moose.

One common question is "How do roles differ from subclasses?" (where "subclassing" means "inheriting state and behavior from a parent class" as in C++, Java, et cetera).

Reddit commenter netytan pointed out that plenty of OO research literature uses the word "inheritance" in a broader sense, but I'm limiting it here to the concrete behavior by which you can specialize a more general class and take on or override its behavior. Any poor academics who run across this, please feel free to bite your lip because I'm about to tell some lies to help explain things to people who may not read your papers for fun.

Features of Inheritance

Suppose you write a class:

class Parent
    method wipe_nose { ... }
    method tie_shoes { ... }

You've declared a type, Parent, which has a specific interface (methods wipe_nose and tie_shoes) with specific signatures (zero-arity methods, in the absence of other information such as language support for slurpy parameters or mandatory signature checking), and an implicit semantic meaning for the term Parent. That is, you've modeled an entity representing some ideal. Whenever you refer to a Parent in your code, you mean this entity. Whenever you invoke the wipe_nose method on anything you treat as a Parent, you have specific behavior in mind -- the wipe_nose it makes sense for a Parent to do.

Suppose you extend this Parent by subclassing it (what I refer to here as "inheritance"):

class Child extends Parent
    method wipe_nose   { ... }
    method play_in_mud { ... }

This declares another type, Child, which behaves as a Parent (revealing uncomfortable semantic properties of my naming system here). In theory, anywhere I can use a Parent appropriately, I can use a Child appropriately. (The Liskov Substitution Principle applies.)

The converse is not true; a Child is a specialization of a Parent. You can see this in that the Child class defines another method, play_in_mud, which is not present in Parent. Parents can't play in the mud, lest they get their shoes dirty. (Of course my parents still think of me as their child, and I can take off my shoes anytime I want -- so the code example isn't as wrong as it could be. Real life is taxonomically weird.)

You've probably also noticed that the Child does not declare a tie_shoes method. The Parent implementation of tie_shoes suffices.
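The same pair of classes as a runnable Perl 5 sketch, using @ISA for the inheritance relationship. The return strings are made up for illustration; the pseudocode above leaves the method bodies unspecified.

```perl
use strict;
use warnings;

package Parent {
    sub new       { return bless {}, shift }
    sub wipe_nose { return 'Parent grabs a tissue' }
    sub tie_shoes { return 'Parent ties shoes' }
}

package Child {
    our @ISA = ('Parent');    # Child extends Parent

    sub wipe_nose   { return 'Child uses a sleeve' }    # overrides Parent
    sub play_in_mud { return 'Child plays in the mud' } # new behavior
}

package main;

my $child = Child->new;
print $child->wipe_nose, "\n";    # Child's own implementation
print $child->tie_shoes, "\n";    # inherited from Parent

print "A Child is a Parent\n"            if $child->isa('Parent');
print "A Parent cannot play in the mud\n" unless Parent->can('play_in_mud');
```

One declaration, the @ISA assignment, supplies both the type relationship and the code reuse at once.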

Even this simple example implies several features of this type of inheritance:

  • Classes define types which represent named collections of state and behavior. (Type declarations.)
  • Classes can extend or specialize other classes. (Subtyping -- though see Contra vs. Covariance for some debate over what languages can and should provide.)
  • Classes can include behavior from superclasses or reimplement behavior. (Code reuse.)

In my mind, this type of inheritance conflates two separate concerns: type relationships and code reuse.

When Inheritance Attacks

Problems arise when you want one or the other but not both (or at least not in the way the language supports).

Suppose you want to write a Guardian class which performs the same behaviors as the Parent class:

class Guardian
    method wipe_nose { ... }
    method tie_shoes { ... }

How do you express that a Guardian is equivalent and semantically substitutable for a Parent everywhere (at least in terms of the behavior described here)?

If your language supports this, and if you have the ability to modify the declaration and use of Parent, and if you can reify the results to something concrete, you could extract Parent into an interface like Java supports -- perhaps ICanHasChild.

What if the implementation of wipe_nose is the same between Parent and Guardian? You could subclass Parent to create Guardian and inherit the implementation of those two methods, but then Child has to subclass Guardian -- and speaking of weird taxonomic relationships, that hierarchy doesn't make sense either.

Then there's multiple inheritance and abstract base classes and other approaches to try to minimize all of the complexity of the conflation of these two separate ideas.

The Role Separation of Concerns

The important attribute of the Guardian type is that a Guardian can act like a Parent. It supports the same behaviors that a Parent supports. It may need to reuse some implementation from the Parent, but it's not clear that it has a hierarchical taxonomic relationship to a Parent.

Imagine instead if you could write:

class Guardian does Parent
    method wipe_nose { ... }
    method tie_shoes { ... }

Presuming you have a role-based type system, every place in your code that checks for a Parent object can use a Guardian object because the wipe_nose method called on either a Parent or Guardian object means "Grab a tissue, then hang on to your kid."

Alternately, you can slice all of these behaviors into more meaningful and smaller units of behavior: GoodHygiene could include the cover_your_mouth method and the wipe_nose and the brush_teeth methods. Perhaps they act on the invocant by default, so you can compose that role into any Person class -- but perhaps there's a role which represents performing good hygiene principles on a kid. Perhaps not. It depends.

Perhaps your Parent is also the Guardian for a grandparent. Perhaps your Parent is the Child of a grandparent. Perhaps your Child does babysitting for even younger children.

How do you model those relationships with an inheritance hierarchy?

The important question is not "How do these entities relate to each other in a static taxonomy?" but rather "What do you do?"

Next time: roles versus duck-typing.

About this Archive

This page is an archive of entries from May 2009 listed from newest to oldest.