July 2009 Archives

Tautology alert: Perl 5 is the toolkit; CPAN is the language.

The background: Mojolicious is a Perl 5 web framework with no non-core dependencies. Curtis Poe praised Mojolicious's marketing and remarked that its lack of dependencies could be an advantage. Adam Kennedy used CPAN analysis techniques to remove unnecessary dependencies from Padre (and Mojolicious used similar techniques to make their software even more dependency-free on older versions of Perl 5). John Napiorkowski reported on his work to solve dependency problems.

There's a philosophical question here.

Jay Kuri made a distinction between internal and external dependencies and suggested embracing dependencies. Yuval Kogman discussed how dependencies improve encapsulation and reduce reinvention through nasty hacks.

You might get the idea that there's a debate over the use or avoidance of dependencies.

That's not the debate. That may seem like the debate, but that's the wrong debate. It's a false dilemma. We won't solve the philosophical problem of "When is it right to reuse external code versus patch it locally versus write our own version?" That's one of the debates of art in software development. (Like debates of art in other endeavors, the right answer is "It depends." (I make an exception for debates of art in literature, where the right answer to "Is fanfic an abomination?" is "Yes."))

The real debate over dependencies is a question of change management. The real, underlying questions over dependencies are:

  • Can I install dependencies reliably on platforms I care about?
  • Do the necessary tests pass for all of the required dependencies?
  • What happens when a dependency changes?
  • What other dependencies does one dependency require?
  • What legal or other implications arise from using the entire dependency graph?
  • What support implications arise from using the entire dependency graph?

The Waterbed Theory of complexity argues that, all other things being equal, simplifying one part of the system -- the development and maintenance of the program which embraces dependencies, for example -- tends to increase complexity somewhere else.

In many cases, that somewhere else is a burden on installers and users of the system.

The real debate is over where that complexity belongs.

The false dilemma assumes that this complexity must belong either on the maintenance side or on the installation side. It need not belong on either. We could resolve most CPAN dependencies statically with existing tools, providing single-installation superdistributions we believe will work reliably for average users.

CPAN is great in many ways. CPAN installer dependency resolution tools are flexible and powerful for people willing to learn how to use them. They're good for power users. They're great for people capable of testing the latest versions, then filing bugs and writing patches to fix inevitable oversights when things go wrong.

This work doesn't have to subvert Task::Kensho or an extended core or Strawberry Perl or Shipwright or anything else.

All we need to do is tie together some existing tools, identify distributions without declarative dependencies (or with incorrect metadata), and encapsulate the complexity of installation into complexity of bundling. We don't even have to remove the fallback -- you don't have to use a system unless you want it.
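Resolving dependencies statically mostly means computing, ahead of time, one install order for a whole dependency graph, transitive prerequisites included. Here's a minimal sketch of that step as a depth-first topological sort, assuming the graph has already been extracted from each distribution's metadata. The distribution names and the install_order() helper are hypothetical; real bundling tools would also have to handle version ranges and distributions whose metadata is missing or wrong.

```perl
use strict;
use warnings;

# Hypothetical graph: each distribution maps to its direct prerequisites.
# These names are invented for illustration, not real CPAN metadata.
my %requires = (
    'My::App'    => [ 'My::Web', 'My::Config' ],
    'My::Web'    => [ 'My::HTTP', 'My::Config' ],
    'My::HTTP'   => [],
    'My::Config' => [],
);

# Depth-first visit: a distribution is appended only after all of its
# prerequisites, which yields a valid install order (a topological sort).
sub _visit {
    my ($graph, $dist, $state, $order) = @_;
    my $seen = $state->{$dist} // 0;
    return if $seen == 2;                                   # already placed
    die "Dependency cycle involving $dist\n" if $seen == 1; # still being visited
    $state->{$dist} = 1;
    _visit($graph, $_, $state, $order) for @{ $graph->{$dist} || [] };
    $state->{$dist} = 2;
    push @$order, $dist;
}

sub install_order {
    my ($graph, @roots) = @_;
    my (%state, @order);
    _visit($graph, $_, \%state, \@order) for @roots;
    return @order;
}

my @order = install_order(\%requires, 'My::App');
print "$_\n" for @order;
```

A bundler could run this once, at packaging time, so that installers never have to resolve anything themselves.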

Is there a downside?

If the thesis behind this web site is correct, something called Modern Perl or Enlightened Perl exists. It's definable. A well-socialized member of the Perl community can look at a piece of code and say "That's modern!" or "That's not modern!" Code has an aroma; we speak of code smells as antipatterns. Good Perl 5 code written in 2009 has identifiable characteristics which distinguish it from bad code written in 2009 or mediocre code written in 2004 or decent code written in 1999.

Many of those differences are subtle. Some come from Perl Best Practices. Some don't.

It's interesting to me to consider some of the changes in the Perl community and in Perl 5 itself which contribute to the new Perl 5 renaissance. This list is obviously subjective and perhaps incomplete in places, but I see these events as particular watersheds.

  • Perl 5.6.0, released in March 2001. The addition of lexical filehandles alone marks a huge divide between ancient Perl and modern Perl -- almost every code example I've seen in the past eight years which eschews lexical filehandles for globals (even localized globs) has other stylistic problems. It's a shibboleth, but it's a reliable one.
  • Test::Simple, first released in March 2001. Prior to this, most of the testing of CPAN modules relied on self-described black magic, autogenerated by the long-in-the-tooth h2xs utility and placed in a file called test.pl:

    ######################### We start with some black magic to print on failure.
    
    # Change 1..1 below to 1..last_test_to_print .
    # (It may become useful if the test is moved to ./t subdirectory.)
    
    BEGIN { $| = 1; print "1..1\n"; }
    END {print "not ok 1\n" unless $loaded;}
    use NewModule;
    $loaded = 1;
    print "ok 1\n";
    
    ######################### End of black magic.

    Compare that to modern test writing with Test::More or Test::Class or Test::LectroTest.

  • Module::Build, originally released in August 2001, intended as a replacement for ExtUtils::MakeMaker (which must die!). Even though it took a few years for M::B to be a complete replacement for EU::MM, it's been a huge improvement over the mess that is EU::MM.
  • Test::Builder, first released in September 2001. Extracting the plan and counter and basic test reporting features from Test::Simple and Test::More into a reusable single object allowed the explosion of Test::* modules available on the CPAN now. I believe this is a primary driver of Perl's test-infected culture.
  • PAR, first released sometime before February 2003, is an impressive toolkit for bundling Perl 5 applications into a single redistributable file.
  • Perl 5.8.1, released in September 2003, represents a huge amount of work put into testing the Perl 5 core. Perl 5.8.0 had some of this work, but the number of assertions in the core test suite quadrupled sometime between 5.6.0 and 5.8.1. That number has only increased.
  • Perl 6 Apocalypse 12, released in April 2004 (superseded by Perl 6 Synopsis 12 and Perl 6 Synopsis 14), which described a very Perlish but very powerful, declarative, overridable, and even shiny and new object system for Perl 6. Of particular interest are Perl roles.
  • CPANTS, released sometime before July 2004. This service provides automatic analysis of all distributions uploaded to the CPAN for quality of packaging and other metrics which often indicate high quality of code.
  • The book Perl Best Practices, released in July 2005. Damian didn't get everything right -- in particular, inside-out objects had a short lifespan -- but the resulting discussion of coding standards and good style helped catalyze good design practices on many projects.
  • PPI 1.0, released in July 2005, forms the basis of important tools to analyze Perl 5 code statically. This is important for....
  • Perl::Critic, initially released in August 2005. This tool continues to grow more indispensable; it's a great (and customizable) way to analyze your code for good style.
  • CPAN Testers, website launched in 2006. The project itself dates back to 1999, but several toolchain and testing culture improvements culminated in the amazing automated system that CPAN Testers is today.
  • Moose, originally released in March 2006. Moose may be the most important project in Perl 5 in the past five years. It takes some inspiration from Perl 6 objects, of course, but also Smalltalk, CLOS, and other well-explored and well-understood systems and produces a very powerful, very usable, and very perlish system.
  • Strawberry Perl, released as an alpha in July 2006. This distribution of Perl 5 includes software and configuration so that Windows Perl users can take full advantage of the CPAN.
  • Devel::Declare, first released in September 2007. This may be the most important project in Perl 5 for the next five years. It allows language extension in Perl 5 itself without the use of source filters. I use this module as shorthand for several other important distributions and features.
  • local::lib, first released in September 2007, intended to make CPAN module installation much easier as a non-root user. (CPAN and CPANPLUS shell improvements help as well.)
  • Perl 5.10.0, released in December 2007. In particular, the feature pragma relieves some of the pressure of backwards compatibility when adding new features to the language. I'm not a fan of how it works, but I'm glad that something exists to break that logjam.
  • Padre, the Perl IDE written in Perl, first released in July 2008. (Honorable mention also goes to Kephra, which dates back at least to December 2007.)
  • The Enlightened Perl Organization, formed (I believe) in late 2008 or early 2009, intended to enhance Perl 5 and modernize the core and the community. An early glimpse at its extended core initiative is Task::Kensho.
  • Iron Man Perl Blogging Challenge, announced in April 2009, which has dramatically increased the amount of discussion in the Perl world.
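To make three of those watersheds concrete, here's a sketch of the old test.pl rewritten with Test::More, plus a small custom assertion built on Test::Builder and a lexical filehandle. The file_contains() helper is hypothetical, and List::Util stands in for NewModule only because it ships with Perl 5.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Test::More tests => 3;
use Test::Builder;

# A hypothetical custom assertion. It reuses the same Test::Builder
# singleton that Test::More uses, so the plan, numbering, and TAP output
# stay consistent with the built-in test functions.
sub file_contains {
    my ($path, $expected, $name) = @_;
    my $tb = Test::Builder->new;

    # Report failures at the caller's line, not this one.
    local $Test::Builder::Level = $Test::Builder::Level + 1;

    my $contents = '';
    if (open my $in, '<', $path) {
        local $/;                   # slurp the whole file
        $contents = <$in> // '';
        close $in;
    }
    return $tb->ok(index($contents, $expected) >= 0, $name);
}

# One line replaces the BEGIN/END "black magic" of the old test.pl.
use_ok('List::Util');

# A lexical filehandle: an ordinary scalar with normal scoping, rather
# than a global bareword glob such as OUT.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} 'sum: ', List::Util::sum(1 .. 4), "\n";
close $out or die "Cannot close $path: $!";

is(List::Util::sum(1 .. 4), 10, 'sum() adds a list');
file_contains($path, 'sum: 10', 'the lexical filehandle wrote the expected line');
```

Because file_contains() goes through the shared Test::Builder object, it mixes freely with is() and use_ok() under a single plan; that sharing is what let so many Test::* modules coexist.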

I may have missed a few spots along the road; feel free to fill in any gaps you see. I've deliberately left off most of the events of the past several months. Time will let us judge their efficacy and their legacies.

Fixing a Bug in Perl 5

I fixed a bug in Perl 5 this morning. I'm not asking for praise; lots of people fix bugs in Perl 5. In particular, Perl 5 pumpkings deserve tremendous praise for managing the bug queue -- and often fixing nasty bugs no one else wants to explore.

The bug was small and the problem seemed obvious. The fix was simple. Thus the process of fixing the bug may be valuable to document. In particular, corehackers exists in part to recruit new developers. Its articles about Perl 5 internals need more information.

Here's a start; here's how I fixed a small, obvious bug in Perl 5.

The Bug

Dave Taylor reported a bug with the syswrite builtin. Using syswrite on an empty string with an offset writes garbage to a filehandle:

$ /usr/local/refperl/5.10.0/bin/perl -e 'my $foo = ""; syswrite STDOUT, $foo, 100, 1' | less
<DC>8 /null^@^@^@^Y^@^@^@^A^@^@^@ ^@^@^@^P^@^@^@X^V9
^@^@^@^@<89>^@^@^@<80><A6>^U^H^@^@^@^@^@<A6>^U^H^@^@^@^@<80><A7>^U^H^@^@^@^@^@
<A7>^U^H^@^@^@^@<80><A8>^U^H^@^@^@^@^@<A9>^U^H^@^@^@^@^@<A5>^U^H^@^@^@^@<80><A4>^U^H^@
(END)

His test case is simple and his diagnosis looks reasonable.

At this point, the bug is obvious to a C programmer. Even though Perl 5 knows that $foo is an empty string, the syswrite code attempts to read 100 characters from the string starting from the second character of the string (at offset 1). The garbage shown in the bug report is whatever's in memory one byte after the memory address of the null string in $foo.

Whatever code reads this data for syswrite must be checking the length of the string in $foo incorrectly.
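Dave's one-liner shows the bug by piping garbage into less. Here's a sketch of the same check as a script that verifies the behavior instead of eyeballing it. It assumes a perl which includes the fix, where syswrite dies with "Offset outside string"; on an unpatched perl it reports the offset as accepted. Writing to File::Spec->devnull is my own choice, to keep any stray bytes off the terminal.

```perl
use strict;
use warnings;
use File::Spec;

# Write to the null device so nothing, garbage or otherwise, reaches the terminal.
open my $null, '>', File::Spec->devnull or die "Cannot open null device: $!";

my $foo = '';

# Offset 1 lies past the end of an empty string.
my $accepted = eval { syswrite $null, $foo, 100, 1; 1 };
my $error    = $@;

if ($accepted) {
    print "syswrite accepted the offset (the buggy behavior)\n";
}
else {
    print "syswrite refused: $error";
}
```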

Finding the Culprit

That seemed like a reasonable hypothesis. Where was this code?

I already knew that it was likely in a C function called pp_syswrite in one of the pp_*.c files in the Perl 5 core. If I hadn't known this, I could have used B::Concise to figure out where to look:

$ perl -MO=Concise -e 'my $foo = ""; syswrite STDOUT, $foo, 100, 1'
d  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 1 -e:1) v:{ ->3
5     <2> sassign vKS/2 ->6
3        <$> const[PV ""] s ->4
4        <0> padsv[$foo:1,2] sRM*/LVINTRO ->5
6     <;> nextstate(main 2 -e:1) v:{ ->7
c     <@> syswrite[t3] vK/4 ->d
7        <0> pushmark s ->8
8        <#> gv[*STDOUT] s ->9
9        <0> padsv[$foo:1,2] s ->a
a        <$> const[IV 100] s ->b
b        <$> const[IV 1] s ->c
-e syntax OK

B::Concise compiles a snippet of code into an optree, then walks that optree and serializes it to textual output. That output represents the operations Perl 5 performs when it runs the program. The important line is the one labeled c. That line shows a LISTOP (a type of node in the optree which has multiple child nodes) which performs an operation called syswrite.

The node type isn't important; what's important is the name of the operation. Each operation implies the existence of a function in the Perl 5 core called pp_operation. PP is, as I understand it, short for push/pop, which means that these functions operate on the Perl 5 stack (more or less equivalent to @_ in Perl 5 code).

I used App::Ack to search for pp_syswrite in the appropriate *.c files. It led me to mathoms.c, which is sort of a limbo for functions that used to be in the core and exist now for compatibility reasons. pp_syswrite is now:

PP(pp_syswrite)
{
    return pp_send();
}

Thus the implementation of syswrite is in the function pp_send(), which is in pp_sys.c. That function is too long to reproduce here, but there's an interesting branch around 80 lines in:

if (op_type == OP_SYSWRITE) {
    Size_t length = 0; /* This length is in characters.  */
    STRLEN blen_chars;

    /* set blen_chars to the length of the string in C chars */
    /* ... */

    if (MARK < SP) {
        offset = SvIVx(*++MARK);
        if (offset < 0) {
            if (-offset > (IV)blen_chars) {
                Safefree(tmpbuf);
                DIE(aTHX_ "Offset outside string");
            }
            offset += blen_chars;
        } else if (offset >= (IV)blen_chars && blen_chars > 0) {
            Safefree(tmpbuf);
            DIE(aTHX_ "Offset outside string");
        }
    } else
        offset = 0;

    /* ... */

This code goes through some gyrations, counting the length of the string in C chars (this is more complex than you think when you have UTF-8 in the string). Eventually blen_chars contains the length of the string. offset is the offset value in this branch.

The offending code is the else if line that tests offset >= (IV)blen_chars && blen_chars > 0. Its intent is to check that the offset isn't outside of the string. Unfortunately, when the string is zero chars long, the second clause (blen_chars > 0) is false, so the entire conditional is false and the exception never fires.

Fixing the Bug

If blen_chars is 0, then any offset of one or more chars will be outside the range of the string, so the exception is necessary. (Deleting the blen_chars > 0 clause also means that using an offset of 0 with an empty string will throw that exception. I could argue that behavior both ways.)
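To see exactly which inputs the change affects, here's a tiny Perl model of the conditional with and without the blen_chars > 0 clause. It's a hypothetical sketch, not code from pp_sys.c: the two subs are invented names, and they model only the branch for non-negative offsets.

```perl
use strict;
use warnings;

# Hypothetical Perl models of the C conditional in pp_send, covering only
# non-negative offsets. A true result means "die with Offset outside string".
sub rejects_before_patch {
    my ($offset, $blen_chars) = @_;
    return $offset >= $blen_chars && $blen_chars > 0;
}

sub rejects_after_patch {
    my ($offset, $blen_chars) = @_;
    return $offset >= $blen_chars;
}

for my $case ([1, 0], [0, 0], [3, 3], [2, 3]) {
    my ($offset, $blen_chars) = @$case;
    printf "offset %d, length %d: before %s, after %s\n",
        $offset, $blen_chars,
        rejects_before_patch($offset, $blen_chars) ? 'dies' : 'continues',
        rejects_after_patch($offset, $blen_chars)  ? 'dies' : 'continues';
}
```

Only the rows with an empty string change: offset 1 (the reported bug) and offset 0 (the behavior that could be argued either way). Non-empty strings behave identically.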

Vincent Pit checked in the patch for RT #67912, as well as a test case derived from Dave's code:

eval { my $buf = ''; syswrite(O, $buf, 1, 0) };
like($@, qr/^Offset outside string /);

I used Dave's command-line invocation as a quick test while creating the patch. After running the entire core test suite to verify that there were no obvious regressions, I mailed the simple patch to p5p:

--- a/pp_sys.c
+++ b/pp_sys.c
@@ -1919,7 +1919,7 @@ PP(pp_send)
                    DIE(aTHX_ "Offset outside string");
                }
                offset += blen_chars;
-           } else if (offset >= (IV)blen_chars && blen_chars > 0) {
+           } else if (offset >= (IV)blen_chars) {
                Safefree(tmpbuf);
                DIE(aTHX_ "Offset outside string");
            }

This is an interesting case where deleting code can make the project's behavior more correct.

Five hours elapsed from the time of reporting to the time of the commit of the fix. In addition, Vincent also refactored the sysio.t test for clarity and ease of maintenance. This is how community-developed software should work!

It's a simple bugfix. Anyone reading this could have figured it out without too much work. Not all bugs are this simple, but the process is reasonably easy and there are plenty of people willing to help you get started improving Perl. (It's also very satisfying to think of all of the people who won't run into this problem in the future because you fixed it for them.)

CPAN, Convergence, and Core


In Take Advantage of Modern Perl at YAPC::NA 2009, I mentioned a two-word mantra I keep in mind:

Seek convergence.

I'm glad to see Devel::Declare; it allows clever Perl 5 developers to add specific syntax to Perl 5 in ways that subroutine prototypes don't permit (MooseX::Declare may be my favorite) and it is safer and more debuggable than source filters.

It's not just for adding new features. It allows fixing existing features. In particular, Ash Berlin's TryCatch packages up almost all of the tricky little details required to handle exceptions safely and properly in Perl 5 in a syntax that's clear, concise, and difficult to get wrong.

Those are nice properties for a programming language feature.
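For contrast, here's a sketch of what careful exception handling tends to look like in plain Perl 5, the kind of boilerplate TryCatch hides. It's a common core idiom, not TryCatch's implementation, and try_call() is a hypothetical name.

```perl
use strict;
use warnings;

# Call $code; return (1, $result) on success or (0, $error) on failure.
sub try_call {
    my ($code) = @_;
    my $result;
    local $@;    # don't clobber a $@ the caller may still care about

    # Test eval's return value rather than $@: a destructor running during
    # unwinding can reset $@, but a failed eval still returns false.
    if (eval { $result = $code->(); 1 }) {
        return (1, $result);
    }
    my $error = $@ || 'Unknown error';
    return (0, $error);
}

my ($ok, $value) = try_call(sub { 6 * 7 });
print "ok: $value\n" if $ok;

my ($ok2, $error) = try_call(sub { die "Something broke\n" });
print "caught: $error" unless $ok2;
```

Every line of that is easy to forget or get subtly wrong, which is the argument for putting the details behind clear syntax.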

CPAN

If you have access to the CPAN, you can use these modules today. You can mangle and manipulate and modify the syntax of Perl 5 you use on your project to add new features and to reduce the possibility of misusing existing features.

In one sense you can think of a language itself as an API. Certainly that's one way to look at Lisp, Scheme, Forth, Tcl, and Smalltalk.

In another sense, languages without the property of homoiconicity -- or languages not built up from a tiny set of primitives which allow you to define your own first-class citizens which appear to interact the same way as the built-in primitives -- make a clear distinction between language and library.

That's not necessarily a bad thing. The "DSLs Everywhere!" crowd will soon learn (individually) that language design isn't easy. Syntax and semantics and even partial formalisms require thought and experimentation and willingness to make changes and often fanatical attention to detail tempered by well-understood user requirements.

This is a roundabout way of saying that CPAN is not just a place to find libraries for building your own Jabber server or parsing FASTA data or calculating tidal cycles based on the Incan calendar. Some CPAN distributions help you write and maintain code (Perl::Critic, for example). These are tools.

Some CPAN distributions mangle the language itself. The SUPER module works around a pervasive feature in Perl 5's core method dispatch that often does the wrong thing in circumstances Perl 5's default object system makes easy. Moose gives Perl 5 a powerful object system based around a well-defined metaobject protocol. Want generalizes wantarray to report many other types of context. Coro adds a new type of control flow.

You know this; this is not a new insight. I mention them only to point out that there is a class of CPAN distribution which provides features that might apply to many, many programs. Their concerns have language scope: better exception handling, workarounds for core misfeatures, replacements for language-level features.

Hold that thought.
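As a concrete example of one of those language-level concerns, here's a sketch of the core SUPER:: behavior I understand the SUPER distribution to work around: SUPER:: resolves against the package in which a method was compiled, not against the class of the invocant. All package names are hypothetical.

```perl
use strict;
use warnings;

package Base;
sub greet { 'Base::greet' }

package Other;
sub greet { 'Other::greet' }

package Helper;
our @ISA = ('Other');

# SUPER:: here means "Helper's parent", because Helper is where this sub
# was compiled, no matter which class the invocant belongs to.
sub greet_via_super { my $self = shift; return $self->SUPER::greet() }

package Child;
our @ISA = ('Base');
{
    no warnings 'once';
    # Install Helper's method into Child, as a mixin or exporter might.
    *Child::greet_via_super = \&Helper::greet_via_super;
}

package main;
print Child->greet_via_super, "\n";    # prints Other::greet, not Base::greet
```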

Convergence

Two of the most useful community developments I've seen in Perl 5 -- ever -- are DateTime and the Perl Email Project.

Both began out of frustration with a fragmented, wildly divergent problem area. Both required a lot of work to standardize on one good way to do things. Both required strong leadership and time. Both have produced powerful, usable, easy-to-recommend code.

(Both inflicted some birthing pains. You can't avoid that.)

A similar project exists today. The Extended Perl Core is an Enlightened Perl initiative intended to identify bundles of CPAN modules which provide features not currently in the Perl 5 core. These modules should represent the best code CPAN has to offer -- code which, while not perfect for every need, is suitable for at least 80% of the common cases.

This will be a lot of work. Where multiple alternatives exist, not everyone will agree. That's fine; that's healthy. Your needs are not the same as my needs. Alternatives will still exist.

Yet we can seek convergence on what's common and which of myriad alternatives provides the best default to start.

Keep that in mind too.

Core

Consider the cross-cutting concerns of certain CPAN code. Consider convergence. Consider the core.

I can't hide one of my desires for Perl 5: I'd like to identify some of these language-level concerns distributed on the CPAN and used widely and consider bringing them into the language itself.

Maybe that's not the entire feature; maybe the core doesn't need all of Moose's MOP. Maybe it can't currently handle all of Coro. P5NCI isn't nearly stable enough, nor is it obviously the right approach for an equivalent to Python's wonderful ctypes.

Yet I believe it would be a shame for a language as flexible, as malleable, as pragmatic, and as open to experimentation as Perl 5 to encourage all of this evolution on the CPAN and in the wild without a language designer occasionally identifying pain points and bringing ideas and implementations back into the core language.

I'm not saying it's easy. I'm not saying it never happens. I'm saying that I believe this is a lovely goal for Perl development -- 5 and 6.

Rafael believes that deprecation does not imply removal; he takes issue with my argument that removing deprecated features is a healthy act of maintenance.

That's fine; let me expand my argument further.

The "Words Mean Things" Argument

One reason why deprecation should imply removal is that the act of deprecating a feature is a formal expression of disapproval. For example, deprecating the Switch module reflects a strong community belief that you cannot use that module reliably.

There are other reasons to discourage the use of a feature: the existence of a better approach (certainly true in the case of Switch), the difficulty of maintenance, performance penalties (in the case of $&), the possibility of confusion (sigilless aggregate variables), and more.

If you've browsed perldiag, you'll see that some features marked as "deprecated" also contain the text "will be removed in a future version."

The Ease of Maintenance Argument

Assume that the Perl 4 package separator were a deprecated feature. Assume that you want to add a new feature to Perl 5 which deals with package names. (I've done this.)

Every place in the tokenizer -- look for yourself, it's toke.c -- which deals with what may be a package name must explicitly look ahead at each character position for either a single tick or double colons. I can think of five or six places where this is the case. (There may be more. It's not always easy to count by skimming the code; a single tick is a valid character with special tokenizing semantics in at least two other contexts in the tokenizer.)

If you want to add a new mechanism to deal with package names, you must be aware of this separator. If you want to refactor the code which deals with package names, you must be aware of this separator. If you want to add a test case which deals with package names, you must be aware of this separator.

In short, the maintenance burden for the tokenizer is a little bit higher due to support for a feature largely unused (I know of two cases outside of the Perl 5 test suite: isn't from Test::More and the joke D'oh module) and almost universally unknown (if you've used Perl 4, you may remember it; if you started using Perl sometime in the past fifteen years, you may have learned it the hard way).
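A toy illustration of that burden: a Perl regex that recognizes package names the way a tokenizer must, with both separators. This is a hypothetical sketch, not the logic in toke.c; canonical_package() is an invented helper. Every pattern like this one carries the extra alternative for the tick, and so does every test of it.

```perl
use strict;
use warnings;

# A toy package-name matcher, NOT the tokenizer's real logic. Every pattern
# that recognizes a package name must accept two separators: the modern
# '::' and the Perl 4 single tick.
my $package_name = qr/
    [A-Za-z_] \w*                       # first segment
    (?: (?: :: | ' ) [A-Za-z_] \w* )*   # further segments, joined by :: or '
/x;

# Return the canonical '::' spelling of a package name, or nothing if the
# string isn't a package name at all.
sub canonical_package {
    my ($name) = @_;
    return unless $name =~ /\A$package_name\z/;
    (my $canonical = $name) =~ s/'/::/g;
    return $canonical;
}

for my $name ("D'oh", "isn't", 'Test::More') {
    printf "%-10s => %s\n", $name, canonical_package($name);
}
```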

The "You Can't Have It Both Ways" Argument

Rafael suggests the "removing features might break existing code" argument is spurious. That conveniently overlooks a nasty little conundrum in deprecating code. If you think for a moment about exactly what I quoted in my previous post, you may see it. Ready for the spoiler?

Deprecating features means adding warnings. As Abigail points out on p5p from time to time, adding warnings may break existing code.

Enter the DarkPAN Pseudo-ontological Silver Bullet argument. Imagine that some code exists outside the purview of the CPAN. Imagine that this code uses the deprecated feature.

This could go two ways.

Either the DarkPAN codebase eschews warnings, in which case its maintainers may never find out about the deprecation until the point of removal (and when yet another pumpking asserts -- even in jest -- that people who don't test their code against Perl 5 prereleases deserve any unpleasant surprises they get, it's difficult to understand why there's little sympathy for new and unknown bugs versus so much sympathy for deprecated features announced and removed after a lengthy deprecation cycle)...

... or the DarkPAN codebase uses warnings, in which case upgrading to a new version of Perl 5 which has deprecated a feature will suddenly produce warnings where it produced no warnings before.

Here's a clever rhetorical trick.

What if such a codebase used fatal warnings? The act of deprecating a feature could break code!

Because there's little visibility into the DarkPAN, it's easy to imagine that this is the case. The fatal warnings feature exists. Some code uses it. Deprecated features exist. Some code uses them. Might there be a union of those two sets?
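The fatal warnings scenario is easy to reproduce. In this sketch, a hypothetical Legacy module deprecates one function with warnings::warnif(), which consults the caller's lexical warnings. The same call either warns or dies, depending only on how the calling code enabled warnings.

```perl
use strict;
use warnings;

package Legacy;    # a hypothetical module that has deprecated one function

sub old_api {
    # warnif() checks the *caller's* lexical warnings; if that caller made
    # the category fatal, this warning becomes an exception.
    warnings::warnif('deprecated', 'old_api() is deprecated');
    return 42;
}

package main;

sub call_with_fatal_warnings {
    use warnings FATAL => 'all';
    my $result = eval { Legacy::old_api() };
    return defined $result ? "returned $result\n" : "died: $@";
}

sub call_with_ordinary_warnings {
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    my $result = Legacy::old_api();
    return ("returned $result", @warnings);
}

my $fatal_outcome = call_with_fatal_warnings();
print $fatal_outcome;

my ($outcome, @caught) = call_with_ordinary_warnings();
print "$outcome; warned: @caught";
```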

Don't count on the "But it's not likely!" counterargument to save you. How likely is it that DarkPAN code relies on the Perl 4 package separator? On sigilless aggregates? On using reference syntax on non-reference aggregates? On split in scalar context clobbering @_?

Until the DarkPAN is observable, the best anyone can do is guess -- and that's hardly a reliable mechanism for discipline in software development. (I suspect the right approach to handling upgrades is to admit that upgrades require caution and planning and mindfulness and to stop overcomplicating them to overcompensate.)

I realize that this is a scorched earth argument against invoking the DarkPAN pro or contra any potential change. That's deliberate.

The "Gratuitous Breakage" Argument

I'm not the only one with strong rhetoric. Quoting Rafael:

Basically the only real argument against removal of features is precisely the one that chromatic persists in ignoring: preservation of backward compatibility, to avoid gratuitous breakage of Perl programs.

If a deprecation notice has been present for multiple years in multiple stable releases (a feature deprecated in Perl 5.6.0, for example, has been deprecated for nine years and fourteen stable releases of Perl 5), this is a use of the word "gratuitous" of which I was not previously aware. I suppose you could make the word work if you argue that marking a feature as deprecated offers no justification -- none -- for its eventual removal, but that circles back to the "Words Mean Things" argument.

Note again that, in a codebase which uses fatal warnings, even marking a feature as deprecated can break code. In a codebase with those characteristics, the deprecation is itself arbitrary in a way that the removal is not: at least the removal has had an announcement period!

The "For Whom Are You Optimizing?" Argument

Consider a feature which has had a wildly popular replacement for fifteen years, which exists to support (as far as anyone can tell) two public features (one a joke and the other easily rewritten), and which frequently confuses novices. Given the choice between deprecating and removing that feature and keeping it for the purposes of backwards compatibility, I'm unabashedly in favor of simplicity.

Every time I write an API, I try to make it impossible to abuse. That's not always possible, but I want to encourage users to do the right thing by making it so easy to do that they won't consider doing the wrong thing.

Every time I read a bug report, I (try to) rethink my API assumptions. How can I change my code so that class of bug can never happen again?

That's not always easy in language design. It's difficult. Yet it's possible.

That requires a prioritization of features and design and design decisions based on your intended users. Given that Perl 5 won't enable strictures and warnings by default for several years (if ever), how will novices learn to avoid deprecated features? (Expecting them to peruse perldiag or another FAQ for "What Not to Do?" is silly. Have you ever worked with a novice? If they knew which documentation to read and how to read it and what it all meant, they wouldn't be novices!) How will they realize that the awful Perl 4 style tutorial from which they're copying and pasting code is buggy, insecure, inefficient, unmaintainable, and a really bad example of the power and elegance modern Perl affords?

I fear that if the argument "We can't change things that may break unmeasured DarkPAN code" trumps the desire to clean up some of the rough, pointy edges of Perl 5, then Perl 5 development has prioritized keeping old (and, let's face it -- if it relies on these deprecated features, you know it's crufty) code running over attracting new developers.

Characterize that concern as "A crazy, wild-eyed, frothing-at-the-mouth desire to destroy Perl 5, salt its earth, and make everyone into Java programmers!" if you want. I suspect some people reading this already have. Yet my previous post quoted from Perl 5's documentation regarding deprecated features. (To be fair, I also consider deprecation long overdue for the Perl 4 package separator, but it's not in perldiag, so there's no official deprecation.)

The "That Wasn't My Point Anyway" Argument

The goal of my previous post was very different anyway. I repeat it here because I believe it's more practical for discussion than a philosophical discussion based on differing visions for Perl 5:

I ponder the existence of an alternate Perl 5 binary with deprecated features removed; would DarkPAN developers run it against their test suites and report any results where modifications are onerous? Would that provide sufficient data as to the effects of removing these features?

A modest amount of refactoring could produce an alternate Perl 5 binary -- perhaps cleanperl or a better name -- which has these deprecated features compiled out. That might be a concrete way to see if Perl 5 can run more quickly or to gauge the degree to which these removed features affect real world codebases.

Deprecated Pointy Bits


A deprecated feature in Perl 5 is a feature which is confusing, difficult to support, difficult to understand, difficult to use correctly, difficult to use safely, or an accident of implementation no one wants to maintain in the future.

There's no written deprecation policy for Perl 5. By rough consensus of practice, a deprecation notice must appear in at least one major release of Perl 5 before anyone can consider removing the deprecated feature. Thus deprecating, for example, the Switch module in Perl 5.10 means that it may not be a core module in Perl 5.12.

It may remain in core for Perl 5.12, but the purpose of a deprecation period is to encourage you to migrate away from it before the release of Perl 5.12. (I chose this example deliberately; the given/when construct backported from Perl 6 to Perl 5.10 obviates the need for the fragile Switch module. It's easy to argue for its removal.)

Recent Deprecation Discussions

The issue of removing deprecated features comes up on p5p periodically. For example, George Greer suggested removing the Perl 4 pseudo-package separator; you can use the tick to separate package names like you use double-colons in Perl 5. Thus D'oh uses this old syntax where the modern syntax would be D::oh.

Is this a problem? Tatsuhiko Miyagawa gave an example:

use strict;
my $name = "Joe";
print "$name's birthday is tomorrow\n";

Add the warnings pragma for a hint at what might be wrong. If you're using anything older than Perl 5.10, good luck guessing the problem. Perl 5.10's warning message expands the name of the variable to $name::s, which is substantially more helpful, if you know that ' is synonymous with ::.

The correct way to write this code with interpolation is:

use strict;
my $name = "Joe";
print "${name}'s birthday is tomorrow\n";

That won't win any beauty contests. (Or should I say "That {won}'t win any beauty contests.")
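If the braces bother you, there are other unambiguous spellings. This sketch shows three that all produce the same string; which one reads best is a matter of taste.

```perl
use strict;
use warnings;

my $name = 'Joe';

# Three ways to put a possessive after a variable without the tick being
# read as a package separator.
my @forms = (
    "${name}'s birthday is tomorrow",              # braces delimit the name
    $name . "'s birthday is tomorrow",             # concatenation sidesteps it
    sprintf("%s's birthday is tomorrow", $name),   # so does a format string
);

print "$_\n" for @forms;
```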

On PerlMonks today, the hash reference question demonstrated code which worked but confused a novice terribly:

%package = ( 'zips' => {1,2,3,4} ); print %package->{zips};

You can dereference a hash (not a hash reference) with the dereferencing arrow, but if you enable warnings, you will receive the message:

Using a hash as a reference is deprecated...
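The novice's code wanted a plain hash lookup followed by a dereference of the stored hash reference. Here is a sketch of the non-deprecated spelling. Note that {1,2,3,4} in the original builds an anonymous hash with the keys 1 and 3. (Perl 5.22 later turned %hash->{key} from a deprecation warning into a fatal compile-time error.)

```perl
use strict;
use warnings;

# {1,2,3,4} is an anonymous hash: keys 1 and 3, values 2 and 4.
my %package = ( zips => { 1 => 2, 3 => 4 } );

# A plain hash is subscripted with $package{...}; no arrow needed.
my $zips = $package{zips};

# $zips holds a hash reference, so the arrow dereferences it.
my $value = $zips->{3};

# Between subscripts the arrow is optional, so this chains both lookups.
my $same = $package{zips}{3};

print "zips has keys: ", join(', ', sort keys %$zips), "\n";
print "zips->{3} is $value; chained lookup gives $same\n";
```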

A Selection of Deprecated Features

If you browse perldiag, you'll find several other deprecated features. Some of them are recent. Many of them have remained deprecated for several years. (Though it's tempting to put the Perl 4 pseudo-package separator in this category, it's more accurate to say that the Perl 5 syntax has superseded it for fifteen years.)

perldiag doesn't list every deprecation; to find them all, you have to dive into the source. Yet here are a few of my favorites.

  • Really old Perl let you omit the @ on array names in some spots. This is now heavily deprecated.

    my @nums = 1 .. 4;
    push nums, 5;

    This throws a compilation error about an undeclared nums when run with strict, but the code gets through the parser.

  • Really old Perl let you omit the % on hash names in some spots. This is now heavily deprecated.

    This is similar; you can convince keys to operate on a hash if you omit the sigil as well.

  • You have used the attributes pragma to modify the locked attribute on a code reference. The :locked attribute is obsolete, has had no effect since 5005 threads were removed, and will be removed in the next major release of Perl 5.

    This message has been around for a while.

  • You have used the attributes pragma to modify the unique attribute on an array, hash or scalar reference. The :unique attribute has had no effect since Perl 5.8.8, and will be removed in the next major release of Perl 5.

    After 5.10.1, it should be safe to remove both of these attributes from bleadperl (what will become 5.12).

  • defined() is not usually useful on arrays because it checks for an undefined scalar value. If you want to see if the array is empty, just use if (@array) { # not empty }.

    This message has been around for a while. It might be more useful as a parser syntax error.

  • defined() is not usually useful on hashes because it checks for an undefined scalar value. If you want to see if the hash is empty, just use if (%hash) { # not empty } for example.

    This message has been around for a while.

  • You used a declaration similar to my $x if 0. There has been a long-standing bug in Perl that causes a lexical variable not to be cleared at scope exit when its declaration includes a false conditional. Some people have exploited this bug to achieve a kind of static variable. Since we intend to fix this bug, we don't want people relying on this behavior.

    This is a newer deprecation; the state keyword added in Perl 5.10 provides a better way to declare and use static lexical variables in Perl 5.

  • The $[ variable (index of the first element in an array) is deprecated.

    I believe this is on the removal schedule for Perl 5.12.

  • Use of implicit split to @_ is deprecated

    It makes a lot of work for the compiler when you clobber a subroutine's argument list, so it's better if you assign the results of a split() explicitly to an array (or list).

    Michael Schwern posted a patch to Remove implicit split to @_. As his message points out, this is another Perl 4 feature deprecated with the release of Perl 5.000 in October 1994.

    What's the problem? The use of split in scalar context. When you use split that way, Perl 5 clobbers @_ with the list of results.

    This patch generated a lot of discussion.

  • You used the package keyword without specifying a package name. So no namespace is current at all. Using this can cause many otherwise reasonable constructs to fail in baffling ways. use strict; instead.

    This message baffles me, but apparently you can get around declaring variables in some cases if you write package;.

  • You tried to use a hash as a reference, as in %foo->{"bar"} or %$ref->{"hello"}. Versions of perl <= 5.6.1 used to allow this syntax, but shouldn't have. It is now deprecated, and will be removed in a future version.

    I like how the code examples quote hash keys. This is a newer deprecation notice; it's only been around for eight years.

  • You tried to use an array as a reference, as in @foo->[23] or @$ref->[99]. Versions of perl <= 5.6.1 used to allow this syntax, but shouldn't have. It is now deprecated, and will be removed in a future version.

    This has also been around for eight years.
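For several of these deprecations, the replacement is short. This sketch pairs a few of the deprecated idioms above with their modern equivalents. It assumes Perl 5.10 or newer for state; next_id, field_count, is_empty_array, and is_empty_hash are made-up names for illustration.

```perl
use strict;
use warnings;
use feature 'state';

# Deprecated: `my $count if 0;` abused a bug to get a static variable.
# Modern:     state (Perl 5.10+) declares one on purpose.
sub next_id {
    state $count = 0;
    return ++$count;
}

# Deprecated: split in scalar context clobbered @_ (implicit split to @_).
# Modern:     assign split's results to an array explicitly.
sub field_count {
    my ($line) = @_;
    my @fields = split /,/, $line;
    return scalar @fields;
}

# Deprecated: defined(@array) and defined(%hash).
# Modern:     an array or hash in boolean context is false when empty.
sub is_empty_array { my ($aref) = @_; return !@$aref }
sub is_empty_hash  { my ($href) = @_; return !%$href }

# Deprecated: @foo->[23] and %foo->{"bar"}.
# Modern:     subscript the container directly, or use the arrow on a real reference.
my @letters = ('a' .. 'z');
my %colors  = (sky => 'blue');
my $aref    = \@letters;

my $first_id  = next_id();
my $second_id = next_id();
my $fields    = field_count('a,b,c');
my $letter    = $letters[23];
my $via_ref   = $aref->[23];
my $sky       = $colors{sky};

print "ids: $first_id $second_id; fields: $fields; letters: $letter $via_ref; sky: $sky\n";
```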

The Deprecation Argument (and a possible solution)

The arguments for removing these deprecated features (with the appropriate deprecation period) are simple: they simplify Perl 5's internals, they remove confusing syntax corner cases, and they encourage people to write better code.

The arguments for retaining these features are likewise simple: modifying code may cause bugs and removing features may break existing code.

I have less sympathy for the con side; I find its arguments unconvincing. They're especially thin because deprecating a feature already means adding a warning for it, which means not only modifying code but changing its behavior.

The problem, I believe, is that there's little impetus to migrate away from deprecated features; features can remain deprecated for 15 years. As well, there's too little feedback on the effect of removing deprecated features on existing code.

I ponder the existence of an alternate Perl 5 binary with deprecated features removed; would DarkPAN developers run it against their test suites and report any results where modifications are onerous? Would that provide sufficient data as to the effects of removing these features?

$VERSION Confusion


What's the right way to declare version numbers in your modules?

One obvious approach is:

use vars '$VERSION';
$VERSION = 1.0.1;

If you're using Perl 5.6.x or newer, you might instead write:

our $VERSION = 1.0.1;

That's obvious, but wrong; version specifiers aren't numbers. To work correctly, you must quote them:

use vars '$VERSION';
$VERSION = '1.0.1';
...
# alternately
our $VERSION = '1.0.1';
# or
$PACKAGE::VERSION = '1.0.1';

What the Documentation Said

If you look in the ExtUtils::MakeMaker documentation under the VERSION_FROM section, you'll see a special case heuristic the Perl 5 toolchain uses to determine the version number of a given module. To the best of my current understanding, this is only necessary if you haven't already specified the version number in a declarative form -- in a META.yml file generated by ExtUtils::MakeMaker or Module::Build, for example.

MakeMaker's particular magic uses a regular expression to match a single line in the file. Thus, if you use the vars pragma, you must put the declaration and the assignment on the same line:

use vars '$VERSION'; $VERSION = '1.0.1';

If you've worked with Perl for a while, you may have encountered v-strings, a feature intended to encapsulate the differences between numbers, strings, and version numbers:

our $VERSION = v1.0.1;

However, the version documentation recommends against their use: their meaning has changed between major versions of Perl 5, and they were only reliable with three-part version numbers.

Careful hackers may notice that the MakeMaker documentation never suggested using three-part version numbers. (The MakeMaker maintainer called me out on this in a discussion.) Careful documentation readers may recall the advice in Guidelines for Module Creation in perlmodlib:

To be fully compatible with the Exporter and MakeMaker modules you should store your module's version number in a non-my package variable called $VERSION. This should be a floating point number with at least two digits after the decimal (i.e., hundredths, e.g, $VERSION = "0.01" ). Don't use a "1.3.2" style version. See Exporter for details.

The Exporter documentation says:

The Exporter module will convert an attempt to import a number from a module into a call to $module_name->require_version($value). This can be used to validate that the version of the module being used is greater than or equal to the required version.

The Exporter module supplies a default require_version method which checks the value of $VERSION in the exporting module.

Since the default require_version method treats the $VERSION number as a simple numeric value it will regard version 1.10 as lower than 1.9. For this reason it is strongly recommended that you use numbers with at least two decimal places, e.g., 1.09.
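The numeric trap Exporter warns about is easy to see with plain Perl comparisons. This sketch uses no modules; it only shows how Perl compares these values when it treats them as numbers.

```perl
use strict;
use warnings;

# As numbers, 1.10 and 1.1 are the same value, even though the strings differ.
my $same_number       = (1.10 == 1.1);
my $different_strings = ('1.10' ne '1.1');

# As numbers, 1.10 is lower than 1.9, so a numeric check would treat
# "version 1.10" as older than "version 1.9".
my $ten_below_nine = (1.10 < 1.9);

# Padding to two decimal places (1.09, 1.10) restores the intended order.
my $padded_order = (1.09 < 1.10);

printf "1.10 == 1.1:     %s\n", $same_number       ? "true" : "false";
printf "'1.10' ne '1.1': %s\n", $different_strings ? "true" : "false";
printf "1.10 < 1.9:      %s\n", $ten_below_nine    ? "true" : "false";
printf "1.09 < 1.10:     %s\n", $padded_order      ? "true" : "false";
```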

Easy as pie, right?

Don't use version strings. Quote version numbers. Don't think of them as numbers; they don't follow the same rules as numbers (1.1 is different from 1.10). Write them all on one line. Above all, be consistent with everyone else's mixing and matching of all of the different ways to declare and consume version numbers.

What the Code Did

Don't fret; it's easy to get things wrong. Given a package Foo:

package Foo;

our $VERSION = '1.0.1';

1;

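use Foo '1.0.2'; silently succeeds with Perl 5.10 and bleadperl, because a quoted string after the module name is an import list, not a version requirement, and Foo has no import method to complain about it. use Foo 1.0.2; is an error. So is use Foo v1.0.2;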

If you change the version in Foo to 1.23.0 and then write use Foo 1.23;, both Perl 5.10 and bleadperl fail with the error Foo version 1.23 required--this is only version 1.23.0.
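The same check is available at run time through the VERSION method every class inherits from UNIVERSAL; use Foo 1.23 calls Foo->VERSION(1.23) after loading Foo. This sketch defines Foo inline, so there's no Foo.pm to load. It assumes a Perl recent enough to apply version.pm's comparison rules, under which the decimal 1.23 means v1.230.0 and therefore outranks v1.23.0.

```perl
use strict;
use warnings;

{
    package Foo;
    our $VERSION = '1.23.0';
}

# Ask for decimal 1.23, which the version rules read as v1.230.0.
my $ok_decimal    = eval { Foo->VERSION('1.23'); 1 };
my $decimal_error = $@;

# Ask for the dotted form, which matches exactly.
my $ok_dotted = eval { Foo->VERSION('1.23.0'); 1 };

print $ok_decimal ? "1.23 ok\n"   : "1.23 failed: $decimal_error";
print $ok_dotted  ? "1.23.0 ok\n" : "1.23.0 failed\n";
```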

Here's another fun one. What will this produce?

use 5.010;    # enables say

{
    package Bar;
    our $VERSION = v72.69.76.80;
}

package main;

say $Bar::VERSION;
say Bar->VERSION;

You get partial credit for guessing that the method call will produce v72.69.76.80. You get a week's supply of analgesic for guessing that the variable stringifies to HELP.

The version module exists to ameliorate some of this pain. The recommended approach to solving this madness is the one-liner:

use version; our $VERSION = version->declare("v1.2.3");

... though you can use the slightly shorter version:

use version; our $VERSION = qv("v1.2.3");

Note that this only helps on the declaration side.

Of course, if you've (wisely) taken the advice to eschew everything with the stench of v-strings, you can use the slightly-less recommended approach:

use version; our $VERSION = version->parse("1.02");

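Yes, the quotes are necessary: without them, Perl parses a literal such as 1.10 as the number 1.1 before version->parse ever sees it, so the version object can't preserve what you wrote.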
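To see why the two styles can't be mixed casually, compare them as version objects. This sketch assumes the version module that ships with any modern Perl 5. It shows that decimal versions still compare as decimal fractions, while dotted-integer versions compare component by component.

```perl
use strict;
use warnings;
use version;

# Decimal versions compare as decimals: 1.10 is 1.100 and 1.9 is 1.900.
my $decimal_is_lower = version->parse("1.10") < version->parse("1.9");

# Dotted-integer versions compare each component: 10 > 9.
my $dotted_is_higher = version->declare("v1.10.0") > version->declare("v1.9.0");

# Converting a decimal to dotted form shows how far apart the styles are.
my $normal = version->parse("1.23")->normal;

printf "decimal 1.10 < 1.9:      %s\n", $decimal_is_lower ? "yes" : "no";
printf "dotted v1.10.0 > v1.9.0: %s\n", $dotted_is_higher ? "yes" : "no";
print  "decimal 1.23 normalizes to $normal\n";
```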

Of course, the version documentation suggests that you don't need to use the module at all if you use a simple decimal version number with a single decimal point:

our $VERSION = 1.02;

Note that Perl 5 version numbers do not follow this suggestion. (Note that I explicitly avoided talking about alpha versions and the various heuristics and denotations thereof.)

What Could Be

In a thread on version 0.77, David Golden suggested an alternate, declarative package version syntax:

package Foo::Bar 1.00;

In an unrelated thread, Nicholas Clark gave a rule of thumb for borrowing syntax from Perl 6 for corresponding features in Perl 5:

The argument in favour of adding a := operator to give the := syntax is that Perl 6 is using that syntax, so it will become familiar, and that it's better to converge than diverge.

The Versioning section in the Perl 6 modules synopsis discusses many issues of versioning before offering a declarative syntax:

class Pooch:name<Dog>:auth<cpan:JRANDOM>:ver<1.2.1>

Thus I wonder if David Golden's suggestion might provide a nice way out of this mess:

package Foo::Bar :ver(1.00);

This has the advantage of being declarative, so it's much easier to parse (if EUMM and other utilities still need to parse it). It's much less code (and fewer expressions to write, understand, and maintain) than the package global variable version. Its effects can occur at compilation time. It can't possibly break existing code because it was impossible to overload the package keyword (at least without Devel::Declare magic or a source filter).

There are two downsides. It's only available to new code running a patched version of Perl 5, and that likely means only the corehackers fork of Perl 5 for the foreseeable future. (Update: This is for two important reasons. First, it's experimental code that may not ever work out. Second, Dave Mitchell asked that p5p concentrate on releasing Perl 5.10.1 for the near future, and I don't want to distract from that with what could be a bikeshed discussion.) It also needs to exist in parallel with other version declaration methods until and unless they switch over to the nicer version.

Given the confusion surrounding package version declarations, having one clearly obvious way to do things seems like a big improvement.

(Schwern suggested also patching the documentation for consistency, but even after writing all of this, I'm still unsure of where to start and what to say.)
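For what it's worth, the package NAME VERSION form David Golden suggested is the one Perl 5.12 eventually adopted: a version after the package name sets that package's $VERSION. A minimal sketch, assuming Perl 5.12 or newer; Foo::Bar is a stand-in name.

```perl
use 5.012;    # package NAME VERSION needs Perl 5.12+; this also enables strict and say
use warnings;

package Foo::Bar 1.00;    # declares the package and sets $Foo::Bar::VERSION

package main;

say "Foo::Bar declares version ", Foo::Bar->VERSION;

# The usual version checks (what `use Foo::Bar 0.50;` would run) work as before.
my $new_enough = eval { Foo::Bar->VERSION(0.50); 1 } ? 1 : 0;
my $too_new    = eval { Foo::Bar->VERSION(2.00); 1 } ? 1 : 0;

say "requires 0.50: ", $new_enough ? "ok" : "fail";
say "requires 2.00: ", $too_new    ? "ok" : "fail";
```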

Expressing Visions for Perl 5


In a corehackers discussion yesterday, David Golden suggested that collecting, reconciling, and unifying a vision for Perl 5 may lead to improvements in development processes, scheduling, and prioritization.

I know of no better description than James Shore's summary of Vision from The Art of Agile Development:

Every project needs a single vision....

Distance between visionaries and the product manager increases error and waste.

Some projects have multiple visionaries. They need to combine their ideas into a unified vision.

A vision for your project helps you identify success. It expresses what you believe to be important. It allows you to measure your completion of your success conditions. Vision defines what you're building, why you're building it, and when you've finished.

I've worked on several projects without vision. They've all failed to produce anything useful.

Vision in Practice

Vision affects every technical and philosophical decision made about the project. Which features are important? Which features can wait? Does one group of users deserve prioritization over another? Are there long term implications of decisions? What kind of organization works best?

Community-driven software projects with strong leadership often have strong visions. For example, David Wheeler's PostgreSQL Development: Lessons for Perl? expresses one element of the PostgreSQL core team's vision:

[The] PostgreSQL project can be explicit about what versions of PostgreSQL it maintains (in terms of regular releases with bug fixes and security patches) and can quickly deliver new releases of those versions. Because so little changes in maintenance branches other than demonstrable bug fixes, there is little concern over breaking people's installations.

PostgreSQL makes a priority of keeping old installations running. That priority shapes its development practices:

[Every] time a major new version of PostgreSQL ships, a maintenance branch is forked for it; and thereafter only bug fixes and security issues are committed to it. Nothing else.

I write this not to contrast with any other project's vision or development practices; I write this only to demonstrate a strong connection between a well-defined vision and well-enforced development practices.

Vision for Perl 5

David also suggested that a discussion of motivations and goals may help the Perl 5 community converge on a shared vision. Here's my vision for Perl 5.

I would like Perl 5 to:

  • Increase in usability and consistency, especially with regard to metaprogramming
  • Change its default behavior to improve learnability for novices and to encourage writing maintainable, elegant code
  • Become easier to modify and extend
  • Allow further optimizations available in other dynamic languages (JIT, better garbage collection, automatic parallelization, serializable bytecode)
  • Absorb widely-used and well-designed features tested from the CPAN, perl5i, and corehackers

You could summarize this into a simple phrase: "I want Perl 5 to become a better language for new development."

Now the fun begins: what's your vision for Perl 5 development? Feel free to post a comment here. (Be aware that I will unapprove any comment criticizing another vision; visions are positive statements of what you personally value. Express what you value instead.)

Advocates of strict typing systems often tell the reliability bedtime story when explaining why looser type systems give them the howling fantods. They're not entirely wrong, either.

Quick, what's the problem with this Perl code?

no strict 'refs';

*{ $classname . '::' . $subname } = sub { ... };

Don't worry if you don't get it immediately. It's subtle: the anonymous sub is compiled inside the scope of no strict 'refs', so within its body Perl will silently follow any symbolic reference unless you re-enable strict reference checking there.
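A short sketch makes the subtlety concrete. It compiles two otherwise identical anonymous subs, one inside a no strict 'refs' block and one outside it, then hands each the name of a package variable as a string. $config, $leaky, and $tight are placeholder names for illustration.

```perl
use strict;
use warnings;

our $config = "global config";

my $leaky;
{
    no strict 'refs';
    # This anonymous sub is compiled inside the block, so its body
    # inherits "no strict 'refs'" as well.
    $leaky = sub { my $name = shift; return ${ $name } };
}

# Compiled outside the block: strict refs still applies to its body.
my $tight = sub { my $name = shift; return ${ $name } };

my $leaky_result = $leaky->('main::config');          # symbolic ref silently followed
my $tight_ok     = eval { $tight->('main::config'); 1 };

print "leaky sub returned: $leaky_result\n";
print "tight sub ", ($tight_ok ? "followed" : "refused"), " the symbolic reference\n";
```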

Maybe that's not a problem for the person who first wrote that code. He or she knew exactly what to write and made no typos. Great! I wish I were that careful and fortunate.

I can think of very few reliable, maintained Perl 5 programs which don't have strict enabled in the broadest scope. Our community practices and our tooling and our default idioms (whether enabled by language or convention) shape the way we think and behave.

Now imagine what might happen six months later when someone else (or the original developer) needs to make a change to that code. Would you remember that strict 'refs' is off in that code? (I believe I owe credit to Joshua ben Jore for first mentioning this problem; it's an example of a poorly recognized problem from the Tower of Defaults.)

The solution in this case is to ensure that loosening strictures occurs in the smallest possible scope. A similar principle applies to both physical and computer security. A better version of that code might be:

my $subref = sub { ... };
{
    no strict 'refs';
    *{ $classname . '::' . $subname } = $subref;
}

Not only does the tight scoping limit the effects of disabling strict references, it gives visual cues to maintainers that something different is happening. Proceed with mindfulness. This is even a user interface principle: make exceptions obvious.
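Here's one way to package that pattern as a reusable helper. install_sub and My::Greeter are hypothetical names. The point is that only the glob assignment runs without strict refs, while the code being installed was compiled under full strictures.

```perl
use strict;
use warnings;

# Hypothetical helper: install $code as the subroutine ${class}::${name}.
sub install_sub {
    my ($class, $name, $code) = @_;
    {
        # Loosen only 'refs', and only around the symbolic glob assignment.
        no strict 'refs';
        *{ $class . '::' . $name } = $code;
    }
    return;
}

# Compiled here, under full strictures, not inside the no-strict block.
my $greet = sub {
    my ($class, $who) = @_;
    return "hello, $who, from $class";
};

# Also compiled under strict: it dies if asked to use a string as a reference.
my $lookup = sub {
    my ($class, $var_name) = @_;
    return ${ $var_name };
};

install_sub('My::Greeter', 'greet',  $greet);
install_sub('My::Greeter', 'lookup', $lookup);

my $message   = My::Greeter->greet('world');
my $lookup_ok = eval { My::Greeter->lookup('main::anything'); 1 };

print "$message\n";
print "lookup ", ($lookup_ok ? "followed" : "refused"), " a symbolic reference\n";
```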

Rafael believes that strictperl is a broken idea, so obviously bad that it's suitable only as a source filter. One of his gripes is that it cannot run either the vars pragma or Exporter unmodified.

He's right. It unrepentantly does not.

Rafael says that the "very purpose [of vars and Exporter] is to manipulate symbols by name, which is exactly the kind of thing that strict prevents." He's right about that too.

Follow those links, though. Look at the implementations of Exporter and vars. Count the lines of each that absolutely cannot run with all strictures enabled. Count the remaining lines.

Done? Great. Now tell me with a straight face that foundational core libraries that have been in Perl 5 for fifteen years are paranoid enough.

These modules do not have to break under strictperl. They could run unchanged (as far as everything else which uses them can tell). Modifying them to disable strictures only in the smallest necessary scopes introduces no backwards-incompatible changes.

If strict is useful enough that you use it in all programs you expect to maintain, if you believe it protects you against problems you didn't expect, even if you have a copious test suite, tell me that it's not worth even asking if it's useful for the more than 90% of code in vars and Exporter (just as examples) that does not need to disable strictures.

You may believe that this is a silly, dumb, useless experiment, and that I wasted my time writing C and portable Make rules when I could have written a frivolous little source filter. You may be right. Feel free to ignore it, mock it, file bug reports about it, whatever you like.

Yet I believe this stupid little experiment may be useful -- not just for people who want to practice exception-based strictures (strict by default: exceptions as necessary) but also for people who would like to make future problems even less likely to occur, especially in Perl 5's core library itself. (Isn't that one great way to measure maintainability?)

Limiting the scope of loosened strictures is by no means the only way to improve the reliability and maintainability of serious programs, but why not take advantage of the possibility to do so?

strictperl


As I mentioned in Why corehackers Matters, the ability to fork and modify your own version of bleadperl -- and perhaps get it merged back into Chip's staging tree -- opens a lot of room for experimentation.

I alluded to a minor feature branch I've worked on for a couple of days: unilaterally enabling strict for all code not run through -e. This is available from my strict_by_default bleadperl tree on GitHub. You're welcome to download it, play with it, fork it, submit patches, or do whatever you want.

If Perl is a Shinto shrine, forking is an act of love... provided there's a merge sometime in the future.

Playing with strictperl

To build strictperl, first clone my bleadperl tree from GitHub. Check out the strict_by_default branch:

$ git clone git://github.com/chromatic/perl.git
$ cd perl
$ git checkout origin/strict_by_default

Then configure and build Perl as normal:

$ sh Configure -de -Dusedevel
$ make

This will build the familiar perl binary. Now build strictperl:

$ make strictperl

This will build a separate binary named strictperl. If I've written the code (and especially the Makefile rules) correctly, these will be two completely separate binaries with different behaviors:

$ ./perl       -e 'print $c' # no error
$ ./strictperl -e 'print $c' # no error

$ echo 'print $c' > printc.pl
$ ./perl       printc.pl  # no error
$ ./strictperl printc.pl
Global symbol "$c" requires explicit package name at printc.pl line 1.
Execution of printc.pl aborted due to compilation errors.
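On a stock perl, you can approximate what strictperl does to a file by compiling the same source with and without strict in effect from its first line. In this sketch a string eval stands in for loading printc.pl. The sketch itself deliberately omits use strict at file scope, because a string eval inherits the strictures of the code around it.

```perl
use warnings;
# No `use strict` here on purpose: string eval inherits the caller's
# strictures, and this sketch needs to compile the code both ways.

my $source = 'my $copy = $c; 1';

# Like ./perl printc.pl: no strictures, so $c quietly means $main::c.
my $plain_ok    = eval $source;
my $plain_error = $@;

# Like ./strictperl printc.pl: strict is on before the first line of source.
my $strict_ok    = eval "use strict; $source";
my $strict_error = $@;

print "plain:  ", ($plain_ok  ? "compiled and ran\n" : "failed: $plain_error");
print "strict: ", ($strict_ok ? "compiled and ran\n" : "failed: $strict_error");
```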

You can use strictperl in place of regular perl any place you like... except that several core modules are not strict safe. In particular, Exporter and vars are the first two problematic core libraries.

Similarly, any module which does not use strict may have strictness errors when running under strictperl.

I don't think that's a bad thing, however; think of it as an opportunity to make lots of code strict safe even if it doesn't use strict right now. (You could argue "Why in the world would you ever want to touch all of that code for no benefit?" You can also argue why you'd want to make your C code lint-safe, or run Perl::Critic on a codebase.) These "errors" may not be errors in practice, but if we evaluate them all, we can note declaratively in our source code that we've considered each one carefully and avoid further potential maintenance problems. Right now strictperl is an experiment and a tool to help us identify these situations.

Patches and pull requests very welcome to help patch up the core modules for strict safety.

How it Works

strictperl works by changing the default hintset of nextstate nodes in the Perl 5 optree.

Don't be scared. The implementation is slightly ugly (thanks to the way strict itself works), but it's much less invasive and much less difficult than rewriting optrees as something like Devel::Declare must do.

If you look in the strict pragma, you'll see several auspicious lines:

my %bitmask = (
    refs => 0x00000002,
    subs => 0x00000200,
    vars => 0x00000400
);

# ...

sub import {
    shift;
    $^H |= @_ ? bits(@_) : $default_bits;
}

This code ORs together a bitmask of all of the strict features you've requested and toggles them on in the magic $^H pseudo global variable. These constants correspond to three constants #defined in perl.h:

#define HINT_STRICT_REFS    0x00000002 /* strict pragma */
/* ... */
#define HINT_STRICT_SUBS    0x00000200 /* strict pragma */
#define HINT_STRICT_VARS    0x00000400 /* strict pragma */
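You can watch these bits from Perl itself, because $^H is readable at compile time inside a BEGIN block. This sketch records $^H before use strict, after it, and inside a block that turns off only strict refs, then masks each snapshot with the three values copied from strict.pm. The %hints and %bit names are made up for illustration.

```perl
# Record compile-time hint bits ($^H) at three points in this file.
our %hints;

BEGIN { $hints{before} = $^H }        # before `use strict`

use strict;
use warnings;

BEGIN { $hints{strict} = $^H }        # after `use strict`

{
    no strict 'refs';
    BEGIN { $hints{no_refs} = $^H }   # inside a block that loosens only refs
}

# Bit values copied from strict.pm (perl.h's HINT_STRICT_* constants).
my %bit = ( refs => 0x00000002, subs => 0x00000200, vars => 0x00000400 );

for my $point (qw(before strict no_refs)) {
    my @on = grep { $hints{$point} & $bit{$_} } sort keys %bit;
    printf "%-8s strict bits on: %s\n", $point, @on ? "@on" : "(none)";
}
```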

These hints are part of a particular type of node in the optree called a COP (control op, I presume). These are always nodes of type nextstate; you see them often when you use B::Concise, for example:

$ perl -MO=Concise
print "Hello, world!"
6  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 1 -:1) v:{ ->3
5     <@> print vK ->6
3        <0> pushmark s ->4
4        <$> const[PV "Hello, world!"] s ->5
- syntax OK

Each COP contains information about the package and line number of the Perl code the next ops represent, as well as hint information such as which strict pragma features are in effect. (They contain more information as well.)

When you modify the hints through $^H, you modify the interpreter's compile-time hints (PL_hints), which the parser copies into each nextstate op it builds from then on. (If you're very curious, see the cop_hints member of the cop struct in cop.h.)

There's a complicating factor. nextstate hints nest the same way lexical scopes nest. If you enable strict in an outer scope, its effect remains in place in inner scopes unless they explicitly disable it.

That's actually fortunate, in this case.

I knew that enabling strict meant setting the appropriate hints flags when building COP nodes in the optree. That meant modifying Perl's parser. My original approach was to modify the function used to create new COP nodes, a function named newSTATEOP. That's where I discovered the pseudo-inheritance scheme which allows strict nesting. (I admit that I don't understand all of its implications).

After a couple of blind alleys, I realized that the only way to enable strict pervasively was to find the creation point of the parentmost COP node in the optree and set these hint flags there.

Perl 5's parser is generated from the yacc grammar in perly.y. The grammar's topmost rule is prog; a program matches the progstart and lineseq rules. progstart is simple:

progstart:  { PL_parser->expect = XSTATE; $$ = block_start(TRUE); };

You can ignore the contents of this rule. The important point is that this is the first rule matched in a program -- a file, actually.

There's one more piece of the puzzle. If you look in the implementation of the newSTATEOP function, you'll see that it uses a globalish (interpreter-local, anyhow) variable PL_hints to set the hints flags on the newly-created COP:

    CopHINTS_set(cop, PL_hints);

Thus my patch is very simple; progstart now reads:

progstart:
        {
            PL_hints |= PL_e_script ? DEFAULT_CLI_HINTS : DEFAULT_PROGRAM_HINTS;
            PL_parser->expect = XSTATE; $$ = block_start(TRUE);
        }
    ;

PL_e_script is another interpreter-local variable which contains the text of code run with -e. It's empty unless the invoking command line used the -e flag. DEFAULT_CLI_HINTS and DEFAULT_PROGRAM_HINTS are new constants I added to perl.h:

/* which hints are in $^H by default */
#define DEFAULT_CLI_HINTS 0
#ifdef STRICTPERL
/* parenthesized so the macro expands safely inside larger expressions */
#   define DEFAULT_PROGRAM_HINTS \
               (HINT_STRICT_REFS | HINT_STRICT_VARS | HINT_STRICT_SUBS)
#else
#   define DEFAULT_PROGRAM_HINTS 0
#endif

I made them conditional on the STRICTPERL symbol for one specific reason: the compilation rules I added to the Makefile to build strictperl define -DSTRICTPERL and rebuild the Perl 5 parser. Thus the DEFAULT_PROGRAM_HINTS constant enables all strictures only when building strictperl.

(Yes, cautious Makefile hackers, those rules clean up after themselves so that the relevant files always get rebuilt when building strictperl and get removed after building strictperl so that any subsequent non-strictperl builds do not use object files with the wrong constants defined.)

The hardest part of this whole process was getting the Makefile rules right. I'm not quite sure they're cross-platform enough, but they work with my testing.

The Value of strictperl

Was this process worthwhile? It was entertaining. It gave me the chance to write code to implement a feature I believe is worth considering. It helped me understand the optree in a bit more detail. It gave me a good opportunity to explain some of that here.

Perhaps the best result of this process is that we now have a Perl with strictures enabled by default. We can experiment with it to see what writing code is like in that case. Admittedly there's a lot of work necessary to make core libraries play nicely with strictperl, but we can do that in pieces because this is an optional feature you have to build explicitly, one which does not interfere with regular perl.

Those are the kinds of experiments I want to encourage.

Why Corehackers Matters


At YAPC::NA 2009, Chip and sungo announced the corehackers project.

I believe this project is important to the future of Perl 5. It has several parts:

  • To encourage new developers to work on the Perl 5 core (internals, ecosystem, and libraries).
  • To improve documentation and processes to make improvements to the Perl 5 core.
  • To experiment with new features and design decisions.
  • To clean up the internals.
  • To mentor new and recurring contributors.
  • To take some of the burden off of the overworked, underappreciated pumpkings.

I see corehackers as the other side of the important perl5i project; important features perl5i identifies and stabilizes can come into the corehackers tree (or forest, I suppose) for further experimentation without destabilizing the main development trees for Perl 5.

This is a huge benefit of Perl 5's switch to Git last year, and I'm glad to see Perl 5 development take advantage of it.

One of the obvious inspirations is the Linux Kernel Janitors project, which has performed similar functions for the Linux kernel.

The project is in its early stages. You can fork Chip's own bleadperl repository on GitHub and experiment with making changes. The IRC channel is active (if a little bit scattered; it's not clear exactly what's happening when, so you have a chance to guide it to your desires and expectations), and there are plenty of people interested in experimenting with all sorts of pent-up ideas.

In particular, I'm most interested in cleaning up some of the internals to make them more readable, more optimizable, and more amenable to further extensions and enhancements such as compiling Perl 5 to LLVM. A 30% performance improvement is likely, with few changes.

I've personally forked an experimental Perl which enables strict unilaterally, unless running with the -e flag, without having to use any module. I'll write more about that soon.

The Tower of Defaults

In Take Advantage of Modern Perl I digressed to discuss a hierarchy of defaults. I use this concept to help make design decisions about where to fix problems. (I admit to borrowing concepts from Joshua Bloch's How to Design a Good API and Why it Matters.)

As you progress from the top of the tower to the base, you solve the problem for more people.

  • Unknown Problem
  • Poorly Recognized Problem
  • Infrequently Asked Question
  • Frequently Asked Question
  • Frequently Reinvented Module
  • Available Module
  • Community Idiom
  • Popular Module
  • Bundled Module
  • Core Module
  • Core Language

Please note that this post does not address the difficulty of fixing a problem at any one of these locations, nor does it consider the costs of compatibility, testing, or maintenance. Those are important concerns, but they're not the important point for this post.

Institutional Knowledge Locations

The least effective way to address a problem is by ignoring it. Of course, that assumes you know it's already a problem. A problem you haven't even identified is worse. A problem only a few people acknowledge is bad.

(By "good" and "bad" I refer to the efficacy of addressing the problem. Again, the severity and reach of a problem are important to consider when deciding how to fix it.)

Documentation Locations

Documenting a problem is an improvement. Moving down the tower even to institutional knowledge, where occasionally someone asks for and receives a workaround, helps more people avoid the problem. Documenting the problem and any workarounds in a FAQ is even better.

External Code Locations

If users know enough to reinvent the wheel to solve the problem, this can be better than a FAQ. (It can be worse, too; some problems are difficult to solve even with domain-specific knowledge and good test cases.)

It's better to have a decent solution available, though that's generally only useful to people who know it's available.

Widespread community knowledge helps even more people, at least if they have contact with the appropriate parts of the community. There's an overlap here with FAQs and IAQs, but not all FAQs become idioms and vice versa. The ability to look at code and recognize such an idiom -- or to recognize the need for an idiom -- is useful.

The best solution in this category is a popular, well-recognized, well-understood module. Its existence does not preclude other options, but its efficacy and applicability make it the default solution.

Internal Code Locations

A project such as perl5i or Strawberry Perl can address problems (bugs, missing features, useful enhancements) by bundling modules. They're available by default to everyone who uses the distribution. People still need to know that those modules are available, so other parts of the tower apply, but the barrier to use is much lower.

The core library is even more effective, provided that users know which parts exist.

As you might expect, the most effective way to address a problem is often to address it in the language itself.

Caveats

I glossed over the importance of design on purpose. I bring it up now because I don't want to give the impression that I believe that pushing all potential fixes into the language itself is either desirable or good. It's not.

An XML parser may be a good bundled module, but it's likely a mistake to put XML parsing in the language itself. A garbage collector tweak tuned for a specific processor may be sufficient as an infrequently asked question, as the people who can best take advantage of it need to analyze its benefits and drawbacks appropriately.

Yet it's also important to remember that the further up the tower a solution exists, the fewer people can take advantage of it.

Contrary to the strawman arguments set up and knocked down so easily by people who believe that breaking the DarkPAN is a sin almost as bad as apparent hypocrisy on the Internet, no one wants to change Perl 5 willy-nilly. Yet many people believe that a few carefully-planned backwards-incompatible changes will allow Perl 5 to evolve into a much more powerful, much cleaner, much more understandable, and even much simpler language suitable not only for maintaining legacy projects but also for creating amazing new projects.

I intended my deconstruction of fear surrounding Perl core language changes to continue a long-standing debate. If you read Planet Perl Iron Man or Planet Perl, you've read other people arguing over the desirability of supporting the DarkPAN.

If your code is part of the DarkPAN (and it likely isn't, as you're likely a part of the Perl community if you're reading this), you may bristle at the suggestion that some people believe that retaining core language backwards compatibility to support your code is one of the worst ideas in Perl history. (If you haven't heard that suggestion, let me repeat it: the idea that the Perl community should support hidden and proprietary code of people who aren't part of the community, code that the Perl community can't see let alone test, and code that may or may not even exist is sufficiently ridiculous on its face that it mocks itself.)

Now then.

My previous post suggested that the real problem the Perl community is trying to solve is one of risk management. An upgrade might change the way your DarkPAN code behaves. If it doesn't make sense to push all of the cost of managing that risk onto the Perl community (and it doesn't), who bears that cost?

You do.

Don't panic yet. It's simple to address that risk. I didn't say it's easy. I said it's simple.

Your Responsibility as a DarkPAN Maintainer

As a DarkPAN maintainer, you need to answer several questions about your code.

  • Is this software worth maintaining?
  • Is this software under active maintenance?
  • Does this software need community support for the Perl core and any external dependencies?
  • Will this software work unchanged with a newer version of Perl?
  • If not, why not?
  • What changes are necessary to make this software work with a newer version of Perl?

Software outside of active maintenance can stay untouched. Don't worry about it. (Of course, if you have to migrate to a new machine or a new architecture or other dependencies, it's under active maintenance at least for the sake of this argument.)

The other questions are more important. You need to be able to verify that your code will perform as you expect if you change one or more dependencies.

Likely you already need to know this.

You need a comprehensive test suite which can run on any likely candidate machine. You need a test environment where you can run your software's test suite against an updated dependency -- a new CPAN distribution, for example, or a new major release of Perl. You need unambiguous results and comprehensible diagnostics.
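A minimal sketch of such a test, using the core Test::More and List::Util modules. The file name and the functions `parse_price` and `order_total` are hypothetical stand-ins for whatever behavior your own DarkPAN code relies on; running the file with `prove` under each candidate perl gives a plain pass/fail answer plus a diagnostic line naming the perl and List::Util versions involved.

```perl
#!/usr/bin/perl
# t/pricing.t: a hypothetical test for hypothetical DarkPAN code.
# Run it with "prove t/pricing.t" under each perl you plan to upgrade to.
use strict;
use warnings;

use List::Util qw(sum);    # a dependency whose behavior we rely on
use Test::More tests => 4;

# Hypothetical code under test; in a real project it would live in lib/.
sub parse_price {
    my ($text) = @_;
    return unless defined $text;
    my ($dollars, $cents) = $text =~ /^\$(\d+)(?:\.(\d\d))?$/
        or return;
    return $dollars * 100 + ( defined $cents ? $cents : 0 );
}

sub order_total {
    my @prices = @_;
    # sum(0, ...) returns 0 for an empty order instead of undef
    return sum( 0, map { parse_price($_) } @prices );
}

# Pin down the behavior you depend on, so an upgraded perl or CPAN
# distribution that changes it fails loudly rather than silently.
is( parse_price('$12.34'), 1234, 'dollars and cents' );
is( parse_price('$7'),     700,  'whole dollars' );
ok( !defined scalar parse_price('12.34'), 'missing dollar sign rejected' );
is( order_total( '$1.50', '$2' ), 350, 'order total in cents' );

# Record which perl and which List::Util produced these results.
diag("perl $], List::Util $List::Util::VERSION");
```

The point is not these particular assertions but that each expectation is written down where a new perl or a new CPAN release can contradict it.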

In short, you need a sane development environment, just like any serious project worth your time already has.

(You should also have a ten minute build, but if you can verify that your software does what you expect in an hour or less and report the results to the Perl community, you're well on your way.)

If you're really good, you can even test your DarkPAN code against bleadperl or maintenance snapshots just to help p5p understand your expectations (and to understand potential changes to the language which may require you to revise your expectations, if not your code).
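A minimal sketch of that workflow, assuming you have already built or installed the candidate perls yourself (a bleadperl snapshot, a maintenance release) and pass their paths on the command line. The script and the `failing_tests` helper are hypothetical. It runs each file under t/ directly rather than through prove, relying on Test::More's non-zero exit status when any test fails.

```perl
#!/usr/bin/perl
# Hypothetical helper: run every t/*.t file under each candidate perl.
# Usage: perl run_candidates.pl /path/to/bleadperl /path/to/perl-5.10.1
use strict;
use warnings;

# Returns the test files that exited non-zero under the given perl.
sub failing_tests {
    my ($perl, @tests) = @_;
    # system() returns 0 on success; Test::More exits non-zero on failure
    return grep { system( $perl, '-Ilib', $_ ) != 0 } @tests;
}

my @perls = @ARGV ? @ARGV : ($^X);    # default: the perl running this script
my @tests = glob 't/*.t';

for my $perl (@perls) {
    my @failed = failing_tests( $perl, @tests );
    printf "%s: %d of %d test files failed%s\n",
        $perl, scalar @failed, scalar @tests,
        @failed ? " (@failed)" : '';
}
```

Reporting failures per perl, rather than stopping at the first one, makes it easier to tell p5p which release changed the behavior you expected.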

Anyone serious about software development should have a comprehensive and useful test suite anyway. It's simple. It's not always easy; the Perl community needs to make CPAN distribution installation and versioning easier and cleaner. That debate is happening. The solution isn't obvious yet, but a process of iteration and refinement will produce it.

Yet any DarkPAN code of such low importance that its maintainers can't produce a test suite to verify that it works isn't worth p5p's time. Forget that code. It's a dead end. Let's concentrate instead on refining a language for serious software developers to write new code worth maintaining.

Fear.pm


If you read Planet Perl Iron Man (and you should) or listen to the discussions of the corehackers project, you may have seen more discussion about Perl 5's DarkPAN problem.

One of the big tensions in the Perl 5 world is between progress and stability. (I use both terms with the same sense of distaste I hear in the terms "pro-choice" and "pro-life". Then again, I sympathize with linguistic prescriptivism, if only to clarify motives and intent.)

"Perl 5 must change," some people cry. "There's no good reason Perl shouldn't enable strictures and warnings by default for all new programs!"

"Perl 5 cannot change," retort others. "There's too much existing code to change Perl's behavior!"

I find the latter argument so ridiculous that withering mockery is the only good response. That's rarely useful, however.

When people say "Perl 5 cannot change its default behavior!", I believe they have several other points in mind. Some of them are good points. Yet until the Perl community as a whole can address those points directly, we'll remain at an impasse. (The word "impasse" overstates things; to a man, the active Perl 5 pumpkings appear to hew strongly to the "Change is painful and bad and wrong" philosophy, even going so far as to say that frequent releases are undesirable hassles because stat calls are not cheap.)

Translation to English of Various Meanings of "Stability Über Alles"

With that in mind, here are several possible meanings of "You can't change default behavior!":

  • Distributors may upgrade Perl 5 in their installations and may have to upgrade packages which depend on Perl 5 to work with the new version. This is true. This is what distributors do. This is what distributors do with all of their dependencies. This is why distributors exist. This is also only a problem if no optional mechanism for disabling new features exists -- and such a mechanism needs to exist.
  • Changing Perl 5's default behavior may render existing tutorials and examples obsolete. Good. Many existing examples of Perl 5 code are horrible. A steadfast refusal to run unmaintainable code may even encourage the creation of better tutorials and the publication of better examples.
  • Existing code -- left untouched for a decade -- may suddenly break.

    I don't understand this point.

    I ran into Perl 4 code the other day. Somehow the last sixteen years of Perl 5 releases have not yet managed to erase all perl4 binaries -- as well as the Perl 4 source code -- from the world's hard drives and tape drives and USB drives. Why should anyone believe that Perl 5.10.0 will not be available when Perl 5.10.1 comes out? Ditto Perl 5.12.0.

    Sometimes this argument has nuance to it. We must use a version of Perl supported by a vendor to whom we offer supplications of fresh fruits, wines, native crafts, and large checks. In other words, you're paying for the privilege of not upgrading. Good for you. Go bug the organization you're paying. That's why you're paying them.

    Sometimes this argument indicates that the arguer has no business working with computers in a professional setting. We don't know what software we're running and we won't know what will break if we upgrade and we don't know how to fix it if it does. If that's you, write your stakeholders a letter suggesting that they try to avoid upgrading, ever. Then find another line of work, perhaps something involving no technology more complex than one rock stacked atop another.

    If you can't test your software against newer dependencies, identify any potential problems, and work with upstream to resolve those issues before you perform an upgrade -- or if you're unwilling to do so -- then you are dangerously incompetent. That kind of incompetence is not the Perl community's responsibility.

    Don't short your stock, either -- that smacks of insider trading.

  • Frequent, experimental, zig-zag changes to Perl 5 syntax and semantics will be confusing! Yeah. That's why no one's suggested doing them. Suggesting that Perl 5 could use real function signatures or strictures-on-by-default is very different from throwing every potential combination of hash-and-array-sort function into one big global namespace.

    The desire to add missing features and the desire for more frequent releases by no means imply a lack of foresight or holistic design, a lack of comprehensive testing, or a lack of thoughtful refinement of an idea and implementation to the point where it's obviously right.

  • Changes mean bugs, and we can't have bugs! There are already bugs. There are already regressions -- including a performance regression that would have affected only a few people if a stable 5.10.1 had come out in early 2008.

    The only way to avoid bugs altogether is to avoid writing software. The best you can do is make them unlikely, catch them early, and fix them quickly (remembering that unreleased software may as well not exist to your users).

  • It's irresponsible to break someone else's code, especially if you can't see it. It's insane to support invisible code that may not even exist.
  • There are so many competing implementations of this idea on the CPAN, it's obvious there's no one right way to do things! There are so many competing implementations of this idea on the CPAN because there's no obvious good, default, built-in way to do things.
  • My code has to run on several different major versions of Perl; I can't take advantage of these new features. You have a change management problem. Not me.
  • I can't show you a test case, but this change breaks my code! There's an invisible sign on the road by my house that gives me the right to charge a $5 toll. Pay up.
  • This is the way it's always been. How's that working out?
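Lexically scoped pragmas already give Perl 5 an opt-in and opt-out mechanism of the kind the distributor point above calls for. A minimal sketch using only the core `strict`, `warnings`, and `feature` pragmas: strictures cover the whole file, while one block switches off just the `refs` stricture it needs. The variable names are illustrative; this shows how existing pragmas behave, not how any proposed new default would be disabled.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';    # opt in to a newer feature for this file only

our $greeting = 'hello';    # package variable $main::greeting

# Under strict 'refs', using a string as a reference dies at run time.
my $ok = eval { my $name = 'greeting'; my $value = ${$name}; 1 };
say $ok ? 'symbolic reference allowed' : 'strict refs blocked it';

{
    # Opt out of one stricture, for this block only.
    no strict 'refs';
    my $name = 'greeting';
    say "inside the block: ${$name}";    # looks up $main::greeting
}
```

Because the opt-out is lexical, code outside the block keeps the stricter behavior, which is the property that makes changing a default survivable for code that needs the old one.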

A More Serious Take on Stability

Change doesn't have to be painful. Change doesn't have to be chaotic. It's possible to meet many of the real underlying goals with technical means.

The problem isn't technical, however. It's social. It's fear.

This is the fear of risk -- the risk that unknown problems lurk in seemingly minor changes. This is also the fear of the risk that the cost of mitigating this risk is too high. This is especially the fear of the risk that changing Perl 5 will appear so expensive that people will stop using it.

With all of that mockery out of the way, perhaps the Perl community can have a sober assessment of risk, bereft of fears and stupid technical blatherings that serve only to obscure the real question:

Whose needs do the features and policies and strategies and goals and visions of Perl 5 development serve?

Change for the sake of change itself is useless. Stability for the sake of stability is equally useless. No one wants complete stability or complete chaos. (Even if you think you want complete stability, you don't; when you find a bug or a typo or a confusing section of documentation, you've found a place where perfect stability gets in your way.)

(You'd almost think the Perl 5 community didn't have a few experienced project managers, methodologists, risk managers, and software developers, if not several thousand people who know how to create, maintain, sustain, and release free software projects.)

There must be a middle ground. There must be a way to identify real needs, prioritize useful changes, and deliver those changes to stakeholders in an efficient and effective fashion. I reject the false dilemmas which state that we have to choose between relentless, plodding conservatism and Psychotic Hyperactive Purposeless-esque change...

... especially when the alternative is to suggest that every file containing modern Perl code start with a wall of boilerplate:

use 5.010;

use strict;
use warnings;

use utf8;
use Carp;
use Want;
use CLASS;
use SUPER;
use autodie;
use autobox;
use signatures;

use mro 'c3';

use IO::Handle;
use File::stat;
use autobox::Core;
use UNIVERSAL::ref;

I hate to channel John Stuart Mill, but if Perl 5 stays like that for long, it won't be a language suitable for novices to write new programs. It'll merely be a great language for maintaining code written in the late '90s that no one has yet replaced with something that has slightly saner defaults.

That would be a pity.
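Until better defaults reach the core language, the usual workaround sits further up the tower: a single module whose `import` enables pragmas for whoever uses it. A minimal sketch, assuming a hypothetical module name `My::Defaults`; it bundles only core pragmas (`strict`, `warnings`, and the 5.10 feature bundle), not the CPAN modules in the wall of boilerplate above. CPAN modules such as Modern::Perl take a similar approach, but this code is an illustration, not any module's actual source.

```perl
#!/usr/bin/perl
# Hypothetical module, defined inline so the example is self-contained.
# In a real project it would live in lib/My/Defaults.pm.
{
    package My::Defaults;

    use strict;
    use warnings;
    use feature ();    # load feature.pm at compile time, enable nothing here

    # Runs while the caller is being compiled; each pragma's import
    # changes the caller's lexical scope, not this package's.
    sub import {
        strict->import;
        warnings->import;
        feature->import(':5.10');
    }
}

package main;

# Equivalent to "use My::Defaults;" once the module has its own file.
BEGIN { My::Defaults->import }

say 'say() is available without "use feature"';

# strict vars is now in effect: the undeclared variable inside the
# string eval makes it fail to compile, so eval returns undef.
my $compiled = eval q{ $undeclared = 1; 1 };
say $compiled ? 'strict is off' : 'strict is on';
```

One line per file is better than a page, but every such module is still an opt-in that novices have to know about, which is exactly the problem with leaving good defaults out of the language.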


About this Archive

This page is an archive of entries from July 2009 listed from newest to oldest.
