Steven Haryanto's Perl First World Problems #1 reminded me of something I've taken for granted lately.

You may have read my Controlling Test Parallelism with prove and Parallelism and Test Suites. I still have Test::Harness parallelism enabled by default on most of the machines where I install my own Perls. While I haven't yet filed tickets and tried to write patches for modules which need a little help to run tests in parallel, I've only found a few lately that need work. That's nice—having a module install through cpanm in five seconds is a lot better than ten seconds or more. (I like cpanm because it's fast and quiet, and part of its speed comes from not printing to the console.)

I like instant feedback.

Like Steven, I noticed quite a while that installing a custom Perl through perlbrew takes a while, but then I remembered that a lot of work went into the Perl 5 test suite to make tests run in parallel. (We did something similar with Parrot several years ago, and it changed the way I work forever.)

To run core tests in parallel, set the environment variable TEST_JOBS=n, where n depends on your computer. I use a value of 9 on a quad-core machine; in practice, that tends to keep the CPU busy while not blocking anything too long on IO. You can set it globally in your shell's configuration file or create an alias or wrapper for perlbrew.

As most of the time spent compiling and installing Perl 5 through perlbrew goes to running the test suite, this has saved me a measurable amount of time.

Avoiding The Vendor Perl Fad Diet

| 9 Comments

Here we go again.

It looks like Red Hat is distributing Perl without the core library ExtUtils::MakeMaker. If you're not familiar with the details of the Perl 5 build chain, all you need to know is this: without MakeMaker, you're not installing anything from the CPAN.

Ostensibly Red Hat and other OS distribution vendors split up Perl 5 into separate packages to save room on installation media. Core Perl 5 is large and includes many, many things that not everyone uses all the time... but the obvious reaction to defining a core subset of Perl 5 that a vendor can call "perl" is another of those recurring discussions which never quite goes anywhere.

For example, who needs the documentation just to run code? (Except that the diagnostics pragma relies on the existence of perldiag.pod to run.) Who needs the huge Unicode encoding tables for ideographic languages such as you might find in Japan, China, Korea, and other Asian locals? (Answer: Asia.) Who needs the ability to install code from the CPAN? (Answer: users.)

While there's a lot of stuff in the core that probably doesn't need to be in the core, or at least installed by default (a LaTex formatter for POD, the deprecated Switch module, Perl 5.005 Thread emulation), one thing is both clear and almost never said.

I'll give you a moment to think about it.

Here's a hint: you're usually better off compiling and installing your own Perl 5 under your complete control such that you can compile in options you want (64-bit integers, for example) and out options you don't (threading imposes a 15% performance penalty even in the single-threaded case) and so that you can manage your own library paths without changing the behavior of the system). perlbrew changes the game. Learn it, like it, love it.

The perpetual discussion misses one important point:

The vendor perl—especially on installation media—is not for general purpose Perl programming. It's there only to support basic administrative programs provided with the system as a whole. That's why you don't replace the system Perl. That's why you don't mess with the system CPAN modules. That's why you fence off whatever's in /usr/bin/perl like it's Yucca Mountain and you're stuck with a '50s reactor design instead of something safe and clean.

Vendors can tune and tweak that Perl to their satisfaction to provide just what they need to install and configure a working system. They can keep it as crufty and out of date as they like. When it breaks, they get to keep all of the pieces and sew them back together like some sort of Fedorastein's monster. They just can't let it out of the lab.

This of course means that they need to provide packages of Perl 5 Actual for users and developers such that it's the full core of Perl 5. (It'd be nice if they called not-a-perl as such, but one thing at a time.)

You can't predict what users will and won't do. That's why you code defensively. The moment distributions started carving up Perl to install just the little bits they needed in the hopes that their guesses as to what users wanted were right, they put everyone in a bind.

Certainly Perl 5 could benefit from a thorough review of what's in core and why, but I suspect that even if p5p came up with packaging guidelines for all of the imaginable use cases and combinations of distributor needs and user wants, it still wouldn't solve the real problem.

(Credit Allison Randal for pointing out the real problem years ago. We've discussed several times the idea of a stripped-down VM for a real language—something with better abstraction and reuse than Bash—with easy access to libraries and a very small footprint, but it's a bigger job than either of us could accomplish. It's still a righter approach than bowdlerizing an upstream distribution.)

I promised in Testing Your Templates to explain how to solve the problem of the divergence between testable, debuggable code in your host language and a big wad of logic in a template language.

This problem is an example of the pattern of Why Writing Your Own DSL is More Difficult Than You Think. Certainly Template Toolkit is among the better templating systems (I've written a couple myself), but it exhibits problems endemic to the process. (Then again, so does PHP. Now multiply that by the fact that some people use templating systems written in PHP and if you have to lie down for a while before the feeling passes, please accept my apologies.)

The semantics of Template Toolkit are great, when they work, but then everything's great when it works the way you expect. Robust software handles the cases you don't expect with aplomb, or at least without a boom.

A simple workaround for Template Toolkit is to avoid the fallback from potential method lookup to keyed hash access when dealing with an object. In other words, if $blessed_hash->do_something() fails, try $blessed_hash->{do_something}.

... except that that doesn't work when you want to call virtual methods on unblessed references, such as calling methods on arrays or hashes.

Another option is to change the syntax such that calling a method is visibly different from accessing a member of an aggregate. Perl 5 does this. It works pretty well, in the sense that if you use the right operator (access element versus invoke method), you've expressed your intent in a visually unambiguous fashion).

... except that people complain about the Perl 5 dereferencing arrow quite a bit. (Okay, you don't need an arrow to do this; as the Modern Perl book explains, the postfix indexed access or postfix keyed operators of {} and [] determine the type of operation effectively.)

... and except that one of the design goals of Template Toolkit was to be robust in the face of changing values provided to the template, such that it provides a loosely coupled interface for the data it expects. That's a fine goal, but it isn't free.

Here's the thing, though. The last time I looked, Template Toolkit compiles templates into Perl 5 code as an optimization. (The last template system I wrote did the same thing, but not as well. We should have used TT, but in our defense, TT didn't exist then.) This transliteration/compilation stage must be very, very cautious to allow standard Perl debugging and introspection tools to treat this generated code correctly. That is to say, I don't want to debug a big wad of generated code. I want to debug the code I actually wrote.

As usual, the solution is another layer of abstraction.

Perl 5 exists in two forms. The first is the source code you and I write. The second is the optree which the Perl 5 VM executes. There's nothing in between. You have one or the other. When your code runs, you have the optree, and the optree has references to the relevant location in the source code it came from, but the correspondence is often less useful than you might like.

While the generated code from Template Toolkit could include the correct file and line positions from templates, that's again less useful than you might like. (It's useful, but it doesn't solve every problem.)

If Perl 5 had instead an intermediate form separate from raw code and raw optrees, something more suitable to introspection and manipulation, we could produce tools which worked with this intermediate form to improve debugging, introspection, and better code generation.

We could even inject new code to add features (fall back to attribute access; prevent the fallback to attribute access) to code, even within lexical scopes. That is to say, we could manipulate how libraries behave from the outside in, and ensure that our changes would not leak out from our desired scopes.

It's certainly possible to replace the Perl 5 opcodes yourself, if you're comfortable reading Perl 5 source code, writing XS, relying on black magic, and dealing with strange issues of thread safety and manipulating global or at least interpreter-global values in a lexical fashion (while dealing with the fact that use is recursive in a sense)—but isn't Perl about not making people write C to do interesting things?

Certainly this isn't a technique you'd use every day, and it's not obviously a way to make Perl 5 run faster (though many optimizations become much easier), but the possibility for better abstraction and extension and correctness has much to recommend it.

And, yes, Lisp demonstrated this idea ages ago.

Imagine if Perl 5's caller returned an object which represented the call chain to the point of the object's creation.

I want to inspect the call stack within a helper module, but I don't care about the call stack within the module. I want to use lots of little helper functions, because that's good design, but caller works against that. Looking up the stack means keeping track of the magic number of call frames within my module I currently use, and that's one more thing to update when I change things.

That's structural code highly coupled to the arrangement of other code, and if that doesn't wrinkle your nose with the subtle aroma of fragility and peril, I don't know what will.

Imagine if instead:

my $caller_object = capture_caller_state();

while (my $call_frame = $caller_object->next)
{
    next if $call_frame->is_eval;
    say $call_frame->location_as_string;
    my $next_frame = $call_frame->previous;
    ...
}

Imagine if you could pass this object around.

I know things get complicated if you pass this object up the call chain, but stack unwinding is a solved problem in that anyone capable of recognizing the problem should be able to figure out cheap and easy ways to fix it.

Alas, Perl 5.14 doesn't have this feature, and it's probably too late for 5.16 to get it, so for today I'm stuck imagining what might be.

(If you've never thought about this sort of thing before, you owe it to yourself to learn more about Continuation Passing Style, which is at least an order of magnitude more mind-bending at the start and at least two orders of magnitude more useful.)

Sometimes you find bugs in the most surprising places.

I've long used test-driven design for code I care about. Knowing that it works, works reliably, and will continue to work helps me write the right thing. As I've gained experience with testing, I've come to appreciate the nuances of how and what to test and why. For example, I often don't bother to test exploratory code—it's more important for me to learn how code might work to help me figure out how to design the real thing than it is to hew to a rigid mindset of "You must always test everything."

Similarly, I see little value in rigorously testing basic accessors, and thank goodness for Moose for turning class declarations into declarative code.

In more serious discussions of testability and testing, people often argue that testing UIs is impossible, or merely difficult, or rarely worth the trouble to do so completely. I agree to some respect; the most effective way I've seen to test UIs is to test the backend data model rigorously, write a very thin controller to marshall and unmarshall requests and responses, then to write a handful of tests which explore the edge cases of UI interactions. That requires a deep understanding of what happens for each interaction, and you can only test those interactions which produce measurable side effects.

Let that latter point sink in for a moment.

My most recent work on multiple projects includes a lot of embarrassingly parallel batch processing work, where the user interface is largely static, produced daily from a series of templates. Generally the templates are correct if they pass the eyeball test with live data.

This works except when it doesn't.

Consider a snippet of Template Toolkit code to display a list of stories:

[% FOR e IN entries %]
    <div class="story story[% loop.count %]">
        <h3><a href="[% e.url %]">[% e.title %]</a></h3>
        <p>[% IF e.author %]
            [% e.author | html %]<br />
        [% END %]
        [% f.format( e.posted_time ) %]</p>
        [% IF e.has_image %]
            <img src="[% e.image.full_file %]" alt="[% e.title | html %] image" class="image" />
        [% END %]
        <p>[% e.short_content | html %] <a href="[% e.url %]">Read More</a></p>
    </div>
[% END %]

That code had a bug. Can you find it by eyeballing it? Neither could I, for a long time. I'll give you a hint: it never displayed images associated with stories. Give up?

I wrote backend test after backend test, fixing a couple of other bugs along the way, then traced through the code in a debugging batch until I had convinced myself that the data model was completely correct. The bug had to be in the template because it could be nowhere else.

Rigorous template tests didn't seem to help either, and I'd almost convinced myself that I had a nasty spooky action at a distance bug somewhere in the data model before I saw the problem:

[% IF e.has_image %]
    <img src="[% e.image.full_file %]" alt="[% e.title | html %] image" class="image" />
[% END %]

When I added a test for the has_image() method on story objects to the data model tests, it failed immediately because that method did not exist. It used to exist, but a refactoring somewhere went too far and removed that method and its tests from everything except the template. Because Template Toolkit is not so strict about the distinction between method and property access, the lack of a method is no big deal. It fell back to property access on a hash and, finding no such value in the hash, returned a false value.

There are ways to fix this. I'm sure there's a plugin for Template Toolkit, and probably a debugging/diagnostic mechanism somewhere I haven't yet discovered. Yet the problem remains.

I've heard it said (Rich Hickey? Martin Odersky? I think it was Rich.) that every bug in your program has passed the type checker. That statement contains wisdom. Yet it's also true that every bug in your program has passed the test suite. (We assume you run with type checks and keep your test suite fully passing.)

Neither is a substitute for careful thought and understanding. Both protect you only as far as you can take advantage of them.

My next post will explore some options for making these kinds of errors easier to catch way before the point of weird debugging sessions.

Find recent content on the main index or look in the archives to find all content.

Modern Perl: The Book


The best Perl Programmers read Modern Perl: The Book!

Recent Comments

  • andreasvoegele.com: I use Perl amongst other things because it is usually read more
  • chromatic: Have you ever deployed a Java application? I have. See read more
  • http://openid.aliz.es/sysop: You've made it clear in past discussions you're not really read more
  • i.galic [launchpad.net]: @szabgab.com: On Debuntu Ruby is nearly unusable. Especially in regard read more
  • randomness.org.uk: This is one of things SUN got right in Solaris. read more
  • https://me.yahoo.com/mjgardner#d9855: szabgab: Vendors often provide -devel package containing the bits needed read more
  • chromatic: As I wrote, this is why (vendors) need to provide read more
  • grantmclean.myopenid.com: I don't really have a problem with vendors carving up read more
  • szabgab.com: So what about Python, PHP and Ruby? Do developers in read more
  • Leon Timmermans: Yes, exactly. I want this, preferably yesterday. Sadly, this is read more

Recent Assets

  • butteraptor.png

Categories

Pages

OpenID accepted here Learn more about OpenID
Powered by Movable Type 4.23-en