A Practical Use for Macros in Perl


People occasionally ask for practical examples of macros when I lament the lack of macros in Perl 5. While I'm usually pleased at the degree to which Perl lets me design code to get and stay out of my way, sometimes its abstractions just aren't quite enough to remove all of the duplication available.

(I've been refactoring one of our business projects in preparation for another round of deployment in the next couple of weeks. We could launch without these improvements, but administrative work took almost two weeks longer than the afternoon I'd planned for it, so I decided it was worth my time to reduce technical friction so that further improvements are easier. More users means more work, so why not accelerate that work while I have the chance? I have another longer technical post to write to praise the use of Moose roles for a plugin system and to show off the stupidly-great task launcher, but that's for later.)

I found myself writing two code couplets that were similar enough they triggered my "Hey, refactor away this duplication!" alert. It's extra sensitive, because I know I'll have a few more couplets like this in the very near future:

while (my $stock = $stock_rs->next)
{
    my $pe_update = $self->analyze_pe( $stock );
    $stock_txn->add( $pe_update ) if $pe_update;

    my $cash_yield_update = $self->analyze_cash_yield( $stock );
    $analysis_txn->add( $cash_yield_update ) if $cash_yield_update;
}

The *_txn variables contain objects representing deferred and scoped SQL updates. I'll talk about that at YAPC::NA 2012 in When Wrong is Better.

The general pattern is this: for every stock in the appropriate resultset, call a method in this plugin. The method will return nothing if it fails (or has nothing to do) or it will return data to be added to the appropriate transaction. I have at least two types of transactions available here at the moment, and may have more later: one transaction updates stock data and the other updates analysis data.

I have several options. I could rework the data model so that this stage always updates only one transaction, in which case the loop body could instead look like:

{
    for my $method (qw( analyze_pe analyze_cash_yield ))
    {
        next unless my $result = $self->$method( $stock );
        $txn->add( $result );
    }
}

This technique of hoisting the variants into an ad hoc data structure and using existing looping techniques works well sometimes. (I use it in other parts of the system.) It's relatively easy to expand, even though it moves interesting information ("I'm calling the analyze_pe method!") to a place where tools have more trouble finding it. (I search for >analyze_pe when I want to find method calls.) You may have used something similar to define several parametric methods at BEGIN time. It's the same type of pattern, and while Perl 5 provides most of the tools necessary to allow this, it doesn't natively express this pattern well.
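For readers who haven't seen the BEGIN-time variant, here's a minimal sketch of it. The StockReport class, its field names, and the has_* methods are all hypothetical; they exist only to show the shape of the pattern, not anything in the real system:

```perl
package StockReport;

use strict;
use warnings;

sub new { my ($class, %args) = @_; return bless {%args}, $class }

BEGIN
{
    # Hypothetical field names; each one becomes a has_<field>() method.
    for my $field (qw( pe cash_yield ))
    {
        my $method = "has_$field";

        # Installing a closure into the symbol table needs symbolic refs.
        no strict 'refs';
        *{ __PACKAGE__ . "::$method" } = sub
        {
            my $self = shift;
            return defined $self->{$field};
        };
    }
}

package main;

my $report = StockReport->new( pe => 14.2 );
print "has a P/E ratio\n"       if $report->has_pe;
print "has no cash yield\n" unless $report->has_cash_yield;
```

Note that the same searchability problem applies: nothing in the source contains the literal string has_pe except the call site.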

I could also change the transaction object's add() method to do nothing when it receives an empty list of arguments. I like that in some ways, but I don't like it in others. I've come down on the side of keeping its invariant (it always takes only one scalar as an object) pure for now. If I change it to take a list of updates, that might be the right time to reconsider this.

What I notice in the code as it stands right now is that the individual variables $pe_update and $cash_yield_update are synthetic variables. They only exist to support the code as written; they're not necessary for the algorithm. If I were to modify this code but only this code, I'd really rather write:

{
    ADD_TXN_WITH( $self, analyze_pe,         $stock, $stock_txn    );
    ADD_TXN_WITH( $self, analyze_cash_yield, $stock, $analysis_txn );
}

... though that syntax doesn't thrill me either. The clearest possibility I see right now is:

{
    $stock_txn->add(    SKIP unless $self->analyze_pe( $stock )         );
    $analysis_txn->add( SKIP unless $self->analyze_cash_yield( $stock ) );
}

... where SKIP does some magic to move to the next statement, not the next loop iteration. (I have some ideas how to write XS to make this work, but that creepy yak needs a shave and some mouthwash.)

The second best option right now is adding a function or method as indirection to encapsulate the synthetic code. I'd rather avoid synthetic code, but at least it reduces the possibility of copy and paste bugs.
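A minimal sketch of that indirection might look like the following. The Demo::Txn and Demo::Plugin classes are stand-ins invented for this example, and add_txn_with is a hypothetical method name; the real transaction and plugin objects do more than this:

```perl
use strict;
use warnings;

# Stand-in for the real transaction object; add() takes exactly one update.
package Demo::Txn;

sub new     { return bless { updates => [] }, shift }
sub add     { my ($self, $update) = @_; push @{ $self->{updates} }, $update; return $self }
sub updates { return @{ $_[0]{updates} } }

# Stand-in plugin whose analysis methods return an update or nothing.
package Demo::Plugin;

sub new { return bless {}, shift }

sub analyze_pe
{
    my ($self, $stock) = @_;
    return unless $stock->{pe};
    return { symbol => $stock->{symbol}, pe => $stock->{pe} };
}

sub analyze_cash_yield
{
    my ($self, $stock) = @_;
    return unless $stock->{cash_yield};
    return { symbol => $stock->{symbol}, cash_yield => $stock->{cash_yield} };
}

# The indirection: the synthetic variable lives here and only here.
sub add_txn_with
{
    my ($self, $method, $stock, $txn) = @_;

    my $update = $self->$method( $stock );
    $txn->add( $update ) if $update;

    return;
}

package main;

my $plugin       = Demo::Plugin->new;
my $stock_txn    = Demo::Txn->new;
my $analysis_txn = Demo::Txn->new;

for my $stock ({ symbol => 'AAA', pe => 12 }, { symbol => 'BBB', cash_yield => 0.04 })
{
    $plugin->add_txn_with( analyze_pe         => $stock, $stock_txn    );
    $plugin->add_txn_with( analyze_cash_yield => $stock, $analysis_txn );
}

printf "%d stock update(s), %d analysis update(s)\n",
    scalar $stock_txn->updates, scalar $analysis_txn->updates;
```

The method name still travels as a string, so the searchability cost of the looping approach remains, but the copy-and-paste surface shrinks to one line per analysis.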

For now, with only two steps in this analysis, I'm leaving it as it is. Two repetitions of something this similar set off my refactoring alarm, but I resist the urge for refactorings this small until I see three instances of near-duplicate code.

Why I Run Tests on Install


Jonathan Swartz makes a polemic statement:

cpanm and perlbrew should not run tests by default.

His points are reasonable, but his complaints are mostly about side effects and not the real problem. (I should clarify: the real problem I encounter.)

If running tests slows down installs, speed up the tests. (Do you want to get the wrong answer faster? Easy: it's 42. No need for a quantum computer to do the calculation in constant time. This algorithm is O(0).)

If running tests exposes the fragility of the dependency chain, improve the dependency chain.

If dependency test failures prevent the installation of downstream clients... this is a weakness of the CPAN toolchain. A well-written test suite for a downstream client should reveal whether bugs or other sources of test failures in a dependency affect the correctness of the client.

Note the assumptions in that sentence.

Anyone who's experienced the flash of enlightenment that comes from working with well tested code and who's shared that new zeal with co-workers has undoubtedly heard the hoary old truism that testing cannot prove the complete absence of bugs. It's no less true for its age, though it's also true that good testing only improves our confidence in the correctness and efficacy of our code.

For me, a 95% certainty that my code works and continues to work for the things for which I've tested it is more than sufficient. I focus on testing the things I'm most likely to get wrong and the things which need to keep working correctly. (I don't care much about pixel-perfect placement, but I do care that a book's index uses the right escapes for its data and markup.)

Without tests running on the machines themselves in the environments themselves where I expect my code to run, I don't have that confidence.

Put another way, I'm either not smart enough or far too lazy to want to attempt to debug code without good tests. That's why I write tests, and that's why I run them obsessively. That's good for me as a developer, and you're getting the unvarnished developer perspective.

I also care about the perspective of mere users. (Without users, we're amusing ourselves, and I can think of better ways to amuse myself than by writing software no one uses.)

Yes, an excellent test suite can help a user help a developer debug a problem. Many (most?) CPAN authors have had the wonderful experience of receiving a bug report with a failing test case. Sometimes this even includes a code patch.

Not all users are developers of that sort, nor should they be.

The CPAN ecosystem has improved greatly at automated testing and dependency tracking, but we can improve further. What if we could identify the severity of test failures? (We have TODO and SKIP, but they don't convey semantic meaning.) What if we could identify buggy or fragile tests? (My current favorite is XML::Feed tests versus DateTime::Format::Atom because it catches me far too often, it doesn't affect the operation of the code, and it's a stupid fix that's lingered for a few months.) What if the failures are transient (Mechanize relying on your ISP not ruining DNS lookups for you) or specific to your environment (a test suite written without parallelism in mind)?
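To make the TODO and SKIP point concrete, here's a small Test::More sketch. The test descriptions and the HAS_NETWORK environment variable are hypothetical. Both markers carry only a free-text reason; neither says how much the failure matters to someone installing the module:

```perl
use strict;
use warnings;
use Test::More;

SKIP:
{
    # HAS_NETWORK is a hypothetical environment variable for this sketch.
    skip 'no network access in this environment', 1 unless $ENV{HAS_NETWORK};
    ok( 1, 'remote feed should be reachable' );
}

TODO:
{
    # A failing test inside a TODO block does not fail the suite.
    local $TODO = 'date parsing fix not yet released upstream';
    ok( 0, 'feed dates should round-trip' );
}

ok( 1, 'core behavior should still work' );

done_testing;
```

The harness treats the skipped test and the failing TODO test as non-failures, but it has no way to tell a cosmetic problem from a data-corrupting one.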

As Jonathan rightly implies, how do you expect an end-user to understand or care about or debug those things?

I'm still reluctant to agree that disabling tests for end-user installations is the right solution. I want to know about failures in the wild wider world. I want that confidence, and I can't bring myself to trade it away for the sake of a little more speed of installation.

Yet his point about lingering points of fragility in the ecosystem is true and important, even if the proposed solution of skipping tests isn't right. Fortunately, improving dependency management and tracking and use and testing can help solve both issues: perhaps to the point where we can run only those tests users most care about and identify and report material failures in dependencies.

Steven Haryanto's Perl First World Problems #1 reminded me of something I've taken for granted lately.

You may have read my Controlling Test Parallelism with prove and Parallelism and Test Suites. I still have Test::Harness parallelism enabled by default on most of the machines where I install my own Perls. While I haven't yet filed tickets and tried to write patches for modules which need a little help to run tests in parallel, I've only found a few lately that need work. That's nice—having a module install through cpanm in five seconds is a lot better than ten seconds or more. (I like cpanm because it's fast and quiet, and part of its speed comes from not printing to the console.)

I like instant feedback.

Like Steven, I noticed quite a while ago that installing a custom Perl through perlbrew takes a while, but then I remembered that a lot of work went into the Perl 5 test suite to make tests run in parallel. (We did something similar with Parrot several years ago, and it changed the way I work forever.)

To run core tests in parallel, set the environment variable TEST_JOBS=n, where n depends on your computer. I use a value of 9 on a quad-core machine; in practice, that tends to keep the CPU busy while not blocking anything too long on IO. You can set it globally in your shell's configuration file or create an alias or wrapper for perlbrew.
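As a sketch of the wrapper idea, written in Perl: the "two jobs per core plus one" rule is my assumption, chosen only because it reproduces the value of 9 on a quad-core machine, and the core count and Perl version below are placeholders:

```perl
use strict;
use warnings;

# Assumed rule of thumb: two jobs per core plus one, which yields 9 for
# a quad-core machine. Tune it for your own hardware.
sub test_jobs_for
{
    my ($cores) = @_;
    return 2 * $cores + 1;
}

my $cores = 4;    # hypothetical core count; adjust for your machine
$ENV{TEST_JOBS} = test_jobs_for( $cores );

print "TEST_JOBS=$ENV{TEST_JOBS}\n";

# Child processes inherit the variable. Uncomment to build a Perl; the
# version number here is only a placeholder:
# system( 'perlbrew', 'install', 'perl-5.16.0' ) == 0
#     or die "perlbrew install failed\n";
```

Setting the variable in your shell's configuration file does the same thing with less ceremony; a wrapper only helps if you want the value computed per machine.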

As most of the time spent compiling and installing Perl 5 through perlbrew goes to running the test suite, this has saved me a measurable amount of time.

Avoiding The Vendor Perl Fad Diet


Here we go again.

It looks like Red Hat is distributing Perl without the core library ExtUtils::MakeMaker. If you're not familiar with the details of the Perl 5 build chain, all you need to know is this: without MakeMaker, you're not installing anything from the CPAN.

Ostensibly Red Hat and other OS distribution vendors split up Perl 5 into separate packages to save room on installation media. Core Perl 5 is large and includes many, many things that not everyone uses all the time... but the obvious reaction to defining a core subset of Perl 5 that a vendor can call "perl" is another of those recurring discussions which never quite goes anywhere.

For example, who needs the documentation just to run code? (Except that the diagnostics pragma relies on the existence of perldiag.pod to run.) Who needs the huge Unicode encoding tables for ideographic languages such as you might find in Japan, China, Korea, and other Asian locales? (Answer: Asia.) Who needs the ability to install code from the CPAN? (Answer: users.)

While there's a lot of stuff in the core that probably doesn't need to be in the core, or at least installed by default (a LaTeX formatter for POD, the deprecated Switch module, Perl 5.005 Thread emulation), one thing is both clear and almost never said.

I'll give you a moment to think about it.

Here's a hint: you're usually better off compiling and installing your own Perl 5 under your complete control, so that you can compile in options you want (64-bit integers, for example) and out options you don't (threading imposes a 15% performance penalty even in the single-threaded case), and so that you can manage your own library paths without changing the behavior of the system. perlbrew changes the game. Learn it, like it, love it.
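If you're not sure how the Perl you're running was built, the core Config module will tell you. This short sketch checks the options mentioned above; the choice of which keys to print is mine:

```perl
use strict;
use warnings;
use Config;

# Build options are stored as 'define' when enabled and undef otherwise.
for my $option (qw( use64bitint usethreads useithreads ))
{
    my $enabled = defined $Config{$option} && $Config{$option} eq 'define';
    printf "%-12s %s\n", $option, $enabled ? 'enabled' : 'disabled';
}

# Size of Perl's native integer type in bytes (8 on a 64-bit integer build).
printf "%-12s %d bytes\n", 'ivsize', $Config{ivsize};
```

Run it against both /usr/bin/perl and a perlbrew-installed Perl to see what the vendor chose for you.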

The perpetual discussion misses one important point:

The vendor perl—especially on installation media—is not for general purpose Perl programming. It's there only to support basic administrative programs provided with the system as a whole. That's why you don't replace the system Perl. That's why you don't mess with the system CPAN modules. That's why you fence off whatever's in /usr/bin/perl like it's Yucca Mountain and you're stuck with a '50s reactor design instead of something safe and clean.

Vendors can tune and tweak that Perl to their satisfaction to provide just what they need to install and configure a working system. They can keep it as crufty and out of date as they like. When it breaks, they get to keep all of the pieces and sew them back together like some sort of Fedorastein's monster. They just can't let it out of the lab.

This of course means that they need to provide packages of Perl 5 Actual for users and developers such that it's the full core of Perl 5. (It'd be nice if they called not-a-perl as such, but one thing at a time.)

You can't predict what users will and won't do. That's why you code defensively. The moment distributions started carving up Perl to install just the little bits they needed in the hopes that their guesses as to what users wanted were right, they put everyone in a bind.

Certainly Perl 5 could benefit from a thorough review of what's in core and why, but I suspect that even if p5p came up with packaging guidelines for all of the imaginable use cases and combinations of distributor needs and user wants, it still wouldn't solve the real problem.

(Credit Allison Randal for pointing out the real problem years ago. We've discussed several times the idea of a stripped-down VM for a real language—something with better abstraction and reuse than Bash—with easy access to libraries and a very small footprint, but it's a bigger job than either of us could accomplish. It's still a righter approach than bowdlerizing an upstream distribution.)

I promised in Testing Your Templates to explain how to solve the problem of the divergence between testable, debuggable code in your host language and a big wad of logic in a template language.

This problem is an example of the pattern of Why Writing Your Own DSL is More Difficult Than You Think. Certainly Template Toolkit is among the better templating systems (I've written a couple myself), but it exhibits problems endemic to the process. (Then again, so does PHP. Now multiply that by the fact that some people use templating systems written in PHP and if you have to lie down for a while before the feeling passes, please accept my apologies.)

The semantics of Template Toolkit are great, when they work, but then everything's great when it works the way you expect. Robust software handles the cases you don't expect with aplomb, or at least without a boom.

A simple workaround for Template Toolkit is to avoid the fallback from potential method lookup to keyed hash access when dealing with an object. In other words, avoid the rule that says: if $blessed_hash->do_something() fails, try $blessed_hash->{do_something}.

... except that that doesn't work when you want to call virtual methods on unblessed references, such as calling methods on arrays or hashes.
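Here's a simplified sketch of that kind of dotted lookup, to show where the ambiguity comes from. resolve() and Demo::Book are hypothetical names, and this is not Template Toolkit's actual implementation:

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed reftype );

# resolve() is a hypothetical helper, not part of Template Toolkit.
sub resolve
{
    my ($target, $name) = @_;

    # Prefer a real method when the target is an object that has one.
    if (blessed( $target ) and my $method = $target->can( $name ))
    {
        return $target->$method();
    }

    # Otherwise fall back to keyed access for anything hash-based.
    if ((reftype( $target ) || '') eq 'HASH')
    {
        return $target->{$name};
    }

    return;
}

package Demo::Book;

sub new   { return bless { title => 'hash value', isbn => '000' }, shift }
sub title { return 'method value' }

package main;

my $book = Demo::Book->new;

print resolve( $book, 'title' ), "\n";                   # the method wins
print resolve( $book, 'isbn' ), "\n";                    # silently reads the object's internals
print resolve( { title => 'plain' }, 'title' ), "\n";    # unblessed hash needs keyed access
```

The second call is the troubling one: a misspelled or missing method quietly turns into a peek inside the object. The third call is why you can't simply delete the fallback.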

Another option is to change the syntax such that calling a method is visibly different from accessing a member of an aggregate. Perl 5 does this. It works pretty well, in the sense that if you use the right operator (access element versus invoke method), you've expressed your intent in a visually unambiguous fashion.

... except that people complain about the Perl 5 dereferencing arrow quite a bit. (Okay, you don't need an arrow to do this; as the Modern Perl book explains, the postfix indexed access or postfix keyed operators of {} and [] determine the type of operation effectively.)

... and except that one of the design goals of Template Toolkit was to be robust in the face of changing values provided to the template, such that it provides a loosely coupled interface for the data it expects. That's a fine goal, but it isn't free.

Here's the thing, though. The last time I looked, Template Toolkit compiles templates into Perl 5 code as an optimization. (The last template system I wrote did the same thing, but not as well. We should have used TT, but in our defense, TT didn't exist then.) This transliteration/compilation stage must be very, very cautious to allow standard Perl debugging and introspection tools to treat this generated code correctly. That is to say, I don't want to debug a big wad of generated code. I want to debug the code I actually wrote.

As usual, the solution is another layer of abstraction.

Perl 5 exists in two forms. The first is the source code you and I write. The second is the optree which the Perl 5 VM executes. There's nothing in between. You have one or the other. When your code runs, you have the optree, and the optree has references to the relevant location in the source code it came from, but the correspondence is often less useful than you might like.
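You can peek at the optree from Perl itself with the core B module. This sketch walks the ops of a small function in execution order and prints the file and line recorded in each statement-boundary op; the greet function is just a placeholder:

```perl
use strict;
use warnings;
use B ();

sub greet
{
    my ($name) = @_;
    return "hello, $name";
}

# Get the B::CV object which wraps the compiled function.
my $cv = B::svref_2object( \&greet );

print "root op: ", $cv->ROOT->name, "\n";

# Follow the ops in execution order until reaching the null op.
for (my $op = $cv->START; $$op; $op = $op->next)
{
    if ($op->isa( 'B::COP' ))
    {
        # Statement boundaries remember where in the source they came from.
        printf "%s (%s line %d)\n", $op->name, $op->file, $op->line;
    }
    else
    {
        printf "    %s\n", $op->name;
    }
}
```

The output gives you op names and line numbers, but nothing resembling the structure of the code you wrote, which is the gap this post is about.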

While the generated code from Template Toolkit could include the correct file and line positions from templates, that's again less useful than you might like. (It's useful, but it doesn't solve every problem.)
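Perl's #line directive is one way for generated code to carry those positions. In this sketch, the file name page.tt, the line number, and the error message are all made up; the point is only that the error reports the template's location instead of the location of the eval:

```perl
use strict;
use warnings;

# Pretend a template compiler generated this code from line 12 of page.tt.
# Both the file name and the line number are made up for this sketch.
my $generated = <<'END_CODE';
#line 12 "page.tt"
die "unknown variable";
END_CODE

eval $generated;
print "error: $@";
```

That gets the error message pointing at the right place, but a debugger stepping through the generated code still sees Perl, not template syntax.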

If Perl 5 had instead an intermediate form separate from raw code and raw optrees, something more suitable to introspection and manipulation, we could produce tools which worked with this intermediate form to improve debugging, introspection, and code generation.

We could even inject new code to add features (fall back to attribute access; prevent the fallback to attribute access) to code, even within lexical scopes. That is to say, we could manipulate how libraries behave from the outside in, and ensure that our changes would not leak out from our desired scopes.

It's certainly possible to replace the Perl 5 opcodes yourself, if you're comfortable reading Perl 5 source code, writing XS, relying on black magic, and dealing with strange issues of thread safety and manipulating global or at least interpreter-global values in a lexical fashion (while dealing with the fact that use is recursive in a sense)—but isn't Perl about not making people write C to do interesting things?

Certainly this isn't a technique you'd use every day, and it's not obviously a way to make Perl 5 run faster (though many optimizations become much easier), but the possibility for better abstraction and extension and correctness has much to recommend it.

And, yes, Lisp demonstrated this idea ages ago.
