November 2011 Archives

Temporary Directory Handling in Tests


Per my experiments with parallelism in Perl test suites, I've adopted several patterns. One such pattern allows me to manipulate the filesystem in a parallel-friendly way.

A lot of my code handles batch processing: fetch some data, sort it into various logical buckets, manipulate the contents of each bucket, then produce some sort of output for everything that made it that far. While most of these steps only manipulate active data in the queue, some steps require me to read and write files in the filesystem—see Pod::PseudoPod::Book (soon to be on the CPAN) for example.

As with any parallelism, multiple units of execution contending over the same single shared resource is an exercise in conflict, or at least complicated locking.

For simple needs, File::Tempdir is great. You can create a temporary directory with a lifespan tied to the object representing it. When that object gets destroyed, its destructor removes the temporary directory.

I needed something more. I wrote the very silly, very simple Tempdir solely for one project's test suite:

package Tempdir;

use Cwd;
use autodie;
use File::Temp;
use File::Path;

use base 'File::Temp::Dir';

sub new
{
    my ($class, %options)       = @_;
    my $self                    = File::Temp->newdir;
    @{ $self }{ keys %options } = values %options;
    $self->{original_dir}       = cwd();

    chdir $self->dirname;
    File::Path::make_path( @{ $self->{mkdirs} } );

    bless $self, $class;
}

sub write_file
{
    my ($self, $name, $contents) = @_;

    open my $outfh, '>', $name;
    print {$outfh} $contents;
    close $outfh;
}

sub DESTROY
{
    my $self = shift;
    chdir delete $self->{original_dir};
    $self->SUPER::DESTROY( @_ );
}

1;

Like File::Tempdir, creating a new Tempdir object creates a temporary directory. In addition, it saves the current working directory and chdirs to the new temporary directory. Because I'm careful to use only relative paths within my code (business requirement: prefer running multiple related instances of a project on a single machine to separate virtual machines), everything continues to work correctly as long as the necessary files and directories are present relative to the current directory. (Also, because this temporary directory manipulation happens at runtime, the test file's connection to the work queue is already in place, so chdir works just fine.)

If you provide the constructor a mkdirs key with an array reference as its value, the object will create, relative to the temporary directory, additional subdirectories of arbitrary depth. I also added a very simple convenience feature to write a file. I haven't needed more than this yet:

# create the storage directories for topics/2
my $tempdir = Tempdir->new(mkdirs => [ 'sites/Bravo', 'sites/Bravo/css' ]);

...

$tempdir->write_file( $css->filepath, $css->contents );

When $tempdir goes out of scope, all of these files and directories go away. Even if I were to run a hundred instances of the same test file simultaneously, they would all run successfully because they do not interfere with each other.

Though I chose an OO interface for this behavior, I prefer a higher-order interface in some ways. I'd like to be able to write:

within_tempdir mkdirs => [qw( some list of directories )]
{
    # do something
    ...
};

... but I haven't convinced myself quite yet that it's an improvement. Certainly it has the potential to be more correct, as nested lexical scoping has a better chance of applying and unapplying chdir calls in the correct order (it behaves more like properly paired pushd/popd calls in bash), but Perl 5's limited support for parameterizing these thunks makes it clunkier than it ought to be. I could experiment with an interface where you specify parameters to import, which then produces and exports a partially applied function, but the OO version is good enough for me for now and continues to stay out of my way.
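For what it's worth, a plain coderef version is easy to sketch. This is only a sketch: within_tempdir and everything in it beyond the Tempdir class shown above is hypothetical.

use Tempdir;

# A sketch, not a real interface: run a block inside a fresh temporary
# directory, restoring the original directory when $tempdir goes out of scope.
sub within_tempdir
{
    my $code    = pop;
    my %options = @_;

    my $tempdir = Tempdir->new( %options );
    return $code->( $tempdir );
}

within_tempdir( mkdirs => [ 'sites/Bravo', 'sites/Bravo/css' ], sub
{
    my $tempdir = shift;
    $tempdir->write_file( 'sites/Bravo/css/site.css', '/* ... */' );
});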

Solving Problems or Absorbing Design Patterns

At the end of January I decided that Perl 6 is currently impractical for my purposes, but other people have different ideas. In any serious discussion of Perl 6, you'll find people claiming that Perl 6 is already usable.

Discovering the minimum viable utility of a programming language is a difficult exercise. Certainly in 1994 few people would have expected that the obvious successor to Perl 4 would eventually need multi-megabytes of extensions to add a full object system rivaling CLOS plus Smalltalk, an asynchronous event system, a high-powered object-relational mapper with pervasive laziness, and an abstraction mechanism for HTTP. Yet that's the state of the art in Perl in 2011, and we're all the better for it.

If you read carefully between the lines of my criticism of the current state of every Perl 6 implementation I've used as well as what the people who claim that "Perl 6 is usable right now" write, you may find that we approach the problems we're trying to solve from two different angles. We even agree on one of those vectors.

When someone like Moritz Lenz writes "You can use Rakudo or Niecza for real programs now!", I think it's fair to rephrase that as "Any of the leading Perl 6 implementations provides enough of a complete and useful programming language that you can write your own code with it."

I can agree with this sentiment. I even go a step further; I don't mean to water down his assertion or to put words in his mouth. If you analyze Perl 6 right now in terms of the Perl 6 RFCs, it's clear that Perl 6 even as currently implemented has advantages over Perl 5 when you compare features on a matrix.

(I've since come to believe that you're dooming yourself to potential technical excellence and popular obscurity by focusing on features instead of Getting Stuff Done.)

To rephrase, Perl 6 may certainly be a better programming language than Perl 5 in that Perl 6 includes features Perl 5 should have had years ago. If design patterns are signposts of missing language features, Perl 6 is the Christopher Alexander of programming.

If that's sufficient for your purposes, great. (You still have to deal with performance issues and regressions, and licensing concerns with the Mono dependency of Niecza (Niecza fans, be honest. How is my business supposed to pay Novell anymore for indemnification per their patent licensing arrangement with Microsoft—you remember, the Microsoft that sued TomTom for patent infringement?), and the schism between Parrot and Rakudo on the Rakudo side, as well as the history of Rakudo undergoing lengthy and repeated and compatibility-busting rewrites of core components, but after eleven and a half years, you ought to know you're an early adopter by now.) (Rakudo fans, be honest. How long after nom became the main development branch did it take to get all of the Panda modules passing tests again? Oh, they're still not? QED.)

The other vector from which to approach problem solving in Perl says something like "You know, if I'm already leaning heavily on the CPAN to download Plack and an IMAP server and an OAuth implementation and AnyEvent and several testing distributions, Moose really isn't that big a deal anymore." In other words, maybe it is rather silly that in 2011 Perl 5 still doesn't have much of an object system in the core and that it could really use real function signatures, but the Perl community is awfully good at making the important stuff on the CPAN just work, and Perl 5 without the CPAN is pretty anemic for solving big important problems anyway, so grab perlbrew and fire up cpanm and after you've finished your cup of tea, you're all set anyway.

The Python community has dealt with the same schism between Python 2.x and Python 3.x, where Python 3 is arguably a better language, but only superficial polyglot magpies and Usenet personas care solely and forever about language qua language, and everyone else eventually needs to use a library or a framework or a tool written by someone else.

I thought I wanted a better language to solve my problems, but as it turns out, right now I can't have both, and I want to solve my problems more than I want a better language. A good enough language which lets me solve my problems is better than a great language in which I'm practically unproductive.

This is one of the few times when I agree both with the backwards compatibility people and Python programmers. Mark your calendar.

Maintenance Costs of a Shared Resource


Suppose the only thing Perl ever really needed were a method keyword. If it were implemented correctly—with the associated tests and documentation and a discussion period to see how it fits in with existing code—the cost of adding this feature would be relatively small: it's not much code, it's a simple feature, and it has very few possibilities to interfere with existing code.

That's not the only thing Perl needs, for some definition of need.

(I believe Perl 5 lacks a unified and coherent development vision, which makes this entire discussion both more interesting and less useful. Yet even an idealist has to deal with the world as it is sometimes.)

If you accept that Perl needs (or "could use" or "would benefit from" or "really ought to have") a few more features, you have to answer at least two procedural questions in addition to the technical questions of "What is it?", "How should it work?", "How do we build it?", and "How do we know it works correctly?":

  • Who will build it?
  • Who will maintain it?

By way of analogy, consider the case of an appealing albatross in Parrot, specifically the compiler named IMCC. In Parrot's earliest days, the VM only ran bytecode. An assembler written in Perl turned Parrot's assembly source code PASM into bytecode for Parrot to run.

Melvin Smith wrote an experimental intermediate compiler which added some syntactic sugar to PASM and produced Parrot bytecode either as output or directly in memory for Parrot to run. Parrot needed something like this.

It wasn't long before Melvin's IMCC found itself grafted onto the side of Parrot and used as the primary invocation and compilation mechanism. I hope I'm not mischaracterizing Melvin's opinion as including dismay that his code was a proof of concept which made assumptions and took shortcuts and wasn't exactly what he would have submitted for a shipping product. (I don't blame him one bit for writing prototype code—that's exactly what I would have done.)

Melvin stopped contributing to Parrot shortly after that point, but IMCC lives on to this day. (Again, this was prototype code which needed at least another round or two of severe refactoring before it was suitable for the sort of duties Parrot expected. As an example, I found a register use analysis algorithm which looked to have O(n^12) complexity. That exponent of 12 is not a typo. Let me type it again: twelve. One dozen. Ten plus two. I managed to get it down to four in most cases.)

The problem is simple: someone (not Melvin!) dumped a big wad of code into the wrong directory in version control and now everyone has to maintain that code.

Perhaps marking this code as experimental would have helped, but other people (including me) added features and built onto it. (Dan's apologia for leaving Parrot mentions that his strategy of checking in messy code and hoping that would lure new people to help clean it up didn't work out as well in practice as he had hoped.)

In terms of Perl, it's important (as many people point out) to consider the question "Who will maintain this code?" That task shouldn't have to fall to the pumpking or a Nick Clark or Dave Mitchell by default, but it does.

The deeper question that Perl needs to answer is "How can p5p make the whole of Perl easier to maintain?"—not just to make it easier to add and support new features but to reduce the maintenance burden of existing features and to attract new contributors to help maintain code.

You can see this for yourself; open up almost any core module and see if you want to do the work to find and fix a bug. If that's not technical friction, I don't know what is.

Parallelism and Test Suites


The relentless pursuit of user efficiency exposes drawbacks now and then. I added HARNESS_OPTIONS=j9 to my .bashrc a while ago, and then noticed that my regular CPAN updates (cpan-outdated -p | cpanm) had a lot more failures than usual.

Test::Harness (and its internals, TAP::Harness) uses the environment variable HARNESS_OPTIONS to customize some of its behavior. This is very useful when running Perl tests through make test or ./Build test or any other mechanism where you don't launch the harness directly.

The j flag allows you to request that the harness attempt to run multiple test files in parallel. If you have multiple .t files and multiple cores in your computer, chances are that parallelism will speed up the test run. (I notice that a lot of my tests are IO bound, not CPU bound, so I can run more tests than I have cores.) My use of j9 works well on my four core machine; your numbers will vary based on your workloads and hardware.
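For reference, here's roughly what that option asks the harness to do, expressed directly against TAP::Harness; the jobs count and the t/*.t glob are only placeholders:

use TAP::Harness;

# Rough equivalent of HARNESS_OPTIONS=j9: run up to nine test files at once,
# each in its own process.
my $harness = TAP::Harness->new({
    jobs => 9,
    lib  => [ 'lib' ],
});

$harness->runtests( sort glob 't/*.t' );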

Unfortunately, it's easy to write simple tests which just don't work in a parallel world. Consider the TestServer.pm module used to test Test::WWW::Mechanize. (I chose this as an example because Andy's a good sport, and because I've already opened a pull request for it.) This module starts a server for each test to control the responses returned to the Mech object. That's all well and good; it tests network communication in a mostly real way (yes, the loopback interface isn't exactly the same as a remote server, but it's real enough for most testing uses).

The TestServer constructor in 1.38 is:

our $pid;

sub new {
    my $class = shift;

    die 'An instance of TestServer has already been started.' if $pid;

    # XXX This should really be a random port.
    return $class->SUPER::new(13432, @_);
}

You can probably see the problem already from the comment. If multiple .t files use this module (and they do), and if these files each run in separate processes (and they do), then if these files run simultaneously (as they do in a parallel testing environment), only one file will be able to bind to this port and the others will all abort and cause test failures.

In fact, this is what happens.

I submitted a silly little patch which changes the port to:

    return $class->SUPER::new(13432 + $$, @_);

... which should reduce the likelihood of collisions. (For more safety, the code should check that the given port number is available, but then you have to deal with race conditions and so forth, and there's a point at which adding more complexity to your test just isn't worth it. Also, $$ can be greater than 65535, as Pete Krawczyk points out, so there ought to be a sane modulus in there.)
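If I did want that extra safety, a sketch might look something like this. The base port, the range, and the pick_test_port name are all invented for illustration, and the probe still leaves a small race between checking a port and binding to it:

use IO::Socket::INET;

# Derive a port from the process ID, keep it within the unprivileged range,
# and probe upward until one can actually be bound.
sub pick_test_port
{
    my $base = 20000 + ( $$ % 40000 );

    for my $port ( $base .. $base + 99 )
    {
        my $listener = IO::Socket::INET->new(
            LocalAddr => '127.0.0.1',
            LocalPort => $port,
            Proto     => 'tcp',
            Listen    => 1,
            ReuseAddr => 1,
        ) or next;

        $listener->close();
        return $port;
    }

    die "No free port found near $base\n";
}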

The principle is this:

Manipulating external state in a test file reduces the possible parallelism of your test suite.

You can see the same thing when you write to hard-coded directories in certain tests. (Use File::Temp to create temporary directories—which can clean themselves up!) You can also see the problem when you use a single database for testing (use something like DBICx::TestDatabase to create and populate a database in memory).
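For the directory case, a minimal sketch of what I mean (the file name is invented):

use File::Temp 'tempdir';
use File::Spec::Functions 'catfile';

# Every test process gets its own scratch directory, removed at exit,
# so parallel runs never write to the same place.
my $dir  = tempdir( CLEANUP => 1 );
my $file = catfile( $dir, 'report.txt' );

open my $out, '>', $file or die "Cannot write '$file': $!";
print {$out} "some test output\n";
close $out;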

Anti-parallelism bugs in test suites are unnecessary and in most cases are easy to fix, once you know what to look for. As the CPAN continues to grow and as our applications rely on more and more great dependencies, the mechanisms we use to manage our code become ever more important. It's easy to avoid these problems—and it's even easier to understand why parallel testing is valuable when you can cut your test run wallclock time in half.

Maybe the world needs a book called "Business Thinking for Know-it-All Techies". Certainly the Perl world does.

After a very pleasant conversation with the criminally underrated VM Brasseur at this year's Open Source Bridge Conference, I decided to improve my copywriting skills. Certainly I've been accused of perpetuating content-free, slick marketing on the Perl world before, so why not undercut that libel a little bit by figuring out how to communicate better with real users and real customers?

This led me to several books, chief among them the worth-ten-times-its-price The Copywriter's Handbook.

The high point of the book so far—a technique I've used on three projects in the past week to great results—is to distinguish between technical features and customer benefits.

In other words, while experienced Perl masters might say "Perl 5 is great because you have access to the CPAN", that's a feature. The benefit is that "80% of most programs has already been written". While it's technically true that the Modern Perl book covers primarily Perl 5.12 and 5.14, the benefit is that "the book demonstrates the current ways to write great code". While the as-yet-unlaunched value analysis site a couple of us are building for small investors has the technical feature "updates analysis after market close every day", the benefit to customers is "gives you the best advice possible whenever you check it".

Listing features in terms of features (for the sake of their own obvious technical good, of course) and not expressing them in terms of what they mean for users is where a lot of my previous evangelism for Perl 6 went wrong, of course. It's easy to list off multiple dispatch and continuation passing and gradual typing as if you were competing for a Fields medal in multiparadigm integration, but when I code, mostly I just want the code to work.

On the other side, I suspect that a lot of the reason that experienced Perl 5 programmers don't really care when non-Perlers sneer "Does it have function signatures yet?" is that (even though it'd be grand not to have to write my ($blah, $blahblah, $blahblahblah) = @_; all the time) it doesn't really matter. It's a minor inconvenience compared to the ability to get stuff done.

Even so, it's important to acknowledge which parts of Perl get the benefits-not-features idea right. For example, consider the module POD skeleton format, where a one-line NAME explains what the module does, then a SYNOPSIS shows an example of working code, and this all happens before a description and detailed technical walkthrough of how to use it. When this format works, it works well.

My experience so far has been that the exercise of comparing features to benefits takes some time, but yields great results. Try it yourself; it's easy. Grab a piece of paper and make two lists. On the left side, write all of the distinct technical features you consider worth mentioning. When you finish, write on the right side a benefit from the customer or user point of view corresponding to that technical feature. Sometimes there's overlap, and that's okay.

When you finish, you should be able to do three things. First, you should be able to identify distinct themes in your benefits. Sometimes these will surprise you, and that's good. Second, you should be able to rank the benefits for each particular audience by priority. If you have multiple audiences, so much the better—you've unlocked an extra-credit achievement. Finally, you should be able to write better prose explaining why your product or project matters to each audience by arranging the benefits in a logical order. (Your skill as a writer matters a lot here, but even if you're still learning how to write effective persuasive prose, the act of thinking in terms of customer benefit is already a huge improvement.)

Now imagine if the Perl world could practice and polish this skill in our technical communications. Sure, it's marketing and obviously evil, but don't we pride ourselves on helping people do the things they want to do?

PHP has Zend. Python has Daddy Googlebucks. Java has... let's call it the 1% of programming languages. C# has Microsoft. JavaScript has everybody who's ever written a web browser (except possibly the W3C).

I haven't used Windows for work purposes in ages, but I'm glad ActiveState exists—Perl 5 is much better on Windows thanks to their assistance.

I'm also glad to see Perl Foundation sponsors, especially the ones who've contributed to the Fixing Perl 5 Core Bugs Grant.

Yet I wonder why it's so amazingly difficult to get full-time funding for new Perl 5 core development. (Again, this is not to complain about funding for Nick Clark or Dave Mitchell, because their work is important and valuable and everyone who's contributed deserves gratitude.) I can't decide what I think, but I have some possibilities:

  • Perhaps Perl is just too effective. It's a tool you apply when you need to do something quick or dirty or now, and after that it just works and you forget about it. (There's a lot of code like this in the world. Then again, a lot of small businesses still run on Access and other 4GLs.)
  • Perhaps the dominant perception of Perl 5 is that of a life support system for the CPAN. (Much of the evolution of Perl happens on the CPAN to be sure, and so volunteer effort goes there.)
  • Certainly it's too difficult to hack on the Perl 5 core, so the available contributor pool is not in what anyone might legitimately call a state of growth.
  • Perhaps new development in Perl 5 tends to be plumbing tasks and not big, bet-the-company technical choices. (Even though my business relies completely on Perl 5 for its technology stacks, I've long realized that I'm not representative of the world as a whole.)
  • Perhaps everyone assumes someone else will take care of it. (Larry hasn't had a patron since 2000.)
  • Perhaps Perl 5 has reached minimum viable utility as it is and needs no more new features. (I have trouble believing this. I could certainly use better parallelism. I've long wanted either or both of hygienic macros and continuations. I could use multiple dispatch today. Grammars would be wonderful. Opaque objects would be grand. Who doesn't want a JIT or more speed or lower memory usage?)
  • Perhaps backwards compatibility is cannibalizing usage from current (and future) releases. (Red Hat and CentOS, please feel free to join Perl 5 in the 21st century. We're pretty sure this century will stick by now.)
  • Perhaps TPF isn't effective at courting donors. (I hesitate to bring this up, because it sounds like criticism, but that's not the intent. I do believe TPF has successfully courted some donors, which is the reason why Nick and Dave have funding right now, and I have no desire to criticize the work or abilities or interests of volunteers doing things I have no desire to do, but it does seem fair to say that I haven't seen much effort in the past five years to talk to large Perl shops to express a coherent vision for core development.)
  • Perhaps (and I find this most likely) there's no coherent vision for Perl 5 core development. Jesse Vincent's Perl 5.16 and Beyond (video link) lays out a good and effective and necessary philosophy for how to manage changes, but is there really anything to get excited about? strict and warnings by default?
  • Perl 6 didn't save Perl. (Yes, yes, Milestones in the Perl Renaissance, but "Someday this will be amazing!" becomes less amazing the longer someday takes.)

It's no one thing and it's probably a combination of several things to various degrees. I hope that someone like the amazing Mike Pall of LuaJIT might come along and demonstrate a powerful proof of concept of something exciting (better parallelism/concurrency, macros, an improved parser, a working JIT, a no-XS extension mechanism, an 80% port of the sanest parts of Perl 5 to a different virtual machine, whatever). Maybe that's Reini Urban, and maybe it's someone we don't know about yet.

It's difficult to imagine someone new jumping in to the big wad of heavily-macroed C code that's the current Perl 5 implementation and having all of the time, interest, and energy to learn what's going on as well as the luck, skill, and patience to make substantive changes without horribly breaking a dozen things elsewhere while successfully convincing p5p that the changes are worthwhile and maintainable and won't be the subject of massive imprecations and furrowed brows in two years.

Maybe I'm just a pessimist today though.

What do you think?

Update: PDFs, ePub, and Mobi files are available from Modern Perl: The Book, and you can read Modern Perl: The Book online now!

We still welcome reviewers for the Modern Perl book 2011-2012 edition draft. We'll close down the review and take down the draft this Friday, 18 November 2011. Thanks to everyone who's commented so far and to everyone who will comment.

Expect to see a new printed book available for sale online by the end of November or the start of December. A couple of weeks after that, we'll make electronic versions of the new edition available in PDF and ePub formats, and then we'll put the entire text of the book on this site in standard XHTML. We'll include all of the anchors used to make the index work, so you can link to specific sections of the book. We'll also include links to translations in other languages.

Again, the best way to report a bug or typo or make a suggestion is through the Modern Perl book Github repository. Pull requests are very welcome (though the smaller, the better).

Thanks to Jeff Thalhammer for this guest post. Jeff has specialized in Perl software development for over 10 years. He is the senior engineer and chief janitor at Imaginative Software Systems, a small software consultancy based in San Francisco. Jeff is also the creator of Perl-Critic, the leading static analysis tool for Perl.

Andy Lester introduced me to the concept of technical debt several years ago, and I immediately fell in love with the idea. At that time I was working in the financial industry, so the debt metaphor was useful for explaining technology issues in terms that our business stakeholders could easily understand.

I wanted to take the technical debt metaphor even further. For several years, I tried to find more analogies between debt instruments and technology. I wanted to model software development in terms of credit risk, term structure, secured versus unsecured debt, and so on. We had powerful tools for pricing financial debt so surely we could do the same for technical debt. We just needed to translate the concepts from one domain to the other. Or so I hoped.

But the more I thought about these things, the more I realized how flawed the technical debt metaphor really is. The biggest problem is repayment. Financial debt is based on the presumption of repayment. Yes, there is a risk of default and that gets priced into the interest rate. But every creditor assumes that the debtor at least intends to pay them back. With technical debt, this just isn't true. You may incur some technical debt by tightly coupling classes, or omitting test cases, or copy-and-pasting code. But if that code never needs to be changed or the project is canceled unexpectedly, then you never have to repay that debt.

And this is exactly why our software projects incur "debt" in the first place—because we are hoping that we won't have to pay it back. Sometimes we may be right, and often times we are wrong. So when I think about it that way, software development starts to look more like betting instead of borrowing. And that leads me to a new (and hopefully better) metaphor: technical insurance.

Consider car insurance. You pay a small premium now in exchange for avoiding a larger payment later, if (and only if) you wreck your car. If you don't buy the insurance, then you are stuck with the full cost of repairing or replacing your vehicle, but only if you wreck.

Technical insurance works exactly the same way. For example, you can spend a little effort to write automated tests now in exchange for not crashing your production system later if a certain bug is introduced. And if you choose not to write the tests, then you lose all the revenue during the downtime, but only if that bug actually does get introduced.

Generally speaking, the price of insurance is based on the magnitude and probability of loss. If you drive an expensive car and have had several accidents, your insurance premiums will be higher than average. Likewise, if your software project is mission critical and has a history of failures, then your technical insurance will cost more. Younger, inexperienced drivers tend to have more accidents so their car insurance costs more. Similarly, a team of junior developers tend to make more mistakes and their technical insurance will cost more.

So how do we pay our technical insurance premiums? By writing tests, pair programming, doing code reviews, refactoring, or any other practice that improves code quality! Bear in mind that not all premiums have the same value—an hour spent in a code review may provide you with more insurance coverage than an hour spent writing documentation. Nor do all your premiums go toward the same policy—automated testing covers you against a different set of calamities than pair programming.

Given the premiums for your team, you then have to decide whether or not to buy a technical insurance policy. This depends on your team's level of risk aversion. If your team is willing to risk complete failure and get everyone fired, then you don't need any technical insurance. On the other hand, if the project absolutely must succeed (for some definition of success), then you need lots of technical insurance. Most teams will fall somewhere in the middle, but each will be different. You don't want too much or too little insurance—you want just enough to complement your level of risk aversion.

The technical debt metaphor still has value, however. The notion of interest—especially compounding interest—really helps people understand that putting things off can lead to bigger and bigger costs over time. A similar example in the insurance world might be medical insurance. It is a lot cheaper to buy medical insurance when you are young and healthy. But as you get older and develop "pre-existing" conditions, the cost of insurance skyrockets. The same is true for software. A new project is cheap to insure, and if it stays healthy then the premiums stay low. But if problems start to accumulate and fester, it gets more and more expensive to insure the project.

At this point, I've come to believe that software developers and managers have a lot more to learn from insurance actuaries than they do from bond traders. For me, the technical debt metaphor was inspiring, but I think that a technical insurance metaphor could be both more accurate and more useful. Smart development teams measure their unique risk exposures, know their own level of risk aversion, and carry the right types and levels of technical insurance for their specific needs.

Once in a while, someone asks "How can I compile my Perl program to a binary?" Once in a while, someone answers "Use B::CC", at which point many someones shudder and reply "No, please never suggest such a thing, you horrible person."

Set aside that thought for a second.

You may have heard of Devel::Declare, which allows you to bend, fold, spindle, and mangle Perl syntax in a way that's safer than source filters but which allows nicer code such as signatures to work without making some poor fool like me patch the Perl parser. Unfortunately, D::D works by hijacking parts of the parsing phase to inject bits and pieces of alternate Perl code in place of non-Perl code.
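To make that concrete, here's the flavor of code a Devel::Declare-based module such as Method::Signatures lets you write; the Point class and its method are invented for illustration:

package Point;

use Method::Signatures;

sub new
{
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

# Devel::Declare intercepts the 'method' keyword at parse time and rewrites
# this declaration into plain Perl 5 code which unpacks $self and $other.
method distance_to ($other)
{
    return sqrt( ($self->{x} - $other->{x})**2
               + ($self->{y} - $other->{y})**2 );
}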

The good news is that it's fairly well encapsulated and respects lexical scope. The bad news is that you're using Perl to generate Perl, which has many of the same drawbacks as when you use eval. (The good news is that you don't have to parse all of Perl. Make that great news.)

What you can't do easily is manipulate code that's already been parsed or compiled. Sure, you can manipulate the symbol table and examine things, if you know the relationships between and representations of Perl's internal data structures, but you're at the mercy of binary representations written in C, which can vary between major releases.

The B:: modules are not the answer because they exist at the wrong level of representation. It's not their fault—they do the best they can with what they can access—but they're doomed to hacks and workarounds and incompleteness because of incorrect decisions made elsewhere.

I've released Pod::PseudoPod::DOM on Github (it needs documentation and more work on XHTML output before it's ready for the CPAN) as part of my work on two Onyx Neon books, Liftoff and the upcoming second edition of Modern Perl: the book. I've written about the reasons why I revised the internals of the PseudoPod parser so heavily (everything is a compiler).

The same reasoning applies to the Perl parsing and compilation process.

If Perl had an intermediate layer between lexing/parsing and producing the optrees which the runtime uses to execute code, and if that intermediate form were a sufficient representation of a program, and if that intermediate form were accessible from C as well as Perl itself, we could solve a lot of problems.

(I've used B::Generate productively. It's difficult to do so. You get to dodge segfaults. You have to become an expert on the internals of the versions of perl you want to use. Note the plurals. Whee.)

In particular, a good macro system (one which is not "Run these substitutions over that code") would be possible. It might also be possible to translate certain classes of Perl code to other languages with substantially more ease, or to identify error patterns, or to perform better syntax highlighting, or to canonicalize the formatting and idioms of code in one fell swoop.

(You still have to deal with XS modules and the BEGIN problem, but you can embrace some ambiguity in the grammar and the abstract representation and still produce a valid and parsed representation even if you have to coalesce two alternatives into a single representation with out of band knowledge. It's not impossible to get 90% of all programs represented perfectly, and another 5% shouldn't be too much more work.)

Unfortunately, a proof of concept would likely take a good hacker a month of work. A solid demo is likely six months of work. The entire project probably represents two years of work.

It's still a pleasant daydream though.

Some of my projects have many, many tests. While we keep the entire suite runtime under 30 seconds (hopefully under 10 seconds for a parallel run), Devel::Cover imposes a measurable performance penalty. I appreciate using Dist::Zilla to manage our distributions, and the additional command to make dzil cover work is very handy, but running the entire test suite through D::C more than once seems a little silly.

If I'm improving the test coverage of a piece of code, I usually care only about that piece of code and nothing else. Thus I wrote a tiny little bash function to encapsulate the appropriate invocation to measure the code coverage of only the code I care about:

function cover_test
{
    # start each run with a fresh coverage database
    rm -rf cover_db/;

    # load Devel::Cover into every perl process and discard its noisy warnings
    PERL5OPT=-MDevel::Cover env perl -Ilib "$@" 2> /dev/null;

    # summarize the collected coverage data
    cover
}

Run it and provide one or more test files. D::C will produce a coverage report in your console and as an HTML file. (One nice feature of removing the cover_db/ directory is that the report will always be available in cover_db/coverage.html, so you can refresh your browser window.)

I added the standard error redirection to avoid the error messages the current version of D::C (0.79) emits when analyzing Moose code. That's likely to go away with a new release.

This shell function only saves me thirty seconds for each invocation, but that allows me to run coverage every five minutes—or more frequently—to verify my progress. It's improved the way I work.

On Technical Friction

Ovid suggests that technical debt is a misleading metaphor. He's right in that, like any metaphor, the surface comparisons are more similar than the deep correspondences. Yet it's a useful metaphor even at the surface level.

(I haven't asked Ward Cunningham about this directly, but I've heard secondhand that he originally spoke of technical debt as the compromises of the business requirements of software. In other words, every time you guess as to what the software should do instead of verifying what it should do, you take on technical debt. It's not about quality in this model, it's about correctness.)

The idea of debt and technical shortcuts is useful if you consider that debts in the real world often have interest payments and can reduce your cashflow and at some point require your creditors to come calling around to liquidate your assets. It's useful in an advanced way when you successfully leverage short term debt for long term gains. It's not useful when you consider that managing debt in a real and successful business (or even on your own) benefits from an understanding of leverage, inflation, repayment schedules and amortization, taxation strategies, goodwill, depreciation, and securitization.

(Then again, software people are bad about metaphors. Electronic readers aren't books and don't require you to flip pages. Also the construction industry works nothing like most programmers and project managers seem to think. Construction is more like software than software is like idealized construction.)

I tend to think of this thing-previously-known-as-technical-debt as friction. The weight bench in the closet to the right of my desk in my office may have a lot of friction, but it rarely matters because I haven't moved it in three years and I'm not likely to move it any time soon. (If I do move it, my weekend is ruined already, so I don't care if it glides across the floor like some sort of elegant ice dancer.)

I do care if something gets jammed in the casters of my chair, because I move around every couple of minutes and only notice when I can't move.

The more difficult it is to make a necessary change to a piece of code you need to change—the more difficult it is to continue to make changes—the more technical friction you have. If you don't need to make changes, or if you need to make one quick, small change and that's it, a lot of friction doesn't matter too much. If you're constantly making changes, a little bit of friction matters a lot.

I do still like the metaphor of debt for two reasons. It suggests that technical practices, even in moderation, can pay down the debt of a piece of code. That's good. That gives people hope. It also uses the language of finance and money to express the value of a piece of code. (Code is a means to an end. Remember that.) The notion of rubbing butter on a stubborn API (or pouring extra virgin olive oil on an API by removing deprecated parts of it) fails to inspire me in the same way.

I use the language of technical debt, knowing that the metaphor is incomplete and flawed, but I think about identifying and removing friction—not just to start moving but to continue moving.

A Gentle Reminder about Test Coverage


Devel::Cover is wonderful, and so is Dist::Zilla::App::Command::cover. Measuring your test coverage can help you increase your confidence in the correctness and reliability of your software.

The numbers in the coverage report are only tools, however. Your job as an intelligent and capable human being is to interpret those numbers and to understand what they mean.

For example, my new book formatting software (which needs more documentation before I release it publicly) has a handful of hard-coded escape codes for the LaTeX emitter to produce the right code. Part of that code is:

# keys are the E<> escape names with their first character stripped off;
# emit_character() splits each escape into that leading character and the rest
my %characters = (
    acute    => sub { qq|\\'| . shift },
    grave    => sub { qq|\\`| . shift },
    uml      => sub { qq|\\"| . shift },
    cedilla  => sub { '\c' },              # ccedilla
    opy      => sub { '\copyright' },      # copy
    dash     => sub { '---' },             # mdash
    lusmn    => sub { '\pm' },             # plusmn
    mp       => sub { '\&' },              # amp
    rademark => sub { '\texttrademark' }
);

sub emit_character
{
    my $self    = shift;

    my $content = eval { $self->emit_kids( @_ ) };
    return unless defined $content;

    if (my ($char, $class) = $content =~ /(\w)(\w+)/)
    {
        return $characters{$class}->($char) if exists $characters{$class};
    }

    return Pod::Escapes::e2char( $content );
}

While emit_character() is interesting on its own and worthy of testing, the important code is the %characters data structure. Devel::Cover can't tell me if every entry in that hash gets accessed appropriately (though I suppose it could in theory track the use of the anonymous functions). Only my knowledge of the tests and the code can satisfy me that I've tested this important code thoroughly.
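The fix is a test which exercises the data directly. Here's a minimal sketch of what I mean, where emit_escape() is an invented stand-in for however the real emitter gets called:

use Test::More;

# Each E<> escape the emitter claims to support, paired with the LaTeX it
# should produce; iterate so no entry in %characters goes untested.
my %expected = (
    copy   => '\copyright',
    mdash  => '---',
    plusmn => '\pm',
    amp    => '\&',
);

while (my ($escape, $latex) = each %expected)
{
    # emit_escape() is a hypothetical wrapper around the real emitter
    is emit_escape( $escape ), $latex, "E<$escape> should emit $latex";
}

done_testing();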

Again, Devel::Cover is a great and useful tool, and acknowledging limitations like this in no way diminishes its efficacy. (It's particularly informative about branch and condition coverage.) Yet as with most tools, it exists to enhance human knowledge and judgment, not to replace them.

Perhaps you saw a study of the accuracy of non-programmers cargo culting the syntax of programs to write new programs. (For a good laugh, read an apologia for methodology problems in the study.)

Here's the problem with most attempts to teach non-programmers to program: programming is a creative process of solving problems, not an exercise in arranging atoms in the proper order.

It's as if we expected to teach people to write great stories by giving them exhaustive grammar lessons such that they understand tense, mood (in the imperative, subjunctive, et cetera sense), and declension while neglecting study of plot, character, and theme.

Certainly we can and should discuss ways to make programming easier, to improve learning, and to prevent mistakes as far as possible, but suggesting that the primary mechanism of learning is mimicking syntax (by measuring it in apparent isolation), while ignoring the practical matter that solving the problem is the important part and writing the program is a matter of implementation, misses the point.

A better study would have measured whether participants could describe their solution to the problem in a correct way regardless of syntax, or at least controlled for that variable.

I suspect the focus on syntax comes from the stranglehold of mathematics and computer science on programming. (In some ways, we haven't left the notation wars of Newton and Leibniz.) The irony to me is that mathematics is a world of creative problem solving.

While you can measure true differences between programming languages (at least along their theoretical axes), learning how to program at all is a skill that seems to be independent of notation. Teaching that is hard.

(I learned how to program on home computers in the early '80s, and I switched back and forth between several variants of BASIC and Logo a few times. Expressing my intention was more difficult than the particular abomination of BASIC I used. I suspect my experience is not so rare.)
