From Novice to Adept: Embracing Idioms

If you ever want to annoy a native speaker of a language you're learning, translate idioms from your primary language literally into the other language. For example, if you're learning Spanish and you come across something surprising or perplexing, yell out "¡Santa vaca!" as loudly as you can within earshot of native Spanish speakers.

The best response I usually get is "That... doesn't mean what you think it means."

I smile knowingly.

It's easy to see similar behavior from novice programmers who haven't yet learned the idioms of their new language. Baby Perl can be clunky because its author doesn't understand the language's features, but it can also be clunky because its author doesn't know the language's idioms.

You can see this in iterating over arrays:

# Baby Perl

for (my $i = 0; $i < scalar @items; ++$i)
{
    my $item = $items[$i];
    ...
}

... which uses the C-style for loop, introduces an index variable unrelated to the problem, and invites fencepost (off-by-one) errors. The Perl 5 idiom is simple iteration with a foreach loop (spelled for because the two keywords are interchangeable and for is shorter):

for my $item (@items)
{
    ...
}
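
On the rare occasions when the loop body really does need the index, you can still keep the foreach form by iterating over the range of indices. Here's a minimal sketch; the contents of @items are made up for illustration:

```perl
use strict;
use warnings;

# @items is a hypothetical example array
my @items = qw( apple banana cherry );

# the idiomatic loop: no index variable, no bounds to get wrong
my @seen;
for my $item (@items)
{
    push @seen, uc $item;
}

# when you need the index too, iterate over the index range;
# $#items is the index of the last element
my @labeled;
for my $i (0 .. $#items)
{
    push @labeled, "$i: $items[$i]";
}

print "$_\n" for @seen, @labeled;
```

Neither loop has a termination condition to write by hand, so there's no fencepost to miss.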

(Amusingly, when I wrote the C-style loop, I left the sigil off of $i. Although you occasionally need this type of loop in Perl 5, it's rare. I write it often enough in C that my fingers write C and never Perl here.)

Another Perl 5 idiom causes a semi-frequent misunderstanding. Here's the idiom for reading from a file through iteration, assuming you have a lexical filehandle stored in $fh:

while (<$fh>)
{
    chomp;
    ...
}
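
Part of what makes this idiom work is that Perl treats a bare readline in a while condition specially: it assigns the line to $_ and tests whether the result is defined, not whether it's true. The sketch below spells that out by hand. The sample data and the in-memory filehandle are assumptions so that the example runs without a real file:

```perl
use strict;
use warnings;

# hypothetical sample data; an in-memory filehandle stands in for a real file
my $data = "alpha\nbeta\n0";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @lines;

# the long form of while (<$fh>): assign to $_ and test definedness,
# so the final line "0" (false, but defined) still gets processed
while (defined($_ = <$fh>))
{
    chomp;
    push @lines, $_;
}

close $fh;
print join(', ', @lines), "\n";
```

You'd never write the long form in real code; the point is only to show what the short form does for you.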

It's common to see Baby Perl which reads all of the lines of a file into an array and then iterates through that array with a C-style loop:

# Baby Perl
my @lines = <$fh>;

for (my $i = 0; $i < scalar @lines; ++$i)
{
    ...
}

One problem here is the missing chomp to strip off the input record separator from each line; that bites a lot of people. Another problem is that Perl 5 isn't lazy enough to read lines only as you access them in @lines; it populates the entire array in memory. The while version is much more parsimonious. That's subtle, but it surprises people.
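
If you really do need every line in memory at once (to sort them, say), one common way to avoid the missing chomp is to chomp the whole list as you read it. This is a sketch with made-up in-memory data, not a recommendation to slurp files by default:

```perl
use strict;
use warnings;

# hypothetical sample data; an in-memory filehandle stands in for a real file
my $data = "pear\napple\nfig\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

# list context reads every remaining line at once; chomp then strips
# the input record separator from each element of @lines
chomp(my @lines = <$fh>);
close $fh;

# sorting is one reason to want all of the lines at the same time
for my $line (sort @lines)
{
    print "$line\n";
}
```

The whole file still lives in memory here, so the while loop remains the better default for large inputs.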

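The biggest surprise comes from people who consider <$fh> to refer to the filehandle, when it's merely syntax for the readline operator, which reads and returns the next line every time Perl evaluates it. This leads to code that appears to skip every other line in a file, because the loop condition reads one line into $_ and the loop body immediately reads the next: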

# probably buggy code

while (<$fh>)
{
    my $line = <$fh>;
    ...
}
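
To watch the skipping happen, and to see one common fix, here's a small sketch. The four-line sample data and the in-memory filehandle are assumptions for the sake of a runnable example:

```perl
use strict;
use warnings;

# hypothetical sample data; an in-memory filehandle stands in for a real file
my $data = "line 1\nline 2\nline 3\nline 4\n";

# buggy: the condition reads one line into $_, then the body reads another
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @buggy;
while (<$fh>)
{
    my $line = <$fh>;
    last unless defined $line;    # an odd line count leaves $line undefined
    chomp $line;
    push @buggy, $line;
}
close $fh;

# fixed: read once per iteration, directly into a named lexical
open $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @fixed;
while (my $line = <$fh>)
{
    chomp $line;
    push @fixed, $line;
}
close $fh;

print "buggy: @buggy\n";
print "fixed: @fixed\n";
```

The buggy loop only ever sees the even-numbered lines; the fixed loop sees all four.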

There are plenty of other interesting idioms, such as list and hash slices, true-or and defined-or assignment, the temporary empty list operator, and more. The best way to learn these idioms is to read good code. I spent a lot of time browsing PerlMonks and trying to answer questions (even if I didn't post answers). Another good approach is to browse perldoc perlfaq.
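
If those names don't ring a bell, here's a rough sketch of several of them in action. All of the variable names and values are invented for illustration, and defined-or assignment needs Perl 5.10 or newer:

```perl
use strict;
use warnings;
use 5.010;    # defined-or (//=) and say need Perl 5.10 or newer

# hypothetical data for the examples
my @letters = ('a' .. 'f');
my %age     = ( alice => 30, bob => 25, carol => 41 );

# array slice: several elements of an array at once
my @picked = @letters[ 0, 2, 4 ];

# list slice: index into a list without storing it in an array first
my ($first, $last) = (sort @letters)[ 0, -1 ];

# hash slice: several values at once, in the order of the keys given
my @ages = @age{qw( alice carol )};

# true-or assignment: assign only if the current value is false
my $retries = 0;
$retries ||= 3;    # 0 is false, so $retries becomes 3

# defined-or assignment: assign only if the current value is undefined
my $offset = 0;
$offset //= 10;    # 0 is defined, so $offset stays 0

# assigning to an empty list in scalar context counts the matches
my $commas = () = 'a,b,c,d' =~ /,/g;

say "@picked";
say "$first $last";
say "@ages";
say "$retries $offset $commas";
```

The difference between the two assignment operators only shows up for values that are defined but false, such as 0 and the empty string.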

Mostly, I believe there's no substitute for experience and code review from experienced developers. It may be painful to expose your hard work to a harsh and unforgiving world, but if you can get past your ego and take the hard-won advice of others to heart, you'll always learn something interesting.

That doesn't excuse harsh, demeaning, and abusive behavior, of course -- but that's a different problem. Your local Perl Mongers group or mailing list can be a friendlier place to start than many Perl IRC channels or newsgroups. The Perl Beginners mailing list is a great resource as well. Start there.

2 Comments

hercynium.myopenid.com | October 20, 2009 6:41 PM

The only thing I didn't recognize by name is the "temporary empty list operator" ... do you mean this?

my $foo_count =()= @foo;

Of course I'm sure you know its more informal name! ;-)

Shlomi Fish | October 24, 2009 6:24 AM

I agree with you about it. However, idioms are nice and dandy, but naturally, when going on IRC, one can often open the Pandora's box of the bike-shed colour argument. For example, should it be:

Person->new(
   first => "Sophie",
   'last' => "Cohen",
   birth_year => 1977,
);

Or:

Person->new({
    first => "Sophie",
    'last' => "Cohen",
    birth_year => 1977,
});

(with a hash ref)

I tend to think the second option is better because it's probably a bit faster (not that it matters a lot), and because it will warn about an "odd number of hash elements" earlier, but there's a lot of code on CPAN out there that uses the first option, because it involves somewhat less syntax. In one of my modules, I supported only the second option, and a programmer who was after my T-shirt offer implemented convoluted code to accept the arguments either as a single hash ref or flattened into @_, which I had to reject.

Naturally, this is just the tip of the iceberg. Brace indentation, placement of HTML opening and closing tags, whether HTML should be nicely indented (even if generated by a server-side language) or instead occupy as little space as possible, whether subclasses should override sub new {...} or sub _init {...}, and so on are all matters of much debate, and they are pretty much bike-shed colour arguments (though they often do have some points for and against).

I think it's important for a beginner (in Perl or in any other language) to distinguish between such minor matters and non-idiomatic code, and that it is even more important for an expert to see past his own preferences for the bike shed's colour when helping newcomers.
