November 2013 Archives

The strict Pragma is a Cultural Marker

For the first several years of the Perl programming language, there was no strict pragma. Variables were effectively global. Barewords were okay. There was little protection against typos—certainly no compiler support for finding them before they caused bugs in your programs.

People wrote a lot of code in those conditions. Even after Perl 5 came out, because strict checking of variable declarations, barewords, and references was optional, people still wrote a lot of code without strict enabled. A lot of that code worked. (As Mark Jason Dominus points out, Perl's strict pragma is not a magic "fix my code" button.)

With that said, I enable strict in every serious program I write, at least every program that's not a one-liner stuck in a shell alias somewhere. The Modern::Perl pseudo-pragma enables strictures, and I encourage novices learning from anything I've written to use it.

For me, it's about limiting risk. I know I make typos. Having perl check that I haven't misspelled a variable or a function name is a tiny little piece of peace of mind that limits the damage I can do from a silly mistake. (I don't find strict reference checking all that useful, because I use soft references very rarely and in very deliberate circumstances.)
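Here's a minimal sketch of that kind of typo check. The snippet and its variable names are made up for illustration, and the string eval is only there so the compile-time failure can be captured and printed instead of halting the example:

```perl
use strict;
use warnings;

# Hypothetical snippet with a deliberate typo: $totl instead of $total.
my $snippet = <<'END_SNIPPET';
use strict;
my $total = 10;
$totl = $total + 1;
END_SNIPPET

# String eval compiles the snippet at runtime, so the compile-time
# error lands in $@ instead of stopping this program.
my $compiled = eval "$snippet; 1";
my $error    = $@;

print "strict caught it: $error" unless $compiled;
```

Without the use strict; line inside the snippet, perl would quietly create a new global named $totl and carry on.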

For others, it's about helping them to learn. It's difficult enough to learn the semantics of a programming language when you're learning the syntax and idioms too. Yes, use strict; is a small piece of boilerplate magic that a novice might not understand for a while, but a beginner is more likely to make more frequent and more severe beginner mistakes than an experienced programmer. I need a safety net to help me avoid typos, while a beginner needs a safety net to train him or herself to avoid beginner mistakes as a habit.

Habits are important. They're indicators. They're signals. They can be subtle, but when you see a program which starts:

require 'cgi-lib.pl';

... you probably already have a pretty good idea of what you're going to find. (The program may be great, but it may violate some RFCs about the CGI protocol and it's probably not great at all in fact.)

When you see a CPAN distribution with a t/ directory with only one file, t/00-boilerplate.t that's a few dozen bytes in size, you probably have a pretty good idea of the quality and maturity of the code. (The code may be great, but you're going to have to do a lot of work yourself to prove that.)

When you see someone complain that "PERL is unreadable", you probably have a pretty good idea about the kind of code he is referring to. (That code probably starts with require 'cgi-lib.pl';, or at least would be less buggy if it did.)

These aren't markers of your worth as a person. They're not markers of the quality or utility of your code. They make no promises of the presence or absence of bugs of intent, and they're not guarantees that there are no bugs of typos. (Would strict be more useful if it could somehow enforce strictness of hash keys?)

The direct and pragmatic value of strict is that it does offer a minimal level of assistance from perl to avoid common behaviors which are often mistakes. They're not necessarily mistakes in very short code, but when you get beyond a couple of hundred lines of code, they're risky. It's easy to overstate its value, even for pedagogic purposes.

It's more difficult to overstate the value of strict as a signifier. It's a symbol. It's an indicator. You're at least making the attempt to avoid silly little errors. The lack of strict oughtn't imply the opposite, but if you're starting out with Perl and using strict, you're at least trying to let the language help you. That's a start.

Like almost anything else in Perl, this oughtn't be a dogmatic rule. The language exists to help you get things done, not to train you to conform to some language designer's theoretical ideals. (But there are theoretical ideals underpinning the design of the language, and you'll have an easier time solving your problems with the language if you design your programs to Perl's strengths.)

Context and the Comma Operator

In A Tiny Code Quiz, Ovid posted this snippet of code inspired by Ben Tilly:

@ar1 = qw(foo bar);
@ar2 = qw(a b c d);
print scalar (@ar1, @ar2);

The answer jumped out at me, but that may be because I spent a lot of time documenting how context works in Perl. The Modern Perl book argues that there are two distinct contexts in Perl. Amount context governs how many items an expression expects (zero, one, or many) and type context governs the nature of data an expression expects (a true or false value, a number, unstructured data).

I know that it sounds like adding a post hoc formalism to a language deliberately designed to break the shackles of rigid mathematical thinking on programming language design and usability, but my thesis is that explaining how Perl works in terms of these two contexts helps people write better code because they're prepared to read and comprehend the documentation.

If you browse PerlMonks or, back when Usenet existed, comp.lang.perl.misc, you've probably heard the phrase "no such thing as a list in scalar context". That's one of the worst explanations to offer to a Perl novice, because it's simultaneously accurate and incomprehensible. What we should say instead is "the comma is an operator".

What's a Perl Operator?

In Perl, an operator is an action. It's a verb, if you're linguistically adept. It's not a variable, because it doesn't have data attached. It's not a value, because it doesn't change. (It's not a function or a method, because Perl's parser knows about it. It's a special kind of verb, a verb that Perl programs are born knowing. Take that, Sapir-Whorf and your thousand words for snowclones!) This seems counterintuitive, like arguing that the semicolon which separates expressions is an operator (even though good Haskell tutorials occasionally suggest that if they have really good explanations of monads). It seems wrong, but it's true.

Every Perl operator has several characteristics. These include arity (how many pieces of data it expects), precedence (when Perl should evaluate it with regard to other operators in an expression), associativity (whether it evaluates its data leftmost or rightmost), and its fixity (where it appears in an expression with regard to its data). All of those characteristics apply to the comma operator. When you write:

my ($rank, $rate) = ( 'Senior Consultant', '1000 septim per hour' );

... you expect the code to have the same effect as:

my $rank = 'Senior Consultant';
my $rate = '1000 septim per hour';

Even if you never thought about the comma as an operator, you probably expect that it separates expressions, appears between two operands, and operates left to right. If you've read Perl's precedence rules, you probably also expect that other operators get evaluated first. (That's why the parentheses are necessary.)
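To see why precedence makes the parentheses necessary, here's a small sketch that assigns to arrays (my choice, so the difference is easy to count) with and without them. The void warning category is switched off only because the first form throws a value away on purpose:

```perl
use strict;
use warnings;
no warnings 'void';    # the unparenthesized form deliberately discards a value

# Assignment binds more tightly than the comma, so this parses as:
#   ( my @without = 'Senior Consultant' ), '1000 septim per hour';
my @without = 'Senior Consultant', '1000 septim per hour';

# The parentheses group the comma expression before the assignment sees it.
my @with = ( 'Senior Consultant', '1000 septim per hour' );

print scalar(@without), "\n";    # 1
print scalar(@with),    "\n";    # 2
```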

You may have never thought about the context of the comma operator.

The Context of the Comma Operator

Some operators enforce context on their operands. scalar is a good example. You can't say what @array will evaluate to without knowing the surrounding context. When used as the first operand to push, Perl treats the array as a container and adds elements to it. When used as an argument to a (prototypeless) function, Perl extracts the elements of the array into a list. When evaluated in scalar context, you get the number of elements of the array.
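Those three situations look something like this sketch. The @queue array and the count_args helper are names I invented for the example:

```perl
use strict;
use warnings;

my @queue = ( 'alpha', 'beta' );

# First operand of push: the array is a container to modify.
push @queue, 'gamma';

# Argument to a prototypeless function: the array flattens into a list.
sub count_args { return scalar @_ }
my $args_seen = count_args( @queue, 'extra' );

# Scalar context: the array evaluates to its number of elements,
# whether you ask explicitly or assign to a scalar.
my $size      = scalar @queue;
my $also_size = @queue;

print "$args_seen $size $also_size\n";    # 4 3 3
```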

(If you don't know much Perl, this may sound really complicated, but you have to learn a few things about fixity and arity and precedence in any language that isn't completely consistent about its representation form. Even Common Lisp has special rules for quoting that complicate things. If you want a programming language that people who don't know can read and instantly understand everything about what it's doing and why, you're going to be very disappointed when you turn off Star Trek and go outside.)

Other operators pass through their contexts. These operators include return and the comma operator. In other words, when Perl evaluates the operands of the comma operator, it does so with respect to the context in which it's evaluating the expression containing the comma operator.
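As a rough sketch of that pass-through behavior, here's a hypothetical function (rank_and_rate is my own name) whose return hands the caller's context straight to the comma expression:

```perl
use strict;
use warnings;

# return does not choose a context; the caller does.
sub rank_and_rate {
    return ( 'Senior Consultant', '1000 septim per hour' );
}

my @both    = rank_and_rate();    # list context: both strings
my ($first) = rank_and_rate();    # list context: first string kept
my $last    = rank_and_rate();    # scalar context: the right operand

print scalar(@both), "\n";    # 2
print "$first\n";             # Senior Consultant
print "$last\n";              # 1000 septim per hour
```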

In other words, in the expression:

my ($rank, $rate) = ( 'Senior Consultant', '1000 septim per hour' );

... the comma operator here is evaluated in list context because the lvalue of the assignment operator imposes list context (assigning to two variables), so the comma operator produces a list of two literal strings.

In the expression:

my $rate = ( 'Senior Consultant', '1000 septim per hour' );

... $rate gets the second literal, because the comma operator evaluates all of its operands and, in scalar context, evaluates to its right operand.

In the expression:

my @odds  = ( 1, 3, 5, 7,  9 );
my @evens = ( 2, 4, 6, 8, 10 );

my ($odd, $even) = ( @odds, @evens );

... $odd contains 1 and $even contains 3, because the comma operator evaluates in the list context imposed by the assignment to an lvalue list. The comma operator flattens its two operands into a list and the first two elements of that list—the first two elements of @odds—get assigned to the two lvalue variables.

In the expression:

my @odds  = ( 1, 3, 5, 7,  9     );
my @evens = ( 2, 4, 6, 8, 10, 12 );

my ($odd, $even) = scalar ( @odds, @evens );

... $odd contains 6 and $even remains undefined, because the comma operator evaluates its operands in the scalar context imposed by the scalar operator. In scalar context, of course, the two arrays each evaluate to the number of elements they contain.

Keep in mind, however, that the comma operator also has a behavior in scalar context. In scalar context, it evaluates to a single value, that of its right operand. The @evens array contains six elements, so that's what the comma expression evaluates to.
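If what you actually wanted was the combined number of elements, a couple of alternatives sit next to the surprising form in this sketch. The variable names are mine, and the void warning category is off only because the first form discards @odds on purpose:

```perl
use strict;
use warnings;
no warnings 'void';    # scalar( @odds, @evens ) throws @odds away on purpose

my @odds  = ( 1, 3, 5, 7,  9     );
my @evens = ( 2, 4, 6, 8, 10, 12 );

# Comma operator in scalar context: evaluates to its right operand,
# which is @evens in scalar context.
my $comma_result = scalar( @odds, @evens );

# Addition imposes scalar context on each array separately.
my $sum_of_counts = @odds + @evens;

# Assigning the flattened list to an empty list, then reading that
# assignment in scalar context, counts the elements on the right.
my $flattened_count = () = ( @odds, @evens );

print "$comma_result $sum_of_counts $flattened_count\n";    # 6 11 11
```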

Understanding context helps, but the real problem is that most of us never think of the comma operator as an operator. Very few Perl tutorials and examples go into detail about how the parts of Perl fit together with regard to Perl's design philosophy of context, and very few really talk about what operators really are. When you understand these concepts, the code example should make a lot of sense to you (even if you'd never write such a thing).

To use Unicode effectively, you have to learn a lot more than the difference between 7-bit ASCII and UTF-8. For example, did you know that you can represent the same glyphs in multiple ways? That's right; multiple combinations of codepoints can produce the same glyphs.

If you're doing something interesting with user input such as comparing two strings or searching for one string in another, you probably want those strings to use the same canonical representation of codepoints. (You'd hate for users to file bugs that they can't find whatever they're searching for when what they're looking for looks correct, but they typed it a different way than you did.)

This is why Unicode Normalization Forms and Unicode::Normalize exist.
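A small sketch of the problem, using the NFC and NFD functions from Unicode::Normalize. The sample strings are my own choice; this shows the idea, not how the plugin is implemented:

```perl
use strict;
use warnings;
use Unicode::Normalize qw( NFC NFD );

# Two spellings of the same visible text: a precomposed e-acute
# versus a plain "e" followed by a combining acute accent.
my $composed   = "caf\x{E9}";
my $decomposed = "cafe\x{301}";

# Compared codepoint by codepoint, the strings differ.
my $raw_match = $composed eq $decomposed ? 1 : 0;

# Normalized to the same form, they compare equal.
my $normalized_match = NFC($composed) eq NFC($decomposed) ? 1 : 0;

# NFD goes the other way and splits the precomposed character.
my $nfd_length = length( NFD($composed) );

print "raw: $raw_match, normalized: $normalized_match, NFD length: $nfd_length\n";
```

This prints raw: 0, normalized: 1, NFD length: 5, which is exactly the kind of mismatch a search box would trip over.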

That's why I just released Mojolicious::Plugin::UnicodeNormalize. When it's active, it silently normalizes all incoming data to a single normalization form (in our case, NFC worked the best). It doesn't mess with uploaded files. It silently does the job, and it imposes only a tiny penalty.

It's been in use in a client application for almost a year and it helped us avoid countless bugs. Now you can use it too.

Would You Miss Autoderef in 5.20?

In Perl 5.13 (the development track for Perl 5.14), the experimental feature called "autoderef" appeared. The short description from perl5140delta reads:

Array and hash container functions accept references

In other words, given a hash reference $hash or an array reference $array, you can write @keys = keys $hash or push $array, $value and Perl will happily interpret $hash as %$hash and $array as @$array.
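For comparison, here are the explicit dereferences that autoderef let you skip. The sketch sticks to the explicit forms so that it runs on any Perl 5, since autoderef itself was removed in Perl 5.24; the data is made up:

```perl
use strict;
use warnings;

my $hash  = { rank => 'Senior Consultant', rate => '1000 septim per hour' };
my $array = [ 1, 3, 5 ];

# What autoderef let you write as: keys $hash
my @keys = sort keys %$hash;

# What autoderef let you write as: push $array, 7
push @$array, 7;

print "@keys | @$array\n";    # rank rate | 1 3 5 7
```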

Again, this feature was marked as experimental. It's not terribly risky, except that autoderef interferes with enhanced each.

Perl 5.19 (the development track for Perl 5.20) has a new experimental feature called "postfix dereferencing". I don't have an opinion worth sharing yet, as I haven't thought about it to my satisfaction. Pumpking Ricardo Signes seems to like the feature a lot, which is a positive vote for it.
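For readers who haven't seen the syntax, this is roughly what postfix dereferencing looks like. The sketch assumes Perl 5.24 or newer, where the feature stopped being experimental; on 5.20 and 5.22 it needs an explicit feature flag that I've left out here:

```perl
use v5.24;    # assumption: postfix dereferencing needs no feature flag from 5.24 on
use warnings;

my $hash  = { rank => 'Senior Consultant' };
my $array = [ 1, 3, 5 ];

# The dereference reads left to right, after the reference.
my @keys = keys $hash->%*;
push $array->@*, 7;
my $count = scalar $array->@*;

print "@keys $count\n";    # rank 4
```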

Today he sent a public thought to p5p about the combination of these two features, asking "With postfix deref in, is auto-deref still valuable?".

I admit that I haven't used autoderef much. Perl 5.14 is as old a version of Perl as I want to target (and I'd rather use 5.18 for full Unicode powers, but could live with 5.16 in a pinch), but the awkwardness of the combination of aggregate operators and aggregate autoderef makes me nervous.

Would you miss autoderef if it were gone? Would postfix deref make up for it?

(Later in the thread, Rafael Garcia-Suarez makes an interesting suggestion: get rid of autoderef on the polymorphic container operators. That removes one consistency, because some container operators would autoderef and others wouldn't, but it removes a larger inconsistency. In an ideal world, each for arrays would be arrayeach and keys and values and so forth would be monomorphic, but that's probably never going to happen.)


About this Archive

This page is an archive of entries from November 2013 listed from newest to oldest.
