State and the Syntax of Encapsulation


More and more I realize that good software design minimizes the number of things you have to care about at any one time. Well-designed programs take advantage of the abstraction possibilities of their languages and libraries to model the problem and its solution in the most effective way. Well-designed languages minimize the syntactic concerns necessary to produce those abstractions.

In unsurprising news, the default Perl 5 object system shows its limits in that you have to think about Perl 5 reference syntax, objects, encapsulation, genericity, abstraction, and polymorphism all at once. Moose encourages people to do the right thing by providing abstractions that encapsulate the concerns of other levels of abstraction. Inside-out objects did something similar.

I realized this yesterday when writing about the state feature introduced in Perl 5.10. If you're a fan of minimalist languages which provide one and only one obvious way to do things, you'll hate this explanation, but at least you'll know why you're wrong.

state declares a lexical variable which maintains its state even after control flow leaves its lexical scope. In other words, these two snippets of code are almost entirely equivalent:

# the closure approach
{
    my $count = 0;    # visible only to add_user(), which closes over it

    sub add_user {
        my ($user, %data) = @_;
        $data{user_id}    = $count++;
    }
}

# the state approach
use feature 'state';

sub add_user {
    state $count      = 0;    # initialized once, on the first call

    my ($user, %data) = @_;
    $data{user_id}    = $count++;
}

The one potential difference is that the initialization of $count in the first example must take place before the first call to add_user().
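To make that trap concrete, here is a minimal sketch. The name next_id() is a hypothetical stand-in for add_user(), reduced to the counter alone. The sub exists as soon as the file compiles, but the assignment to $count does not happen until control flow reaches the enclosing block at runtime:

```perl
use strict;
use warnings;

# next_id() is a hypothetical stand-in for add_user().
# The sub is already compiled, so these calls work, but "= 0" has not run yet.
my @ids = (next_id(), next_id());

{
    my $count = 0;    # executes only now, resetting the shared counter

    sub next_id {
        return $count++;
    }
}

push @ids, next_id();

print "@ids\n";    # prints "0 1 0": the third ID repeats the first
```

The early calls quietly increment an uninitialized counter, and the late assignment then throws that progress away.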

If you're careful to avoid that tiny potential trap, you can achieve the same effect with the closure code. Scheme and Python and even Java fans rejoice for a moment. Okay, that's long enough.

The problem is that—just as with arguing that you don't need fold because you have a for loop with iteration—that line of thinking ignores the fact that the syntactic overhead necessary to make the former example work is too high. Adding a single keyword to achieve the same semantics and avoid that tiny little trap also makes the resulting code more expressive. It's more declarative.
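The fold comparison can be sketched the same way. This example assumes the core List::Util module and some made-up data in @prices. Both halves compute the same total, but the loop needs an accumulator and explicit iteration where the fold is a single expression:

```perl
use strict;
use warnings;
use List::Util 'reduce';

my @prices = (3, 5, 8);    # hypothetical data

# the loop approach: an accumulator plus explicit iteration
my $loop_total = 0;
for my $price (@prices) {
    $loop_total += $price;
}

# the fold approach: one expression which names the operation
my $fold_total = reduce { $a + $b } @prices;

print "$loop_total $fold_total\n";    # prints "16 16"
```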

There's nothing wrong with the goal of a language with a minimal feature set. That's a fine goal, but it can't be the most important goal, and it can't be a goal in isolation. That's because sometimes adding a feature lets you remove unnecessary scaffolding.

I believe that it's better to pursue concision than artificial simplicity in program and language design.


The problem is that—just as with arguing that you don't need fold because you have a for loop with iteration—that line of thinking ignores the fact that the syntactic overhead necessary to make the former example work is too high.

We don't even need "for" loops. So long as we have conditionals and goto, we can eliminate them entirely.

    .sub loopy
        .local int counter
        counter = 0
    LOOP:   if counter > 10 goto DONE
        print counter
        print " "
        inc counter
        goto LOOP
    DONE:   print "\n"
    .end

See how elegant everything can be if you only have a handful of minimalistic, yet flexible, constructs?

Of course, I'm being sarcastic here. There's a tension between expressiveness and flexibility and it's not always easy to find the right balance. The "conditional plus goto" example is what I tend to think of whenever I hear arguments against "syntactic sugar". Sometimes that sugar turns out to be saccharine (as mentioned, it's hard to find the right balance), but when it's done properly, it's worth the effort.

In M4 I build foreach loops using tail recursion.

One problem with for loops, as opposed to constructs like folds, is that a for loop has very clear semantics in terms of sequential operation, and this limits any kind of optimization. The fold, on the other hand, can say that it does not guarantee in-order operation and thus require the programmer to make sure that all fold state is held by the parameters. This means you can do optimizations like spawning off parts of the fold to different CPUs, then at the end resolve all the "lazy" parts that were waiting for an answer.

The for loop is unnecessarily strict (in more ways than one) in terms of sequential operation; that's why fold and map are really neat.

That's what really bothers me about iterator patterns: they enforce sequential operation even though the iterator might not guarantee the order of visits. If it were a map or iteration operation to which a closure/function was passed, then a different order of evaluation could be used.
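A rough sketch of that idea, assuming List::Util and an associative operation such as addition (the split into two halves is arbitrary, and nothing here actually runs in parallel): because each partial fold depends only on its own arguments, the pieces can be evaluated in any order and combined afterward.

```perl
use strict;
use warnings;
use List::Util 'reduce';

my @numbers = (1 .. 10);

# Split the list in two; each partial fold keeps all of its state in its
# own arguments, so the halves could run in either order or on separate CPUs.
my @left  = @numbers[0 .. 4];
my @right = @numbers[5 .. 9];

my $right_sum = reduce { $a + $b } @right;    # deliberately evaluated first
my $left_sum  = reduce { $a + $b } @left;

my $chunked = $left_sum + $right_sum;
my $whole   = reduce { $a + $b } @numbers;

print "$chunked $whole\n";    # prints "55 55"
```

This only holds when the folded operation is associative; a for loop which mutates outside state gives no such guarantee.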

That example looks really familiar for some reason. I may have written it myself one too many times.

Your closure example doesn't have the initialization problem if you wrap that block in a BEGIN.
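A minimal sketch of that suggestion, again using a hypothetical next_id() in place of add_user(): a BEGIN block runs as soon as the parser finishes compiling it, so the counter is initialized before any runtime call, even one that appears earlier in the file.

```perl
use strict;
use warnings;

# These calls appear before the block in the file, but they run after it.
my @ids = (next_id(), next_id());

BEGIN {
    my $count = 0;    # executes at compile time, before any call to next_id()

    sub next_id {
        return $count++;
    }
}

push @ids, next_id();

print "@ids\n";    # prints "0 1 2"
```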

Yeah, it's probably your code I ripped off for that one. I just did a quick google for "loop pir perl". :)



About this Entry

This page contains a single entry by chromatic published on April 23, 2010 4:32 PM.
