State and the Syntax of Encapsulation

More and more I realize that good software design minimizes the number of things you have to care about at any one time. Well-designed programs take advantage of the abstraction possibilities of languages and libraries to model the problem and its solution in the most effective way. Well-designed languages minimize the syntactic concerns necessary to produce those abstractions.

In unsurprising news, the default Perl 5 object system shows its limits in that you have to think about Perl 5 reference syntax and objects and encapsulation, genericity, abstraction, and polymorphism all at once. Moose encourages people to do the right thing by providing abstractions that encapsulate the concerns of other levels of abstraction. Inside-out objects did something similar.

I realized this yesterday when writing about the state feature introduced in Perl 5.10. If you're a fan of minimalist languages which provide one and only one obvious way to do things, you'll hate this explanation, but at least you'll know why you're wrong.

state declares a lexical variable which maintains its state even after control flow leaves its lexical scope. In other words, these two snippets of code are almost entirely equivalent:


# the closure approach
{
    my $count = 0;

    sub add_user
    {
        my ($user, %data) = @_;
        $data{user_id}    = $count++;
        ...
    }
}

# the state approach
use feature 'state';

sub add_user
{
    state $count      = 0;

    my ($user, %data) = @_;
    $data{user_id}    = $count++;
    ...
}

The one potential difference is that the initialization of $count in the first example must take place before the first call to add_user().
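Here's a minimal sketch of that difference. The names next_id() and next_id_state() are hypothetical stand-ins for add_user(), and the starting value of 100 is an assumption chosen only to make the effect visible:

```perl
use strict;
use warnings;
use feature 'state';

# Both subs already exist here (named subs are compiled before run time),
# but the assignment inside the bare block below has not run yet.
my $early       = next_id();
my $early_state = next_id_state();

{
    my $count = 100;    # assigned only when control flow reaches this line

    sub next_id { return $count++ }
}

sub next_id_state
{
    state $count = 100; # initialized on the first call, wherever that happens
    return $count++;
}

my $late = next_id();

print "closure: early=$early late=$late\n";   # closure: early=0 late=100
print "state:   early=$early_state\n";        # state:   early=100
```

The early call to the closure version sees an uninitialized $count, which the post-increment treats as 0. The block then assigns 100 as if nothing had happened. The state version initializes itself whenever the first call arrives.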

If you're careful to avoid that tiny potential trap, you can achieve the same effect with the closure code. Scheme and Python and even Java fans rejoice for a moment. Okay, that's long enough.

The problem is that—just as with arguing that you don't need fold because you have a for loop with iteration—that line of thinking ignores the fact that the syntactic overhead necessary to make the former example work is too high. Adding a single keyword to achieve the same semantics and avoid that tiny little trap also makes the resulting code more expressive. It's more declarative.
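To make the fold comparison concrete, here's a small sketch using reduce from the core List::Util module. The @lengths data is made up for illustration:

```perl
use strict;
use warnings;
use List::Util 'reduce';

my @lengths = (3, 5, 8);    # hypothetical sample data

# The loop version: an accumulator declared outside, mutated inside.
my $loop_total = 0;
for my $n (@lengths)
{
    $loop_total += $n;
}

# The fold version: one expression that says what it computes.
# reduce sets $a to the running result and $b to the next element.
my $fold_total = reduce { $a + $b } 0, @lengths;

print "loop=$loop_total fold=$fold_total\n";    # loop=16 fold=16
```

Both produce the same answer. The fold names the operation and leaves the bookkeeping to the library.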

There's nothing wrong with the goal of a language with a minimal feature set. That's a fine goal, but it can't be the most important goal, and it can't be a goal in isolation. That's because sometimes adding a feature lets you remove unnecessary scaffolding.

I believe that it's better to pursue concision than artificial simplicity in program and language design.

5 Comments

Ovid | April 24, 2010 3:42 AM

The problem is that—just as with arguing that you don't need fold because you have a for loop with iteration—that line of thinking ignores the fact that the syntactic overhead necessary to make the former example work is too high.

We don't even need "for" loops. So long as we have conditionals and goto, we can eliminate them entirely.

    .sub loopy
        .local int counter
        counter = 0
    LOOP:   if counter > 10 goto DONE
        print counter
        print " "
        inc counter
        goto LOOP
    DONE:
        print "\n"
        end
    .end

See how elegant everything can be if you only have a handful of minimalistic, yet flexible, constructs?

Of course, I'm being sarcastic here. There's a tension between expressiveness and flexibility and it's not always easy to find the right balance. The "conditional plus goto" example is what I tend to think of whenever I hear arguments against "syntactic sugar". Sometimes that sugar turns out to be saccharine (as mentioned, it's hard to find the right balance), but when it's done properly, it's worth the effort.

http://openid.aliz.es/tail-recursion | April 24, 2010 7:49 AM

In M4 I build foreach loops using tail recursion.

One difference between loops such as for and constructs such as fold is that a for loop has very clear semantics in terms of sequential operation, and this limits any kind of optimization. The fold, on the other hand, can say that it cannot guarantee in-order operation and thus require the programmer to make sure that all fold state is held by the parameters. This means you can do optimizations like spawning off parts of the fold to different CPUs. Then at the end just start resolving all the "lazy" parts that were waiting for an answer.

The for loop is unnecessarily strict (in more ways than one) in terms of sequential operation, that's why fold and map are really neat.

That's what really bothers me about iterator patterns: they enforce sequential operation even though the iterator might not guarantee the order of visits. If it were a map or iteration operation that took a closure or function, a different order of evaluation could be used.

chromatic replied to comment from Ovid | April 25, 2010 1:36 AM

That example looks really familiar for some reason. I may have written it myself one too many times.

brian.d.foy.myopenid.com | April 25, 2010 8:04 AM

Your closure example doesn't have the initialization problem if you wrap that block in a BEGIN.
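A minimal sketch of that suggestion, using a hypothetical next_id() in place of add_user() and an assumed starting value of 100:

```perl
use strict;
use warnings;

# This call appears before the block in the file, but BEGIN blocks run
# at compile time, so $count is already initialized when this executes.
my $first = next_id();

BEGIN
{
    my $count = 100;

    sub next_id { return $count++ }
}

my $second = next_id();

print "first=$first second=$second\n";    # first=100 second=101
```

The perlsub documentation describes this same idiom under persistent private variables.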

Ovid replied to comment from chromatic | April 25, 2010 8:37 AM

Yeah, it's probably your code I ripped off for that one. I just did a quick google for "loop pir perl". :)
