Count the (Possible) Bugs in this Code

One sign of a good interface is the degree to which it encourages users to do the right thing and discourages users from doing the wrong thing. While a malicious or distracted or inexperienced user may still do things incorrectly, a good interface reduces that possibility.

It's possible—even advisable—to analyze buggy code to see if the bugs point to misfeatures in the interfaces. Plenty of bugs come from misunderstandings of requirements, but plenty of bugs lurk in interfaces which are unclear or difficult to use correctly.

Consider this snippet of code:

use Moose;

has 'summary', is => 'rw';

around summary => sub
{
    my ($orig, $self) = splice @_, 0, 2;
    return $self->$orig() unless @_;
    my $summary       = shift;

    return $self->$orig() unless length $summary;

    my $content = $self->remove_markup( $summary );
    return '' unless $content;

    $self->$orig( $content );
};

This snippet of Moose-based code declares an attribute named summary with an autogenerated accessor and mutator of $object->summary. It also wraps this autogenerated method with code to sanitize any data set through the mutator by removing markup.

Everything seems straightforward, but consider all of the parts of the interface which might go wrong. (This is not a criticism of Moose, which does the best it can; it's an exploration of language and API design which just happened to come up as I debugged some very buggy code I wrote.)

my ($orig, $self) = splice @_, 0, 2;

Perl 5's minimal argument handling nevertheless has some common idioms. One of those is that $self is the first argument to all instance methods. Is this an instance method? Sort of. It's also a wrapper around an instance method which may or may not need to redelegate to the wrapped method in various ways. Thus this wrapper needs access to the wrapped method somehow.

The Moose developers chose to provide the wrapped method as a code reference passed as the first argument to the wrapper, mostly because this is the least worst approach [ 1 ]. You can't really add syntax to Perl 5 within the scope of this wrapper even with Devel::Declare without rewriting the op tree of the method itself to get at the wrapped method somehow, because Perl 5's internals know nothing about this technique. Besides that, D::D is a heavy hammer to pull out.

(Again, sometimes adding syntax to Perl itself makes otherwise intractable problems simple.)

Passing this parameter at the end of @_ would lead to the even more awkward idiom of:

around 'summary' => sub
{
    my $orig = pop;
    my $self = shift;
    ...
};

The splice approach is the least worst way I've found to deal with the situation where I have variadic argument counts to differentiate between attribute access and attribute mutation.

One problem remains: this wrapper only matters for the mutation case. I'm starting to believe that variadic redispatch within user code is a code smell of the rotting vegetation variety.

If Perl 5 supported multiple dispatch even only on the number of arguments provided to a method, this code would not be a problem. The wrapper could apply only to the case where the method took one or more arguments and would leave the accessor case of zero arguments unmodified.

The most right solution in my case is to change the name of the mutator to set_summary()... except that that's only the obvious change in the local case. For the sake of consistency with the rest of the system, I ought to change every mutator's name to set_foo: a large change. It's worth doing, but I should have done it from the start, if I'd predicted that a strict separation between setting and getting were necessary to reduce the possibility of bugs in wrapped methods.

Consider also what happens if you don't take the splice approach but do follow the common redispatching idiom:

around summary => sub
{
    my ($orig, $self, @new_values) = @_;
    return $self->$orig() if @_ == 2;

    ...

    $self->$orig( @_ );
};

I made that mistake too. It was easy enough to spot after DBIx::Class gave me odd errors about not knowing how to store a coderef in a column that should only contain text.

Again, this isn't Moose's fault. It does the best it can with the features Perl 5 provides. Yet imagine if Perl 5 knew what an invocant was and understood something about wrapping. (You can even unify redispatching to an overridden parent method in this—and while you're at it, fix the SUPER bug.) The syntax might look like:

around 'summary' => method
{
    return $self->*around::original() unless @_;
    my $summary       = shift;

    return unless length $summary;

    my $content = $self->remove_markup( $summary );
    return '' unless $content;

    $self->&around::original( $content );
};

Perhaps the star selector syntax isn't the right syntax to identify a different dispatch mechanism than simple name-based lookup, but you get the idea.

Until then, I'll be looking for an easy way to rename all of my mutators.

[1] Least worst? Consider the alternatives. If Moose provided an internal mechanism to set attributes directly (and not through direct hash access, which is prima facie the worst possible access mechanism), it would likely need access control to restrict direct mutation to declaring classes and their immediate descendents or applied roles. Figure out that system. Now strengthen it against everyone who suddenly discovers that a quick hack here and there will speed up synthetic benchmarks by a couple of percent and break the carefully constructed encapsulation, and ... well, welcome to Perl 5. (back)

5 Comments

Robert 'phaylon' Sedlacek | May 23, 2011 1:11 PM

I'm not sure what your example is trying to achieve. Is it by design that nothing will be set if only any false value is returned from the markup removal? Or that the setting with such an argument doesn't set? Or that any of these values will return an empty string from the accessor? If that's the case, it doesn't seem like a simple accessor to me. My sanitations would usually look something like this:

around foo => sub {
my ($orig, $self, @args) = @_;
@args = $self->_ensure_valid_values(@args)
if @args;
return $self->$orig(@args);
};

If I have to use such modifiers multiple times, I sometimes make a named function to generate the modifier:

sub pack_values {
my $packer = shift;
return sub {
my ($orig, $self, @args) = @_;
return scalar $packer->(@args)
if @args > 1;
return @args;
};
}

...

around foo => pack_values(sub { [@_] });

A big reason I came to love the Moose method modifiers is exactly the fact that I get passed the code reference. It's a non-magic value that leaves lots of flexibility.

Sue D. Nymme | May 23, 2011 6:32 PM

Maybe it's just me, but I always have to stop and scratch my head for a few seconds when I encounter splice().

The idiom I reach for is:
my ($orig, $self) = (shift, shift);

TMTOWTDI!

chris.prather.org | May 23, 2011 10:06 PM

I am assuming that you have simplified an idiom to illustrate a point. However in the specific example you have given I would have tried (if possible) to move the validation out of band into a TypeConstraint.

use Moose;
use Moose::Util::TypeConstraints;

subtype PlainString => as Str => where { _no_markup($_) };

coerce PlainString => from Str => via { _remove_markup($_) };

has summary => ( isa => 'PlainString', is => 'rw', coerce => 1 );

However this subtly changes the implementation of your class. You would have to deal with a type validation exception (where as in your code you return the empty string), and you would no longer have access to the instance in the remove_markup() method.

As for an easy way to rename all of your mutators I would referr you to MooseX::SemiAffordanceAccessors which does exactly what you want.

chromatic replied to comment from Sue D. Nymme | May 23, 2011 10:22 PM

That's a good idiom. I tend not to use it because something in my brain hesitates to let me use two expressions which modify @_ in the same assignment, but it does have an appeal over the splice approach.

pdcawley.myopenid.com | May 24, 2011 6:21 AM

The bug I spy is that you have a mutable object.

My rule of thumb is that everything is read only until it absolutely can't be, and then I'll add a distinct setter method. If I need to do argument munging then I'll probably call the actual setter _set_attr and then implement a separate set_attr with the munging code in it. Or I'll use a type and coercion, especially if I need the munging to happen at instantiation time as well.

Tags:

5 Comments

Modern Perl: The Book

Categories

Monthly Archives

Pages

About this Entry