Perl Type Checks versus Encapsulation

| 4 Comments

You code defensively. You like contracts and preconditions. You like to ensure that people don't misuse your API accidentally. You warn early, and you warn often.

This is all well and good, except.... Suppose you have an API which takes a hash reference. You want to retrieve something from an optional cache, or calculate a default value. You write:

use Modern::Perl;
use Carp 'croak';

sub retrieve_or_calculate
{
    my ($key, $cache) = @_;

    # buggy; do not use
    croak "callee is buggy; I need a hash ref!"
        unless ref( $cache ) eq 'HASH';

    $cache->{$key} //= calculate_expensive_value($key);

    return $cache->{key};
}

This code looks good, except that it's buggy. (If you've read Is It, Can It, Does It, and Robust Perl 5 OO you already know why.)

The function expects a hash reference. This is just as much a hash reference as anything else:

my $cache = bless {}, 'SecretMonkey::Cache';

... but try to pass it into that function. Boom. It's indeed a hash reference, but it won't pass the ref() test.

That's fine. There are ways around that. Pull out the big bag of "Features that (mostly) should be in the core but aren't", spelled Scalar::Util, specifically reftype():

use Modern::Perl;
use Carp 'croak';
use Scalar::Util 'reftype';

sub retrieve_or_calculate
{
    my ($key, $cache) = @_;

    # buggy; do not use
    croak "callee is buggy; I need a hash ref!"
        unless reftype( $cache ) eq 'HASH';

    $cache->{$key} //= calculate_expensive_value($key);

    return $cache->{key};
}

This catches the case of a blessed hash reference, but now it throws warnings because reftype() returns undef when passed a non-reference. That's easy to fix:

use Modern::Perl;
use Carp 'croak';
use Scalar::Util 'reftype';

sub retrieve_or_calculate
{
    my ($key, $cache) = @_;

    # buggy; do not use
    croak "callee is buggy; I need a hash ref!"
        unless ( reftype( $cache ) // '' ) eq 'HASH';

    $cache->{$key} //= calculate_expensive_value($key);

    return $cache->{key};
}

... except that that gets into "Wow, that's kind of ugly!" territory, and Devel::Cover doesn't like it. (Any time an ancillary tool suffers from an API design decision and you have to make a choice between penalizing otherwise good and reliable metrics and working around API deficiencies, you should... aw, forget it. Something in the DarkPAN probably relies on exactly this behavior. Maybe.)

Of course, poking directly into the internals of an object is wrong. That's the internal representation of an object. Fortunately, you can protect against this behavior by changing the representation of SecretMonkey::Cache:

package SecretMonkey::Cache;

use overload '%{}' => \&treat_as_hash;

sub treat_as_hash { ... }

sub new
{
    bless [], shift;
}

Now the $cache object behaves as a hash (thanks to overload), except that reftype() still pokes in the guts of the object to discover a blessed array reference, not a blessed hash reference. Oops.

Some people pull out UNIVERSAL::isa() here and they are wrong. Many of them balk at using reftype() because it's in a separate (but core) module, because its API has that nasty undef quirk, or because they don't know it exists. This buggy code has the same problem with regard to hash dereference overloading:

use Modern::Perl;
use Carp 'croak';

sub retrieve_or_calculate
{
    my ($key, $cache) = @_;

    # buggy; do not use
    croak "callee is buggy; I need a hash ref!"
        unless UNIVERSAL::isa( $cache, 'HASH' );

    $cache->{$key} //= calculate_expensive_value($key);

    return $cache->{key};
}

One option is to override the isa() method within SecretMonkey::Cache:

package SecretMonkey::Cache;

sub isa
{
    my ($self, $class) = @_;
    return 1 if $class eq 'Hash';
    return $self->SUPER::isa($class);
}

... but that fails when people use ref() or reftype() or UNIVERSAL::isa() as a function, not a method. The checking code could change to work around this — in truth, it must:

use Modern::Perl;
use Carp 'croak';

sub retrieve_or_calculate
{
    my ($key, $cache) = @_;

    # buggy; do not use
    croak "callee is buggy; I need a hash ref!"
        unless $cache->isa( 'HASH' );

    $cache->{$key} //= calculate_expensive_value($key);

    return $cache->{key};
}

... except that it's okay if $cache isn't a blessed reference. In fact, the intent of the code is not to handle blessed references. Passing in an array reference should cause the "I want a hash ref!" exception just as much as passing in an object that can't behave as a hash ref should.

If you're willing to go the autobox::Core route, you can use UNIVERSAL::DOES() instead (see The Why of Perl Roles), provided that the core data types do override DOES() appropriately and that overload sets it by default (which it should) and....

Ultimately that's the right solution, but Perl 5 has littered its glorious history with the detritus of several mostly not entirely wrong half-solutions which permeate both the CPAN and the DarkPAN with poorly understood traps for people trying to design, maintain, and write perfectly valid and theoretically correct code, and ... well, if you put scare quotes around the word "working" when you say "Don't break working code!", the irony police probably won't haul you away, even if they should.

The point is simple. The question this API really should be asking is not some question of "Is this a hash reference?" or "Is it a blessed hash reference?" or "Is it a blessed reference with hashlike overloading?" or "Is it a tied variable that understands how to perform a keyed fetch operation?". The question this API really needs to ask is "Can I treat this like a hash?" The game is lost at the point where the API cares what $cache is; all that matters is what $cache does.

You could probably get some combination of all of these checks working with:

use Modern::Perl;

use overload;

use Carp         'croak';
use Scalar::Util qw( blessed reftype );

sub retrieve_or_calculate
{
    my ($key, $cache) = @_;

    my $type_check    = 0;

    # it's an object
    $type_check = 1
        if (blessed($cache)
        &&  ($cache->does( 'HASH' )
        ||  reftype( $cache ) eq 'HASH' 
        ||  overload::Method( $cache, '@{}' )));

    $type_check = 1 if ref( $cache ) eq 'Hash';

    croak croak "callee is buggy; I need a hash ref!" unless $type_check;

    ...
}

... but ouch.

Fortunately, Perl 5 has a perfectly good way to attempt and recover from an operation that may not succeed:

use Modern::Perl;
use Carp 'croak';

sub retrieve_or_calculate
{
    my ($key, $cache) = @_;

    eval
    {
        $cache->{$key} //= calculate_expensive_value($key);

        return $cache->{key};
    }

    croak "callee is buggy; I need a hash ref!" if $@;
}

Yes, you may need to check $@ for the specific exception thrown, but that's much more straightforward, much simpler, and much, much easier to get right.

That leaves only the simple matter of fixing existing code, teaching current and future Perl programmers why not to use the obvious-but-wrong approaches, and enhancing Perl so that it's easier to do the right thing than the wrong thing. (Try Rakudo today!)

Good luck.

4 Comments

What about my Scalar::Util::Reftype (apart from the overload check)?

foo() if ! reftype( $bar )->hash;
# or even
foo() if ! reftype( $bar )->hash_object;

http://search.cpan.org/~burak/Scalar-Util-Reftype/

That purported solution still has a problem.

Many objects are blessed hashes. However, most of the time, that is an implementation detail rather than an advertised part of the API. But sometimes, it is part of the advertised interface, as in Hash::MultiValue.

So the question you should ask is not “can I treat this like a hash?” but “am I supposed to treat this like a hash?”. The former is, of course, a necessary precondition for the latter – but not a sufficient one.

And blithely using eval does not ask the correct question.

The only way to get this particular bugbear right at least in Perl 5 is using ref and UNIVERSAL::ref.

In that last example, the return statement will return from the eval block, not from the retrieve_or_calculate sub.

I don't see a bug in your first code, but see a lot other bugs later.
At first. If i want a hashref then i want a hashref. And something like

my $cache = bless {}, 'SecretMonkey::Cache';

It is not a hashref. It is an object. Sure it based on a hashref, but who says that it based on a hashref tommorow?

One of the fundamentals of OOP is that the "internal storage" can change. That is one reason why you never should do "$obj->{whatever}" Because tommor the internal structure can change and it can break your code.

And absolutely the same goes for a checking of a hashref. If i want an hashref then i don't want an object where the internal storage can change.

If you work with objects like this, then your code is buggy and can easily break if the internal storage of the object change. And the first solution and check on a hashref can warn you about such bad code.

You should replace "use Modern::Perl" through "use Old::Perl::Please::Dont::Use::This". By the way one reason why i dislike "Modern::Perl". if you use it, doesn't mean a programmer really writers Modern Perl. But that is a other topic.

Modern Perl: The Book

cover image for Modern Perl: the book

The best Perl Programmers read Modern Perl: The Book.

sponsored by the How to Make a Smoothie guide

Categories

Pages

About this Entry

This page contains a single entry by chromatic published on January 6, 2010 3:03 PM.

UNIVERSAL and API Decisions was the previous entry in this blog.

Genericity, Serendipity, Surety is the next entry in this blog.

Find recent content on the main index or look in the archives to find all content.


Powered by the Perl programming language

what is programming?