When is a Hash the Same?

Suppose you have two hashes which (when you inspect them) contain:

%first = (
    dog => 'Rodney',
    cat => [qw( Daisy Jack )],
);

%second = (
    dog => 'Rodney',
    cat => [qw( Daisy Jack )],
);

Are they the same?

Your answer depends on what identity means to you. At this point in time, they have what appear to be identical contents. They're probably different variables, though. If you look at things from the point of view of the perl program itself, the two names %first and %second refer to two different HV structs.

That's not very interesting to most users, I suspect. The questions of how these two hashes behave when you treat them the same way is more interesting. For example, given the state of the two hashes as demonstrated earlier, do you expect this test to pass?

use Test::More;

my @first_keys  = keys %first;
my @second_keys = keys %second;

is_deeply \@first_keys, \@second_keys, 'keys returned in the same order';
done_testing;

The official answer is "No, you should not." As of Perl 5.18, the official answer is "If this test ever passes, it's an unlikely coincidence."

A hash map data structure (what Perl calls "hashes") associates keys with values by calculating a simple hash of the key and using that as a cheap index into another data structure. That calculated hash is not necessarily unique, and the indexed data structure needs to be able to handle collisions of the calculated hash (even though the keys are different) as well as insertions of new key/value pairs and removals of key/value pairs.

When you traverse the Perl hash with keys or values or each, perl traverses that data structure in a particular order that makes sense given the state of that data structure. If and when that data structure changes, the way perl traverses it will change. The details of that data structure are hidden inside perl; they're not obvious from the point of view of the Perl programmer.

The implication of this is not obvious at first, but it's easy to explain: even if two hashes have the same keys and the same values, the order in which you traverse that hash with keys or values or each may be different between the hashes.

After all, you don't know whether %first was defined all at once, or whether it started empty and had the two key/value pairs added one at a time or both at a time. You don't know whether %second originally had three key/value pairs or four or ten. You can't look at what a hash currently contains and guess at the details of the internal data structure perl uses to represent those keys and values.

As of Perl 5.17.10—what will become Perl 5.18 very soon—this behavior gets a little more predictable. With a recent patch to bleadperl, the calculated hash of two hashes defined with the same keys and values in the same way will be slightly different. In other words, even if you write:

my %first = (
    dog => 'Rodney',
    cat => [qw( Daisy Jack )],
);

my %second = (
    dog => 'Rodney',
    cat => [qw( Daisy Jack )],
);

... the test from earlier may still not pass. (It's very unlikely to pass.) Any code that relies on an accident of implementation which causes that test to pass occasionally is, sadly, buggy code.

The good news is that you almost never need to rely on that behavior which happens by accident. When I come across code which relies on a specific ordering of keys or values of a hash, it's often in tests which care more that the contents of the hashes are equivalent than that the order of traversal is the same between the hashes. (In those cases, you can use a function like Test::More's is_deeply to compare keys or values or both, or sort the keys before comparing them.

In other cases, you must rely on the documented guarantee that using keys and values on the same hash (not a hash with the same contents but a hash which uses the same underlying HV) will always traverse that data structure in the same order until you modify the hash.

Anything else has the potential to do the wrong thing depending on the implementation. Perl 5.18 will be a little more strict about this, which is actually a good thing, as it will identify buggy code much more easily.

Tags:

Modern Perl: The Book

Categories

Monthly Archives

Pages

About this Entry