I am quite often writing fragments of code like this:

if (exists $myHash->{$key}) {
    $value = $myHash->{$key};
}

What I am trying to do is get the value from the hash if the hash has that key in it, and at the same time I want to avoid autovivifying the hash entry if it did not already exist.

However it strikes me that this is quite inefficient: I am doing a hash lookup to find out if a key exists, and then if it did exist I am doing another hash lookup of the same key to extract it.

It gets even more inefficient in a multilevel structure:

if (exists $myHash->{$key1} 
    && exists $myHash->{$key1}{$key2} 
    && exists $myHash->{$key1}{$key2}{$key3}) {

    $value = $myHash->{$key1}{$key2}{$key3};
}

Here I am presumably doing 9 hash lookups instead of 3!

Is perl smart enough to optimize this kind of case? Or is there some other idiom to get the value of a hash without either autovivifying the entry or doing two successive lookups?

I am aware of the autovivification module, but if possible I am looking for a solution that does not require an XS module to be installed. Also I have not had a chance to try this module out and I am not completely sure what happens in the case of a multilevel hash - the pod says that this:

$h->{$key}

would return undef if the key did not exist - does that mean that this:

$h->{$key1}{$key2}

would die if $key1 did not exist, on the grounds that I am trying to de-reference undef? If so, to avoid that presumably you would still need to do multi-level tests for existence.

harmic

2 Answers

I wouldn't worry about optimization since hash lookups are fast. But for your first case, you can do:

if (my $v = $hash{$key}) {
    print "have $key => $v\n";
}

Similarly:

if ( ($v = $hash{key1}) && ($v = $v->{key2}) ) { 
    print "Got $v\n";
}
perreal
  • That's surprising - I had thought that any lvalue use of a non-existent hash entry would add the key to the hash (with value undef in this case). Clearly I am wrong. Amazing I did not pick that up after years of writing perl code! – harmic Jul 25 '14 at 06:15
  • autovivification only happens in nested accesses – perreal Jul 25 '14 at 06:18
  • @harmic, yes, if the hash entry is an lvalue, it will be autovivified, but you may be misunderstanding the meaning of lvalue. In this case the hash is on the right-hand side of the assignment; this makes it an rvalue, not an lvalue. Autovivification of rvalue hash entries, as @perreal says, only happens with nested accesses - e.g. `$h->{foo}{bar}{baz}` will automatically do something like `$h->{foo}{bar} //= {}`. – tobyink Jul 25 '14 at 08:37
  • @tobyink oops sorry, that was a typo, I meant rvalue – harmic Jul 25 '14 at 08:48
  • @perreal: this way you're not testing if the hash element _exists_, as the OP requested, you're testing if `$hash{$key}` is _true_, which is a different (stricter) test of course. – emazep Jul 25 '14 at 16:30
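
The difference between the three tests raised in the comments (true, defined, exists) can be seen in a short sketch. The hash contents and key names below are made up for illustration. Note that a single read followed by `defined` cannot tell a key holding undef apart from a missing key; only `exists` can.

```perl
use strict;
use warnings;

# Illustrative data: three false values and one true value.
my %hash = (zero => 0, empty => '', undef_val => undef, one => 1);

for my $key (qw(zero empty undef_val one missing)) {
    my $v = $hash{$key};    # single-level read; does not create the key
    my $is_true    = $v ? 1 : 0;
    my $is_defined = defined $v ? 1 : 0;
    my $exists     = exists $hash{$key} ? 1 : 0;
    print "$key: true=$is_true defined=$is_defined exists=$exists\n";
}

# The plain read above did not add 'missing' to the hash.
print "missing was not created\n" unless exists $hash{missing};
```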

Autovivification doesn't happen when you read a single level, so you can safely write

my $value = $hash{$key};

For multi-level access intermediate entries will be autovivified. e.g.

my $value = $hash{a}{b};

will create a reference to an empty hash if $hash{a} doesn't already exist. (If it does exist and isn't a hash reference, perl will throw an error and die, at least under use strict 'refs'; without that, a plain string is treated as a symbolic reference.) To avoid that, you need to check each level first. You can write a subroutine to check the existence of arbitrarily nested keys.

sub safe_exists {
    my $x = shift;
    foreach my $k (@_) {
        no warnings 'uninitialized';
        return unless ref $x eq ref {};
        return unless exists $x->{$k};
        $x = $x->{$k};
    }
    return 1;
}

if (safe_exists(\%hash, qw(a b))) {...}
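
Since the question asks for the value rather than just a yes/no answer, the same walk can return the value it ends on. The following is only a sketch: `safe_fetch` is a hypothetical helper name, not part of any module, and the keys are illustrative. It also shows the autovivification of the intermediate level that a plain nested read causes.

```perl
use strict;
use warnings;

# Hypothetical helper: walk the keys one level at a time and return the
# value at the end of the path, or nothing if a level is missing or is
# not a hash reference.
sub safe_fetch {
    my ($node, @keys) = @_;
    for my $k (@keys) {
        return unless ref($node) eq 'HASH';
        return unless exists $node->{$k};
        $node = $node->{$k};
    }
    return $node;
}

my %hash = (a => { b => 1 });

my $plain = $hash{x}{y};                   # a read, but it creates $hash{x}
print "x was autovivified\n" if exists $hash{x};

my $safe = safe_fetch(\%hash, qw(p q));    # no new keys are created
print "p was not created\n" unless exists $hash{p};

print "a.b = ", safe_fetch(\%hash, qw(a b)), "\n";
```

This still does an exists test and a fetch at each level; it trades the repeated lookups from the top of the structure for one step per level.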

Depending on your algorithm (and why you're trying to avoid autovivification), locking your hash can be a useful alternative to using no autovivification or writing multi-level exists tests.

use feature 'say';
use Hash::Util;

my %hash = (a => { b => 1 });
Hash::Util::lock_hash_recurse(%hash);

say $hash{a}{b}; # 1
say $hash{a}{c}; # error!

I mostly use this as a way to detect programming errors when working with complex data structures. It's useful for detecting mis-typed key names or inadvertent modification of values.
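
As a rough sketch of how those errors show up at run time, the failing accesses can be wrapped in eval so the script keeps going. The data is illustrative; `lock_hash_recurse` and `unlock_hash_recurse` are imported from Hash::Util, and the exact error text may vary between perl versions.

```perl
use strict;
use warnings;
use Hash::Util qw(lock_hash_recurse unlock_hash_recurse);

my %hash = (a => { b => 1 });
lock_hash_recurse(%hash);

# Reading a key that is not in the locked hash dies instead of
# quietly returning undef.
my $read_ok = eval { my $v = $hash{a}{c}; 1 };
print "read failed: $@" unless $read_ok;

# Assigning to an existing value dies as well, because the values
# are made read-only.
my $write_ok = eval { $hash{a}{b} = 2; 1 };
print "write failed: $@" unless $write_ok;

# Unlocking restores normal behaviour.
unlock_hash_recurse(%hash);
$hash{a}{c} = 3;
print "after unlock: $hash{a}{c}\n";
```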

Michael Carman