
Let's say I have a large hash and I want to iterate over its contents. The standard idiom would be something like this:

while (my ($key, $value) = each %{$hash_ref}) {
    # do something
}

However, if I understand my Perl correctly, this is actually doing two things. First, the

%{$hash_ref}

is translating the ref into list context, thus returning something like

(key1, value1, key2, value2, key3, value3 etc)

which will be stored in my stack's memory. Then the each function will run, consuming the first two values in memory (key1 and value1) and returning them to my while loop to process.

If my understanding is right, that means I have effectively copied my entire hash onto the stack only to iterate over the new copy. That could be expensive for a large hash, both because of the cost of iterating over the data twice and because of potential cache misses if both copies can't be held in memory at once. It seems pretty inefficient. Is this what really happens, or am I misunderstanding the actual behavior, or does the compiler optimize away the inefficiency for me?

Follow-up questions, assuming I am correct about the standard behavior:

  1. Is there a syntax that avoids copying the hash by iterating over its values in the original hash? If not for a hash, is there one for the simpler array?

  2. Does this mean that in the above example I could get inconsistent values between the copy of my hash and my actual hash if I modify the $hash_ref content within my loop, resulting in $value having a different value than $hash_ref->{$key}?

Grant McLean
dsollen
  • Dereferencing a reference doesn't make a copy of the original data; that would defeat the purpose of using references in the first place. You may be thinking of `$foo = { %hash }`, which creates an anonymous hash using a copy of the data in `%hash`. – ThisSuitIsBlackNot Jan 06 '16 at 00:13
  • Note that it is more common to use `keys` and `values` than `each`, because with the last there can be problems with terminating a loop before the end, and with iterating over one hash in two places simultaneously. – Borodin Jan 06 '16 at 03:45
  • @Borodin Thank you for mentioning this. I had anticipated that adding or removing anything to a hash would be dangerous, but I had presumed that Perl would protect me otherwise. I hadn't realized it wasn't re-entrant until you inspired me to research that online. Though it seems `keys` has the copying problem I was looking to avoid with `each`. – dsollen Jan 06 '16 at 19:41
  • @ThisSuitIsBlackNot You are correct, though this is due to my poorly worded question. My concern about list context leading to a copy still applies, but I wrote it in a way that made it seem like I thought the dereferencing caused the list context rather than the each function. Or, perhaps more accurately, I didn't take the time to think about when list context occurred specifically, only that I expected it to occur by the time it hit the subroutine (since I had thought each was a subroutine until ikegami's answer). – dsollen Jan 06 '16 at 19:43
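
The early-exit problem Borodin mentions can be reproduced with a small sketch. The three-element hash and the counter names are made up for illustration. The sketch relies on the documented behavior that each hash carries a single internal iterator, which `keys` resets.

```perl
use strict;
use warnings;

my %h = (a => 1, b => 2, c => 3);

# Leave an each loop early: the hash's internal iterator stays where it stopped.
while (my ($k, $v) = each %h) {
    last;
}

# A second each loop resumes from that position, so it misses one pair.
my $seen_without_reset = 0;
while (my ($k, $v) = each %h) {
    $seen_without_reset++;
}

# Same early exit, but this time reset the iterator before looping again.
while (my ($k, $v) = each %h) {
    last;
}
keys %h;    # calling keys in void context resets the iterator

my $seen_after_reset = 0;
while (my ($k, $v) = each %h) {
    $seen_after_reset++;
}

print "without reset: $seen_without_reset pairs\n";
print "after reset:   $seen_after_reset pairs\n";
```

Which pair gets skipped depends on the hash's internal ordering, but the counts do not: the loop without a reset sees one pair fewer than the hash holds.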

2 Answers

No, the syntax you quote does not create a copy.

This expression:

%{$hash_ref}

is exactly equivalent to:

%$hash_ref

and assuming the $hash_ref scalar variable does indeed contain a reference to a hash, then adding the % on the front is simply 'dereferencing' the reference - i.e. it resolves to a value that represents the underlying hash (the thing that $hash_ref was pointing to).

If you look at the documentation for the each function, you'll see that it expects a hash as an argument. Putting the % on the front is how you provide a hash when what you have is a hashref.

If you wrote your own subroutine and passed a hash to it like this:

my_sub(%$hash_ref);

then on some level you could say that the hash had been 'copied', since inside the subroutine the special @_ array would contain a list of all the key/value pairs from the hash. However even in that case, the elements of @_ are actually aliases for the keys and values. You'd only actually get a copy if you did something like: my @args = @_.
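
A small sketch of that aliasing, assuming two hypothetical helpers (`shout_in_place` and `shout_copy` are made-up names, not anything from the question): writing through @_ changes the caller's hash values, while copying @_ first leaves them alone.

```perl
use strict;
use warnings;

# Hypothetical helper: modifies its arguments in place through @_.
sub shout_in_place {
    $_ = uc($_) for @_;
    return;
}

# Hypothetical helper: copies @_ first, so the caller's data is untouched.
sub shout_copy {
    my @args = @_;
    $_ = uc($_) for @args;
    return @args;
}

my %h = (abc => 'def');
my $hash_ref = \%h;

shout_copy(%$hash_ref);
my $after_copy = $hash_ref->{abc};      # value unchanged

shout_in_place(%$hash_ref);
my $after_in_place = $hash_ref->{abc};  # value was changed through the alias

print "after shout_copy:     $after_copy\n";
print "after shout_in_place: $after_in_place\n";
```

The key survives untouched in both cases, because only the values are aliased when a hash is flattened into a list.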

Perl's builtin each function is declared with the prototype '+' which effectively coerces a hash (or array) argument into a reference to the underlying data structure.

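As an aside, from version 5.14 the each function could also take a reference to a hash. This was an experimental feature and it was removed again in 5.24, so it only works on the versions in between. On those versions, instead of:

($key, $value) = each(%{$hash_ref})

you could say:

($key, $value) = each($hash_ref)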
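
The comments below point out that `each $ref` was experimental and that postfix dereferencing is its replacement. A minimal sketch of that form, assuming Perl 5.24 or later (where postfix dereference is no longer experimental) and a made-up three-element hash:

```perl
use v5.24;      # enables strict, say, and stable postfix dereference
use warnings;

my $hash_ref = { a => 1, b => 2, c => 3 };

my $total = 0;
# $hash_ref->%* is the postfix spelling of %{$hash_ref}; each still
# receives the hash itself, so nothing is flattened or copied.
while (my ($key, $value) = each $hash_ref->%*) {
    $total += $value;
}

say "total: $total";
```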
Grant McLean
  • I had no idea about your comment starting "as of 5.14..." that's good stuff. I need to do a better job of reading release notes. – Hambone Jan 06 '16 at 02:00
  • @Hambone `each $ref` was introduced as an *experimental* feature in 5.14, subject to later revision or removal. As of 5.24 it *will* be removed, as it turned out to be quite troublesome. In its place is the `postderef` feature, introduced as experimental in 5.20 and due to be upgraded to non-experimental (with no change in syntax or behavior) in 5.24. – hobbs Jan 06 '16 at 04:39

No copy is created by each (though you do copy the returned values into $key and $value through assignment). The hash itself is passed to each.
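
This also bears on the second follow-up question. A minimal sketch, assuming the loop only assigns to keys that already exist (adding or deleting elements while iterating is a different matter, and the each documentation calls its effect on the iterator unspecified):

```perl
use strict;
use warnings;

my %h = (a => 1, b => 2, c => 3);
my $hash_ref = \%h;

while (my ($key, $value) = each %$hash_ref) {
    # Assigning through the reference updates the original hash in place.
    $hash_ref->{$key} = $value * 10;

    # $value is only the loop's own copy, so changing it leaves the hash alone.
    $value = -1;
}

print join(', ', map { "$_=$h{$_}" } sort keys %h), "\n";
```

So $value and $hash_ref->{$key} can differ inside the loop, but only because $value is a copy of a single element, not because the whole hash was copied.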

each is a little special. It supports the following syntaxes:

each HASH
each ARRAY

As you can see, it doesn't accept an arbitrary expression. (That would be each EXPR or each LIST). The reason for that is to allow each(%foo) to pass the hash %foo itself to each rather than evaluating it in list context. each can do that because it's an operator, and operators can have their own parsing rules. However, you can do something similar with the \% prototype.

use Data::Dumper;

sub f     { print(Dumper(@_)); }
sub g(\%) { print(Dumper(@_)); }   # Similar to each

my %h = (a=>1, b=>2);
f(%h);  # Evaluates %h in list context.
print("\n");
g(%h);  # Passes a reference to %h.

Output:

$VAR1 = 'a';           # 4 args, the keys and values of the hash
$VAR2 = 1;
$VAR3 = 'b';
$VAR4 = 2;

$VAR1 = {              # 1 arg, a reference to the hash
          'a' => 1,
          'b' => 2
        };

%{$h_ref} is the same as %h, so all of the above applies to %{$h_ref} too.


Note that the hash isn't copied even if it is flattened. The keys are "copied", but the values are returned directly.

use Data::Dumper;
my %h = (abc=>"def", ghi=>"jkl");
print(Dumper(\%h));
$_ = uc($_) for %h;
print(Dumper(\%h));

Output:

$VAR1 = {
          'abc' => 'def',
          'ghi' => 'jkl'
        };
$VAR1 = {
          'abc' => 'DEF',
          'ghi' => 'JKL'
        };

You can read more about this here.
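
The same aliasing applies to the `values` function, which is one way to work on the original values directly rather than on copies. A minimal sketch with a made-up two-element hash:

```perl
use strict;
use warnings;

my %h = (a => 1, b => 2);

# Each element returned by values is an alias to the value stored in %h,
# so modifying $_ modifies the hash itself.
$_ *= 10 for values %h;

print join(', ', map { "$_=$h{$_}" } sort keys %h), "\n";
```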

ikegami
  • Thank you. I feel this is a better explanation than the one above because it better addresses why the list context I expected to be created didn't occur; though admittedly, if I had asked the question better, it might have been clearer that my confusion was due to my expecting each to impose list context. To clarify, though: you're saying that each is not a subroutine, as I had assumed? Is the ability to avoid list context strictly limited to operators built into Perl, then? I know I can override operators, but only with a subroutine, which requires a ref or list context, correct? – dsollen Jan 06 '16 at 19:33
  • This is the first line of [perlfunc](http://perldoc.perl.org/perlfunc.html): "*The functions in this section can serve as terms in an expression. They fall into two major categories: list operators and named unary operators.*" Some of the functions can't be replicated by subs, some are indistinguishable from subs with prototypes, and there might be some that are indistinguishable from ordinary subs, but all are implemented as operators. – ikegami Jan 06 '16 at 19:45
  • No, subs can control how their callers are parsed too through prototypes (as I showed). You can also use features such as `cv_set_call_parser` to extend the parser. For example, [`use syntax qw( loop );`](http://search.cpan.org/perldoc?Syntax::Feature::Loop) uses this. – ikegami Jan 06 '16 at 19:46