
Given the following code, does the hash referenced by $z consume roughly the same amount of memory again as %$x and %$y combined?

If so, is there a way to use a single reference to look up data in the hashes referenced by either $x or $y, like $z->{$somekeytoXorY}, without the cost in performance and memory?

use strict;
use warnings;

my $x = {
    1 => 'a',
    2 => 'b',
};

my $y = {
    3 => 'c',
    4 => 'd',
};

my $z = {
    %$x, %$y
};
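To make the question concrete, here is a small core-Perl sketch showing that { %$x, %$y } builds an independent copy, so the merged hash holds its own scalars:

```perl
use strict;
use warnings;

my $x = { 1 => 'a', 2 => 'b' };
my $y = { 3 => 'c', 4 => 'd' };

# { %$x, %$y } flattens both hashes into one list of key/value
# pairs and builds a brand-new hash from that list.
my $z = { %$x, %$y };

print $z->{1}, "\n";   # a

# The scalars were copied, so later changes to %$x are
# invisible through $z.
$x->{1} = 'changed';
print $z->{1}, "\n";   # still a
```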

Update

The hash references actually point to large hashes created using tie and DB_File.

I was wondering whether there is a way to use just a single hash reference for these so that I don't have to load everything into memory. I may also be using more than two of these at once.
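For context, the hashes are created along these lines (a sketch; the filename x.db is hypothetical). The tied hash is just a front for the on-disk database, so its contents are not held in memory:

```perl
use strict;
use warnings;
use DB_File;
use Fcntl;   # for the O_* flags

# Hypothetical setup: %x is a front for a Berkeley DB file on
# disk. Every $x->{$key} lookup becomes a FETCH call into
# DB_File, which reads from the file rather than from memory.
tie my %x, 'DB_File', 'x.db', O_RDONLY, 0644, $DB_HASH
    or die "Cannot tie x.db: $!";

my $x = \%x;

# Copying it with { %$x } would walk the whole database and
# pull every record into an ordinary in-memory hash.
```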

Borodin
criz
  • Relevant: https://stackoverflow.com/q/106555/1331451 – simbabque Sep 03 '18 at 15:43
  • *"without affecting performance and memory"* Are you having problems with performance, or with memory? If so then you should *profile* your code to discover where the bottlenecks are, and focus on that code. If not then you are trying to optimise needlessly. The days are long gone when memory capacity or CPU power were so expensive that it was cost-effective to employ a programmer to reduce them. If you can save 64GB of memory, or reduce seconds of run time to milliseconds, then go ahead. But a different choice of data structure doesn't generally afford such benefits. – Borodin Sep 03 '18 at 20:51
  • *"The hash references actually point to large hashes created using `tie` and `DB_File`"* Then they're not using Perl hashes at all: you're simply using an API that is sufficiently similar to a hash access that the calls can usefully be made by dummy hash operations. If you check the `tied` hash then you will probably see that it remains empty. The memory usage is highly dependent on the way the `tie` has been written, and you shouldn't even consider copying hashes with constructs like `{ %$_ }`. This will create an "ordinary" hash and break the `tied` interface, so your code will be worthless. – Borodin Sep 03 '18 at 20:57
  • *"I was wondering whether [I really need to] dump everything in memory"* Copying data from a `tied` hash to memory is counter-productive. Some things may still work, but you have lost the `tie` functionality as soon as you do that. You can *always* use `$x->{key} // $y->{key}` (or `exists $x->{key} ? $x->{key} : $y->{key}` as **JGNI** suggests) unless you need to account for an element appearing in both hashes. You're trying to be too smart, and fixing problems that you haven't encountered yet. – Borodin Sep 03 '18 at 21:08
  • Tied hashes aren't hashes at all. They are interfaces to subroutines. Since they're code rather than data, talking about the memory and performance of tied hashes in general makes no sense. – ikegami Sep 03 '18 at 23:33

2 Answers

3

Tied hashes aren't hashes at all. They are interfaces to subroutines. Since they're code rather than data, talking about the memory and performance of tied hashes in general makes no sense.

Let's talk about ordinary hashes first.

$z = { %$x, %$y }; will copy the scalars of %$x and %$y into %$z, so yes, it will take twice the memory (assuming no duplicate keys).
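One detail worth spelling out with a small core-Perl sketch: when the two hashes do share a key, the pairs from %$y come later in the flattened list, so the value from %$y wins:

```perl
use strict;
use warnings;

my $x = { 1 => 'a', 2 => 'b' };
my $y = { 2 => 'B', 3 => 'c' };   # key 2 appears in both

# In { %$x, %$y } later pairs overwrite earlier ones,
# so for duplicate keys %$y takes precedence.
my $z = { %$x, %$y };

print $z->{2}, "\n";   # B
```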

You could share the scalars:

use Data::Alias qw( alias );
my $z = {};
alias $z->{$_} = $x->{$_} for keys(%$x);
alias $z->{$_} = $y->{$_} for keys(%$y);

You'd still use memory proportional to the number of elements in both hashes, but it would be far less than before if %$x and %$y are ordinary hashes. This might not save any memory for tied hashes.
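On Perl 5.22 and later, the same scalar sharing can be done with the core refaliasing feature instead of the CPAN Data::Alias module (a sketch; note the feature is still marked experimental):

```perl
use strict;
use warnings;
use feature 'refaliasing';
no warnings 'experimental::refaliasing';

my $x = { 1 => 'a', 2 => 'b' };
my $y = { 3 => 'c', 4 => 'd' };
my $z = {};

# Alias each element of %$z to the corresponding scalar in
# %$x or %$y; only the keys are duplicated, not the values.
\$z->{$_} = \$x->{$_} for keys %$x;
\$z->{$_} = \$y->{$_} for keys %$y;

# The scalars are shared, so a change through $x is visible
# through $z as well.
$x->{1} = 'changed';
print $z->{1}, "\n";   # changed
```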

The alternative is not to actually merge the data at all. You could use a tied hash yourself...

package Tie::MergedHashes;
use Carp qw( croak );
sub new     { my $pkg = shift; $pkg->TIEHASH(@_); }
sub TIEHASH { my $pkg = shift; bless [ @_ ], $pkg }
sub STORE   { croak("Not allowed"); }
sub FETCH   { my ($self, $key) = @_; for my $h (@$self) { return $h->{$key} if exists($h->{$key}); } return undef; }
...

my $z = {};
tie %$z, 'Tie::MergedHashes' => ($y, $x);
$z->{$key}

...but there's no reason to make the code look like a hash. You could simply use an object.

package MergedHashes;
use Carp qw( croak );
sub new   { my $class = shift; bless [ @_ ], $class }
sub fetch { my ($self, $key) = @_; for my $h (@$self) { return $h->{$key} if exists($h->{$key}); } return undef; }
...

my $z = MergedHashes->new($y, $x);
$z->fetch($key)
ikegami
  • Thanks for the object suggestion. I originally thought of just using a for loop to check where the data exists but this object approach is more readable. – criz Sep 05 '18 at 04:37
  • The down side is that sub/method calls are rather expensive. – ikegami Sep 05 '18 at 08:14
0

The simple answer is yes: $z holds its own copy of all the data.

If you want all the keys from both hash refs without creating a new hash ref, and assuming that you have no duplicated keys, you could do something like this:

use List::Util qw( uniq );

my @keys = uniq( keys( %$x ), keys( %$y ) );

Then, to get the value from either hash, something like this:

my $value = exists $y->{$key} ? $y->{$key} : $x->{$key};
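Since the question mentions that more than two of these hashes may be in play at once, the same exists-based lookup generalises to a small helper (a sketch; merged_fetch is a made-up name), with hash refs earlier in the argument list taking priority:

```perl
use strict;
use warnings;

# Return the value for $key from the first hash ref that has
# it; refs earlier in the list take priority. Works for tied
# hashes too, since it only does single-element lookups.
sub merged_fetch {
    my ($key, @hashes) = @_;
    for my $h (@hashes) {
        return $h->{$key} if exists $h->{$key};
    }
    return undef;
}

my $x = { 1 => 'a', 2 => 'b' };
my $y = { 3 => 'c', 4 => 'd' };

my $value = merged_fetch(3, $y, $x);
print $value, "\n";   # c
```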

By the way, why are you using hash refs rather than hashes, and why is memory such a consideration that you needed to ask this question?

JGNI