What's the most efficient way to check multiple hash references in Perl?

Question

I have a multidimensional data structure for tracking different characteristics of files I am comparing and merging data for. The structure is set up as such:

$cumulative{$slice} = {
    DATA    => $data,
    META    => $got_meta,
    RECOVER => $recover,
    DISPO   => $dispo,
    DIR     => $dir,
};

All of the keys, save DIR (which is just a simple string), are references to hashes, or arrays. I would like to have a simple search for KEYS that match "BASE" for the value DIR points to for each of the $slice keys. My initial thought was to use grep, but I'm not sure how to do that. I thought something like this would be ok:

my (@base_slices) = grep { $cumulative{$_}->{DIR} eq "BASE" } @{$cumulative{$_}};

I was wrong. Is there a way to do this without a loop, or is that pretty much the only way to check those values? Thanks!

Edit: Thanks to Ikegami for answering succinctly, even without my fully representing the outcome of the search. I have changed the question a little bit to more clearly explain the issue I was having.

@zdim They are all hash, or array references of various lengths and values. Yes, I would like to know if there is a more efficient way than looping through all of the values of `$slice` in the hash to determine which `{$slice}->{DIR}` values match what I'm looking for. Thanks again — WetCheerios, Nov 07 '22 at 17:24
(It's awkward and ineffective talking through elaborate comments so I removed them and posted an answer instead. Please see and comment...) — zdim, Nov 07 '22 at 17:53

zdim · Answer 1 · 2022-11-09T18:02:09.097

3

This was posted for the initial form of the question, before the edit, and reflects what I did and/or did not understand in that formulation.

The use of @{$cumulative{$_}}, with $_ presumably standing for $slice, indicates that the value for key $slice is expected to be an arrayref. However, the question shows a hashref there. This is either an error or the question misrepresents the problem.

If the expression in grep accurately represents the problem, for values of $slice that are given or can be built at will, then just feed that list of $slice values to the shown grep:

my @base_slices = grep { $cumulative{$_}{DIR} eq 'BASE' } @slice_vals;

or

my @base_slices =
    grep { $cumulative{$_}{DIR} eq 'BASE' }
    map { generate_list_of_slice_values($_) }
    LIST-OF-INPUTS;

Here generate_list_of_slice_values() stands for whatever way the values for $slice get acquired dynamically from some input, and LIST-OF-INPUTS is a placeholder for that input.^†

There is no need for a dereferencing arrow before the key DIR (a syntax convenience), and no need for parentheses around @base_slices, since assigning to an array already provides the needed list context.

Please clarify what $slice is meant to be and I'll update.

^† The code in map's block gets elements of LIST-OF-INPUTS one at a time (as $_) and whatever it evaluates with each is joined into its return list. That is passed to grep for filtering: elements of its input list are provided to the code in the block one at a time as $_ and those for which the code evaluates to "true" (in Perl's sense) pass, forming the grep's return list.
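
A minimal, runnable sketch of that map-then-grep chain follows. The sample %cumulative contents, the @inputs list, and the helper slice_values_for() are all hypothetical stand-ins (the question does not say how slice names are produced); only the shape of the pipeline is taken from the example above.

```perl
use strict;
use warnings;

# Hypothetical data: slice names and DIR values are invented for illustration.
my %cumulative = (
    'run1_a' => { DIR => 'BASE' },
    'run1_b' => { DIR => 'OTHER' },
    'run2_a' => { DIR => 'BASE' },
    'run2_b' => { DIR => 'BASE' },
);

# Hypothetical stand-in for generate_list_of_slice_values():
# expands one input into the slice names derived from it.
sub slice_values_for {
    my ($input) = @_;
    return ( "${input}_a", "${input}_b" );
}

my @inputs = qw(run1 run2);

# Read bottom-up: @inputs goes into map, map's output goes into grep.
my @base_slices =
    grep { $cumulative{$_}{DIR} eq 'BASE' }    # keep slices whose DIR is BASE
    map  { slice_values_for($_) }              # each input yields two slice names
    @inputs;

print "@base_slices\n";    # prints "run1_a run2_a run2_b"
```

Since map and grep both preserve the order of their input list, the result follows the order of @inputs.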

edited Nov 09 '22 at 18:02

answered Nov 07 '22 at 17:49

zdim


$slice is literally the filename I am parsing for the data structure. I have a list of them in an array, which I have subsequently remembered, and now this question is just, hopefully, an exercise in education. – WetCheerios Nov 07 '22 at 17:50
@WetCheerios Ah, there -- so you feed `@base_wafs` to your `grep`, like in the first example above. But then I'm not sure what that string `'BASE'` stands for ... is that a possible filename? Or do you need to parse that file to extract a string which is tested against `BASE`? (Then that would be done in the `map` in the second example.) – zdim Nov 07 '22 at 17:53
Can you please update the `map` example to include `BASE` as the search criteria? I apologize, but I do not use `map` often, and need to start, rather than relying so heavily on `grep` – WetCheerios Nov 07 '22 at 17:55
OK, can you then explain how that `BASE` need be used? (1) `grep` is a filter -- feed it a list, the code in the `{}` gets an item at a time (in `$_`) and if that code evaluates to true that item passes; the filtered list is returned (2) `map` on the other hand merely transforms an input list -- feed it a list and it computes with each element at a time (again available in `$_`) and returns the new list. /// So an often used way is to feed input to a `map`, which transforms it into another list, which is fed into `grep` to filter what's needed. This is my second example. – zdim Nov 07 '22 at 18:04
[cont'd] Now, what of that do you need? How does `BASE` string figure into all that? – zdim Nov 07 '22 at 18:04
@ikegami explained it in a way that even I could understand. When you said that I had put in an array reference to search, rather than the keys of `%cumulative`, it did not click with me. Thanks for answering so promptly and accurately. The issue with my understanding is not your fault at all. Sometimes, I just need more words to get the point. – WetCheerios Nov 07 '22 at 19:27
@WetCheerios Thank you for going to trouble to explain that -- and great that you now figured out your problem! My initial "explanation" just wasn't clear, because I didn't exactly get what was going on in the question (I still don't quite :). I cleaned up that initial statement. – zdim Nov 08 '22 at 07:45

ikegami · Answer 2 · 2022-11-07T19:19:19.233

3

This is wrong:

@{$cumulative{$slice}}

It gets the value of the array referenced by $cumulative{$slice}. But $cumulative{$slice} is not a reference to an array; it's a reference to a hash. This expression makes no sense, and it results in the error

Not an ARRAY reference
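
As a small illustration of that failure (a sketch only; the slice name and data are hypothetical), ref() shows what kind of reference is stored, and eval traps the fatal error so the message can be inspected:

```perl
use strict;
use warnings;

# Hypothetical sample data; the slice name is made up for illustration.
my %cumulative = ( 'file_a.dat' => { DIR => 'BASE' } );
my $slice      = 'file_a.dat';

# ref() reports the kind of reference held in the hash element.
print ref( $cumulative{$slice} ), "\n";    # prints "HASH"

# Dereferencing a hash reference as an array dies at run time;
# eval catches the exception and leaves the message in $@.
my $ok    = eval { my @items = @{ $cumulative{$slice} }; 1 };
my $error = $@;
print "Failed: $error" unless $ok;
```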

What would be correct? Well, it's not quite clear what you want.

Maybe you want the keys of the elements of %cumulative whose DIR attribute equals BASE.

my @matching_keys =                                # 3. Save the results.
   grep { $cumulative{ $_ }->{ DIR } eq "BASE" }   # 2. Filter them.
      keys( %cumulative );                         # 1. Get the keys.

(The -> is optional between indexes, so $cumulative{ $_ }{ DIR } is also fine.)

Maybe you don't need the keys. Maybe you want the values of the elements of %cumulative whose DIR attribute equals BASE.

my @matching_values =                              # 3. Save the results.
   grep { $_->{ DIR } eq "BASE" }                  # 2. Filter them.
      values( %cumulative );                       # 1. Get the values.
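
To try both variants against concrete data, here is a self-contained sketch. The file names and DIR values are hypothetical, and the other fields are left as empty references since their contents are not shown in the question.

```perl
use strict;
use warnings;

# Hypothetical sample data: slice names and DIR values are invented.
my %cumulative = (
    'a.txt' => { DATA => {}, META => {}, RECOVER => [], DISPO => {}, DIR => 'BASE' },
    'b.txt' => { DATA => {}, META => {}, RECOVER => [], DISPO => {}, DIR => 'OTHER' },
    'c.txt' => { DATA => {}, META => {}, RECOVER => [], DISPO => {}, DIR => 'BASE' },
);

# Keys whose DIR is BASE. Hash order is not stable, so sort for display.
my @matching_keys =
    sort { $a cmp $b }
    grep { $cumulative{$_}{DIR} eq 'BASE' }
    keys %cumulative;

# The inner hash references themselves (not copies) whose DIR is BASE.
my @matching_values = grep { $_->{DIR} eq 'BASE' } values %cumulative;

print "@matching_keys\n";                 # prints "a.txt c.txt"
print scalar(@matching_values), "\n";     # prints "2"
```

Note that grep still visits every element internally, so this is a more compact way to write the loop rather than a way to avoid one.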

edited Nov 07 '22 at 19:19

answered Nov 07 '22 at 18:02

ikegami


Thank you sir. I should have been able to figure that out. Sometimes I still psych myself out with data structures and don't look at them from the perspective of what I'm actually looking at. This makes a lot of sense. – WetCheerios Nov 07 '22 at 19:12
If you want a hash, you can use the keys you now have to build a new (smaller) hash (`my %filtered = map { $_ => $cumulative{ $_ } } @matching_keys;`). Or you could delete the elements that *don't* match from the existing hash (`delete @cumulative{ grep { $cumulative{ $_ }->{ DIR } ne "BASE" } keys( %cumulative ) };`). – ikegami Nov 07 '22 at 19:14
I do not need to disturb the cumulative data, I just want to reduce the amount of data I am parsing in the subroutine that only needs BASE data to evaluate. So, I pass the entire data structure to the sub, then I will get the list of `BASE` keys and evaluate from there. Thanks again! – WetCheerios Nov 07 '22 at 19:17
