How to find max value grouped by multiple keys in array of hashes?

Question

I have data with the following structure. It will always be in ascending order by 'c'.

[ { 'a' => 1, 'b' => 1, 'c' =>  1, 'd' => '?' },
  { 'a' => 1, 'b' => 1, 'c' =>  2, 'd' => '?' },
  { 'a' => 1, 'b' => 1, 'c' =>  3, 'd' => '?' },
  { 'a' => 1, 'b' => 2, 'c' =>  4, 'd' => '?' },
  { 'a' => 1, 'b' => 2, 'c' =>  5, 'd' => '?' },
  { 'a' => 2, 'b' => 1, 'c' =>  6, 'd' => '?' },
  { 'a' => 2, 'b' => 1, 'c' =>  7, 'd' => '?' },
  { 'a' => 2, 'b' => 1, 'c' =>  8, 'd' => '?' },
  { 'a' => 2, 'b' => 2, 'c' =>  9, 'd' => '?' },
  { 'a' => 2, 'b' => 2, 'c' => 10, 'd' => '?' } ]

I want an array containing the record with the max value of 'c' for each unique combination of 'a' and 'b'.

[ { 'a' => 1, 'b' => 1, 'c' =>  3, 'd' => '?' },
  { 'a' => 1, 'b' => 2, 'c' =>  5, 'd' => '?' },
  { 'a' => 2, 'b' => 1, 'c' =>  8, 'd' => '?' },
  { 'a' => 2, 'b' => 2, 'c' => 10, 'd' => '?' } ]

The other keys need to be retained but are not otherwise related to the transformation. The best I could figure out so far is to reverse the array (so it is in descending order by 'c'), uniq by 'a' and 'b', and reverse the array again. But this depends on the implementation of uniq always returning the first unique item found. The specification doesn't say that, so I am worried about relying on that behavior, since it could change in future versions. I am also wondering whether this is a really inefficient method.

@data.reverse!.uniq!{|record| [record['a'],record['b']]}.reverse!
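One caveat with the chained bang methods above: `Array#uniq!` returns `nil` when it removes nothing, so the trailing `.reverse!` would raise `NoMethodError` on input that has no duplicate 'a'/'b' pairs. A minimal sketch of a non-mutating variant of the same reverse/uniq/reverse idea follows; the `records` name and the sample rows are made up for illustration, and the approach still assumes that `uniq` keeps the first occurrence it sees.

```ruby
# Hypothetical sample data, already sorted ascending by 'c'.
records = [
  { 'a' => 1, 'b' => 1, 'c' => 1 },
  { 'a' => 1, 'b' => 1, 'c' => 2 },
  { 'a' => 1, 'b' => 2, 'c' => 3 }
]

# Non-bang methods always return an array, so the chain is safe
# even when uniq has nothing to remove.
latest = records.reverse.uniq { |r| [r['a'], r['b']] }.reverse

# uniq! signals "nothing removed" by returning nil instead of the array.
no_dups = [{ 'a' => 1, 'b' => 1, 'c' => 1 }]
bang_result = no_dups.uniq! { |r| [r['a'], r['b']] }
```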

Is there a better and more efficient way to do this? If you do have a better way, can you also please explain it instead of just giving me a super nasty one-liner that I may not be able to decipher?

Michael Kohl · Accepted Answer · 2012-05-16T15:54:05.213

That's actually fairly easy:

a.group_by { |h| h.values_at("a", "b") }.map { |_, v| v.max_by { |h| h["c"] } }

Or with nicer formatting:

a.group_by do |h|
  h.values_at("a", "b")
end.map do |_, v|
  v.max_by { |h| h["c"] }
end

Explanation: first we use Enumerable#group_by to create a Hash with the combinations of "a" and "b" (extracted with Hash#values_at) as the keys and all hashes with that combination as the values. We then map over this hash, ignore the keys and select the element with the maximum value for "c" from the array with Enumerable#max_by.
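The two steps described above can be run separately to see the intermediate hash. This is only a sketch: the `data`, `groups` and `result` names are made up, and the sample rows are a shortened, deliberately unsorted variant of the question's data, to show that this approach does not depend on the input order.

```ruby
# Hypothetical sample rows; note they are NOT sorted by 'c'.
data = [
  { 'a' => 1, 'b' => 1, 'c' => 1, 'd' => '?' },
  { 'a' => 1, 'b' => 1, 'c' => 3, 'd' => '?' },
  { 'a' => 1, 'b' => 2, 'c' => 5, 'd' => '?' },
  { 'a' => 1, 'b' => 2, 'c' => 4, 'd' => '?' },
  { 'a' => 2, 'b' => 1, 'c' => 8, 'd' => '?' }
]

# Step 1: group the rows under their ['a', 'b'] pair.
# The keys are arrays such as [1, 1]; the values are arrays of rows.
groups = data.group_by { |h| h.values_at('a', 'b') }

# Step 2: for each group, keep the row with the largest 'c'.
# The key is not needed here, hence the `_` placeholder.
result = groups.map { |_, rows| rows.max_by { |h| h['c'] } }

result.each { |row| p row }
```

With these rows, `groups.keys` is `[[1, 1], [1, 2], [2, 1]]` and `result` holds the rows with 'c' equal to 3, 5 and 8, each still carrying its 'd' value.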

edited May 16 '12 at 15:54

answered May 16 '12 at 15:38

Michael Kohl

66,324
14
138
158

Could you explain or give some reference to the meaning of `_` in block parameters? – Flexoid May 16 '12 at 15:40
@Flexoid: No special meaning, it's a parameter I don't care about and in a lot of languages it's customary to use an underscore for the name to signify that. – Michael Kohl May 16 '12 at 15:43
@steenslag Somehow the `"c"` became a `v` and it took me a second to figure out where I was being stupid ;-) Rereading my textual description helped, because I described it properly... – Michael Kohl May 16 '12 at 15:51
+1 for a very idiomatic solution, especially using `values_at` (which I would have failed to do). Though I did prefer @steenslag's `.last` instead of the `_,v` desplat. Or better yet, `a.group_by{...}.values.map{...}` – Phrogz May 16 '12 at 15:55
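For comparison, here is a rough sketch of the three spellings mentioned in this comment. The `.last` form is a guess at what @steenslag's version looked like, since that code is not shown here; the `rows` and `by_key` names and the sample data are made up.

```ruby
# Hypothetical sample rows.
rows = [
  { 'a' => 1, 'b' => 1, 'c' => 2 },
  { 'a' => 1, 'b' => 1, 'c' => 3 },
  { 'a' => 1, 'b' => 2, 'c' => 5 }
]

by_key = rows.group_by { |h| h.values_at('a', 'b') }

# 1. Destructure each [key, value] pair and ignore the key.
with_underscore = by_key.map { |_, v| v.max_by { |h| h['c'] } }

# 2. Take the pair as one array and pick its last element (the value).
with_last = by_key.map { |pair| pair.last.max_by { |h| h['c'] } }

# 3. Drop the keys up front and map over the grouped arrays only.
with_values = by_key.values.map { |v| v.max_by { |h| h['c'] } }
```

All three produce the same array; the choice is mostly a matter of taste.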
`_` does have a special meaning or at least [gets special treatment](http://stackoverflow.com/a/9560198/479863) in some cases, the convention of using `_` as the *I don't care* parameter has hard-wired support in the interpreter. That's just nit picking a comment though :) – mu is too short May 16 '12 at 16:23
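One example of that special treatment, as far as I know: a parameter list may repeat `_`, while repeating any other name is a syntax error. A small sketch, with made-up names (`triples`, `thirds`, `duplicate_rejected`):

```ruby
# Repeating `_` in a block's parameter list is accepted.
triples = [[1, 2, 3], [4, 5, 6]]
thirds = triples.map { |_, _, c| c }

# Repeating an ordinary name is rejected when the code is parsed,
# so it is wrapped in eval here to let the rest of the script load.
begin
  eval('proc { |x, x| }')
  duplicate_rejected = false
rescue SyntaxError
  duplicate_rejected = true
end
```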