I have data with the following structure. It will always be in ascending order by 'c'.

[ { 'a' => 1, 'b' => 1, 'c' =>  1, 'd' => '?' },
  { 'a' => 1, 'b' => 1, 'c' =>  2, 'd' => '?' },
  { 'a' => 1, 'b' => 1, 'c' =>  3, 'd' => '?' },
  { 'a' => 1, 'b' => 2, 'c' =>  4, 'd' => '?' },
  { 'a' => 1, 'b' => 2, 'c' =>  5, 'd' => '?' },
  { 'a' => 2, 'b' => 1, 'c' =>  6, 'd' => '?' },
  { 'a' => 2, 'b' => 1, 'c' =>  7, 'd' => '?' },
  { 'a' => 2, 'b' => 1, 'c' =>  8, 'd' => '?' },
  { 'a' => 2, 'b' => 2, 'c' =>  9, 'd' => '?' },
  { 'a' => 2, 'b' => 2, 'c' => 10, 'd' => '?' } ]

I want an array containing, for each unique combination of 'a' and 'b', the record with the max value of 'c'.

[ { 'a' => 1, 'b' => 1, 'c' =>  3, 'd' => '?' },
  { 'a' => 1, 'b' => 2, 'c' =>  5, 'd' => '?' },
  { 'a' => 2, 'b' => 1, 'c' =>  8, 'd' => '?' },
  { 'a' => 2, 'b' => 2, 'c' => 10, 'd' => '?' } ]

The other keys need to be retained but are not otherwise related to the transformation. The best I could figure out so far is to reverse the array (so it is in descending order by 'c'), uniq by 'a' and 'b', and reverse the array again. But that depends on the implementation of uniq_by always returning the first unique item found. The specification doesn't say that, so I am worried about relying on that behavior, since it could change in future versions. I am also wondering whether this is a really inefficient method.

@data.reverse!.uniq! { |record| [record['a'], record['b']] }.reverse!
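
One caveat with the chain above, offered as a sketch rather than a correction of the original: `Array#uniq!` returns `nil` when it removes nothing, so the trailing `.reverse!` can raise `NoMethodError` on data that is already unique. The non-mutating `uniq` always returns an array. The sample data and variable names below are made up for illustration.

```ruby
# Hypothetical sample data, already sorted ascending by 'c'.
data = [
  { 'a' => 1, 'b' => 1, 'c' => 1 },
  { 'a' => 1, 'b' => 1, 'c' => 2 },
  { 'a' => 1, 'b' => 2, 'c' => 3 },
  { 'a' => 2, 'b' => 1, 'c' => 4 },
  { 'a' => 2, 'b' => 1, 'c' => 5 }
]

# Non-mutating version of the reverse / uniq / reverse idea.
result = data.reverse.uniq { |record| [record['a'], record['b']] }.reverse

# uniq! returns nil when there is nothing to remove,
# so chaining another method onto it is unsafe.
already_unique = [{ 'a' => 1, 'b' => 1, 'c' => 1 }]
bang_result = already_unique.uniq! { |record| [record['a'], record['b']] }
```

Here `result` holds one record per ('a', 'b') pair and `bang_result` is `nil`.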

Is there a better and more efficient way to do this? If you do have a better way, can you also please explain it instead of just giving me a super nasty one-liner that I may not be able to decipher?

vlasits
Douglas Mauch

1 Answer

That's actually fairly easy:

a.group_by { |h| h.values_at("a", "b") }.map { |_, v| v.max_by { |h| h["c"] } } 

Or with nicer formatting:

a.group_by do |h|
  h.values_at("a", "b") 
end.map do |_, v| 
  v.max_by { |h| h["c"] }
end

Explanation: first we use Enumerable#group_by to create a Hash with the combinations of "a" and "b" (extracted with Hash#values_at) as the keys and all hashes with that combination as the values. We then map over this hash, ignore the keys and select the element with the maximum value for "c" from the array with Enumerable#max_by.
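
To make the two steps concrete, here is a minimal sketch that runs them separately on a small, made-up data set. The `'d'` values and the variable names `records`, `groups`, and `maxima` are illustrative assumptions, not part of the question.

```ruby
# Hypothetical records; 'd' carries unrelated data that must survive.
records = [
  { 'a' => 1, 'b' => 1, 'c' => 1, 'd' => 'p' },
  { 'a' => 1, 'b' => 1, 'c' => 2, 'd' => 'q' },
  { 'a' => 1, 'b' => 2, 'c' => 3, 'd' => 'r' },
  { 'a' => 2, 'b' => 1, 'c' => 4, 'd' => 's' },
  { 'a' => 2, 'b' => 1, 'c' => 5, 'd' => 't' }
]

# Step 1: build a Hash whose keys are the ['a', 'b'] value pairs
# and whose values are arrays of the records sharing that pair.
groups = records.group_by { |h| h.values_at('a', 'b') }

# Step 2: for each group, keep the whole record with the largest 'c'.
# The key is not needed here, hence the `_` placeholder.
maxima = groups.map { |_, group| group.max_by { |h| h['c'] } }
```

Because `max_by` returns the original hash, every other key (such as `'d'`) comes along unchanged, and the result does not depend on the input being sorted by `'c'`.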

Michael Kohl
  • Could you explain or give some reference to the meaning of `_` in block parameters? – Flexoid May 16 '12 at 15:40
  • @Flexoid: No special meaning, it's a parameter I don't care about and in a lot of languages it's customary to use an underscore for the name to signify that. – Michael Kohl May 16 '12 at 15:43
  • @steenslag Somehow the `"c"` became a `v` and it took me a second to figure out where I was being stupid ;-) Rereading my textual description helped, because I described it properly... – Michael Kohl May 16 '12 at 15:51
  • +1 for a very idiomatic solution, especially using `values_at` (which I would have failed to do). Though I did prefer @steenslag's `.last` instead of the `_,v` desplat. Or better yet, `a.group_by{...}.values.map{...}` – Phrogz May 16 '12 at 15:55
  • `_` does have a special meaning or at least [gets special treatment](http://stackoverflow.com/a/9560198/479863) in some cases, the convention of using `_` as the *I don't care* parameter has hard-wired support in the interpreter. That's just nit picking a comment though :) – mu is too short May 16 '12 at 16:23
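
Two points from the comments can be sketched in code: the `.values.map` variant Phrogz suggests, and one case where `_` gets special treatment, namely that it may appear more than once in a block's parameter list. The data and variable names here are made up for illustration.

```ruby
# Hypothetical records, as in the question's structure.
a = [
  { 'a' => 1, 'b' => 1, 'c' => 1 },
  { 'a' => 1, 'b' => 1, 'c' => 2 },
  { 'a' => 1, 'b' => 2, 'c' => 3 }
]

# Variant without the `_` placeholder: drop the keys first with
# Hash#values, then map over the groups alone.
via_values = a.group_by { |h| h.values_at('a', 'b') }
              .values
              .map { |group| group.max_by { |h| h['c'] } }

# `_` can be repeated in a parameter list to skip several positions.
rows = [[1, 'x', 10], [2, 'y', 20]]
thirds = rows.map { |_, _, third| third }
```

Both forms of the grouping give the same records; which one reads better is a matter of taste.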