Ruby - combining/flattening multiple array of hashes on common hash key/value combination

Question

I am working with a large data set with multiple arrays of hashes which all have a key-value pair in common ("date" & the date value) as the first element of the hash.

The array of hashes I need to parse (@data["snapshot"]) is in the following format. Note that @data["snapshot"][0], @data["snapshot"][1], and @data["snapshot"][2] are in identical format with identical dates, but their totals differ. In the resulting hash I need a key-value pair that identifies where the data came from.

@data["snapshot"][0] is as follows:

[{"date"=>"1455672010", "total"=>"**817**", "I"=>"1", "L"=>"3", "M"=>"62", "H"=>"5", "C"=>"0"},
 {"date"=>"1455595298", "total"=>"**40**", "I"=>"8", "L"=>"5", "M"=>"562", "H"=>"125", "C"=>"0"},
 {"date"=>"1455336016", "total"=>"**555**", "I"=>"10", "L"=>"1", "M"=>"93", "H"=>"121", "C"=>"0"}]

@data["snapshot"][1] is as follows:

[{"date"=>"1455672010", "total"=>"**70**", "I"=>"1", "L"=>"9", "M"=>"56", "H"=>"25", "C"=>"0"},
 {"date"=>"1455595298", "total"=>"**54**", "I"=>"8", "L"=>"2", "M"=>"5", "H"=>"5", "C"=>"0"},
 {"date"=>"1455336016", "total"=>"**25**", "I"=>"0", "L"=>"9", "M"=>"93", "H"=>"12", "C"=>"0"}]

@data["snapshot"][2] is as follows:

[{"date"=>"1455672010", "total"=>"**70**", "I"=>"12", "L"=>"5", "M"=>"5662", "H"=>"125", "C"=>"0"},
 {"date"=>"1455595298", "total"=>"**43212**", "I"=>"56", "L"=>"6", "M"=>"5662", "H"=>"125", "C"=>"0"},
 {"date"=>"1455336016", "total"=>"**55525**", "I"=>"100", "L"=>"19", "M"=>"5593", "H"=>"121", "C"=>"0"}]

My Question Is Ultimately:

How do I convert (flatten?) the three existing arrays of hashes (@data["snapshot"][0], @data["snapshot"][1], and @data["snapshot"][2]) into a single array of hashes in the following format?

[{"date"=>"1455672010", "CameFromDataSource0"=>"817", "CameFromDataSource1"=>"70", "CameFromDataSource2"=>"70"},
 {"date"=>"1455595298", "CameFromDataSource0"=>"40", "CameFromDataSource1"=>"54", "CameFromDataSource2"=>"43212"},   
 {"date"=>"1455336016", "CameFromDataSource0"=>"555", "CameFromDataSource1"=>"25", "CameFromDataSource2"=>"55525"}]

Hi @sawa, I am hoping to convert the @data['snapshot'] array of hashes into a different hash. See my question for both the current format and the desired format. I'll edit the question a bit now. — Kurt W, Feb 17 '16 at 03:04
After posting an answer I see that you have changed the question. My answer now makes no sense. That's the reason for the rule that questions are not to be changed. I suggest you roll back to your original question. — Cary Swoveland, Feb 17 '16 at 04:07
Hi Cary, apologies for the change but I made those changes just a few minutes after sawa told me I should consider rephrasing it. Since I didn't change the gist of the question but just some of the hash key names, is the result still the same? I will note that questions shouldn't be modified in the future--apologies, I'm brand new with Stack Overflow. To be clear, because I only changed the hash key name (from "key[0]", "key[1]", "key[2]" to "CameFromSource0", "CameFromSource1","CameFromSource2"), does your answer still address my question? Please let me know and thank you very much! — Kurt W, Feb 17 '16 at 07:40
Yes, that's fine, as readers will understand what happened after reading your comment. — Cary Swoveland, Feb 17 '16 at 08:05

Cary Swoveland · Answer 1 · 2016-02-17T20:48:05.400

This is one way to do it.

Code

def convert(data)
  data.each_with_object({}) { |a,h|
    a.each { |g| h.update(g["date"]=>[g["total"][/\d+/]]) { |_,o,n| o+n } } }.
      map { |date, arr| arr.each_with_index.with_object({"date"=>date}) { |(e,i),h| 
        h["key#{i}"] = e } }
end

Example

convert(data)
  #=> [{"date"=>"1455672010", "key0"=>"817", "key1"=>"70", "key2"=>"70"},
  #    {"date"=>"1455595298", "key0"=>"40", "key1"=>"54", "key2"=>"43212"},
  #    {"date"=>"1455336016", "key0"=>"555", "key1"=>"25", "key2"=>"55525"}]

Two steps

You can see that I've done this in two steps. First construct a hash:

f = data.each_with_object({}) { |a,h| a.each { |g|
  h.update(g["date"]=>[g["total"][/\d+/]]) { |_,o,n| o+n } } }
    #=> {"1455672010"=>["817", "70", "70"],
    #    "1455595298"=>["40", "54", "43212"],
    #    "1455336016"=>["555", "25", "55525"]}

Here I have used the form of Hash#update (aka merge!) that employs a block ({ |_,o,n| o+n }) to determine the values of keys that are present in both hashes being merged.
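As a small illustration of that block form, here is a sketch with made-up values taken from the example data. The block is called only when the key already exists in the receiver. Otherwise the new value is stored as given.

```ruby
# Start with one date that already has a single total.
h = { "1455672010" => ["817"] }

# Key exists in both hashes: the block receives (key, old_value, new_value)
# and its return value becomes the stored value.
h.update("1455672010" => ["70"]) { |_key, old, new| old + new }

# Key is new: the block is not called and the value is stored unchanged.
h.update("1455595298" => ["40"]) { |_key, old, new| old + new }

h
  #=> {"1455672010"=>["817", "70"], "1455595298"=>["40"]}
```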

Then convert the hash to the desired format:

f.map { |date, arr| arr.each_with_index.with_object({"date"=>date}) { |(e,i),h| 
  h["key#{i}"] = e } }
  #=> [{"date"=>"1455672010", "key0"=>"817", "key1"=>"70", "key2"=>"70"},
  #    {"date"=>"1455595298", "key0"=>"40", "key1"=>"54", "key2"=>"43212"},
  #    {"date"=>"1455336016", "key0"=>"555", "key1"=>"25", "key2"=>"55525"}]
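The question asks for keys such as "CameFromDataSource0" rather than "key0". Below is a minimal sketch of how the two steps above might be adapted. The method name convert_with_prefix and its prefix parameter are hypothetical, and the sample data is a shortened copy of the question's.

```ruby
# Hypothetical variant of convert: the key prefix is passed in as a parameter.
def convert_with_prefix(data, prefix = "CameFromDataSource")
  # Step 1: collect the totals for each date, in source order.
  totals_by_date = data.each_with_object({}) do |snapshot, h|
    snapshot.each do |g|
      # g["total"][/\d+/] keeps only the digits, e.g. "**817**" -> "817".
      h.update(g["date"] => [g["total"][/\d+/]]) { |_, o, n| o + n }
    end
  end

  # Step 2: turn each date's list of totals into one flat hash.
  totals_by_date.map do |date, totals|
    totals.each_with_index.with_object({ "date" => date }) do |(total, i), row|
      row["#{prefix}#{i}"] = total
    end
  end
end

data = [
  [{ "date" => "1455672010", "total" => "**817**" },
   { "date" => "1455595298", "total" => "**40**" }],
  [{ "date" => "1455672010", "total" => "**70**" },
   { "date" => "1455595298", "total" => "**54**" }]
]

result = convert_with_prefix(data)
  #=> [{"date"=>"1455672010", "CameFromDataSource0"=>"817", "CameFromDataSource1"=>"70"},
  #    {"date"=>"1455595298", "CameFromDataSource0"=>"40", "CameFromDataSource1"=>"54"}]
```

Note that the index is the position within each date's list of totals, so it matches the source number only if every source contains every date.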

Thank you so much for taking the time to work through this with me. I really want to learn how to do this without always having to ask or consult my The Ruby Way book ;-) -- apologies that I modified my question mid-stream but this information is very helpful. I will try it tomorrow! — Kurt W, Feb 17 '16 at 07:54
Thank you very much Cary for taking the time here. I went with Jordan's answer because it is very concise and seems to get things done the most efficiently (although I haven't benchmarked it). This is still a very good answer and I think it will help others who are looking to have a bit more control over the resulting data. Appreciate your time--wish I could pick both as correct answers. — Kurt W, Feb 17 '16 at 23:29

Jordan Running · Accepted Answer · 2016-02-17T20:03:42.260

TL;DR

snapshots.each_with_object(Hash.new {|hsh, date| hsh[date] = { "date" => date } })
  .with_index do |(snapshot, hsh), i|
    snapshot["data"].each {|datum| hsh[datum["date"]]["data#{i}"] = datum["total"] }
  end.values

How it works

I'll break it down so you see how each part works. Here's our data (extraneous keys elided for clarity):

snapshots = [
  { "dataSourceID" => "152970",
    "data" => [ { "date" => "1455672010", "total" => "817" }, 
                { "date" => "1455595298", "total" => "40" },
                { "date" => "1455336016", "total" => "555" } ]
  },
  { "dataSourceID" => "33151",
    "data" => [ { "date" => "1455672010", "total" => "70" }, 
                { "date" => "1455595298", "total" => "54" },
                { "date" => "1455336016", "total" => "25" } ]
  },
  { "dataSourceID" => "52165",
    "data" => [ { "date" => "1455672010", "total" => "70" }, 
                { "date" => "1455595298", "total" => "43212" },
                { "date" => "1455336016", "total" => "55525" } ]
  }
]

Most of the magic is here:

result_hash = Hash.new {|hsh, date| hsh[date] = { "date" => date } }

Here we're using the Hash's default proc to automatically initialize new keys in the following way:

result_hash = Hash.new {|hsh, date| hsh[date] = { "date" => date } }
p result_hash["1455672010"]
# => { "date" => "1455672010" }

p result_hash
# => { "1455672010" => { "date" => "1455672010" } }

Simply accessing result_hash[foo] creates the hash { "date" => foo } and assigns it to result_hash[foo]. This enables the following:

result_hash["1455672010"]["data0"] = "817"
p result_hash
# => { "1455672010" => { "date" => "1455672010", "data0" => "817" } }

Magic!
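One consequence worth keeping in mind: because the default proc assigns into the hash, merely reading a missing key with [] stores an entry. The sketch below contrasts that with key? and fetch, which do not invoke the default proc. The variable names are illustrative only.

```ruby
result_hash = Hash.new { |hsh, date| hsh[date] = { "date" => date } }

# key? and fetch bypass the default proc, so nothing is stored.
present     = result_hash.key?("1455672010")        #=> false
fetched     = result_hash.fetch("1455672010", nil)  #=> nil
size_before = result_hash.size                      #=> 0

# Reading with [] runs the proc, which stores the new entry as a side effect.
result_hash["1455672010"]
size_after = result_hash.size                       #=> 1
```

So use key? or fetch when you only want to check for a date without adding it to the result.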

Now suppose we have the following data:

data = [ { "date" => "1455672010", "total" => "817" }, 
         { "date" => "1455595298", "total" => "40" },
         { "date" => "1455336016", "total" => "555" } ]

Using our magic result_hash, we can do this:

data.each do |datum|
  result_hash[datum["date"]]["data0"] = datum["total"]
end
p result_hash
# => { "1455672010" => { "date" => "1455672010", "data0" => "817" },
#      "1455595298" => { "date" => "1455595298", "data0" => "40" },
#      "1455336016" => { "date" => "1455336016", "data0" => "555" } }

See where I'm going with this? Here's all of our data, finally:

snapshots = [
  { "dataSourceID" => "152970",
    "data" => [ { "date" => "1455672010", "total" => "817" }, 
                { "date" => "1455595298", "total" => "40" },
                { "date" => "1455336016", "total" => "555" } ]
  },
  { "dataSourceID" => "33151",
    "data" => [ { "date" => "1455672010", "total" => "70" }, 
                { "date" => "1455595298", "total" => "54" },
                { "date" => "1455336016", "total" => "25" } ]
  },
  { "dataSourceID" => "52165",
    "data" => [ { "date" => "1455672010", "total" => "70" }, 
                { "date" => "1455595298", "total" => "43212" },
                { "date" => "1455336016", "total" => "55525" } ]
  }
]

Instead of hard-coding "data0", we can iterate over the snapshots hashes using each_with_index and build that key ("data0", then "data1", and so on) for each iteration. Inside that loop we can do exactly what we did above but with the "data" array from each snapshots hash:

result_hash = Hash.new {|hsh, date| hsh[date] = { "date" => date } }

snapshots.each_with_index do |snapshot, i|
  data_key = "data#{i}"

  snapshot["data"].each do |datum|
    date = datum["date"]
    result_hash[date][data_key] = datum["total"]
  end
end

p result_hash.values
# => [ { "date" => "1455672010", "data0" => "817", "data1" => "70", "data2" => "70" },
#      { "date" => "1455595298", "data0" => "40",  "data1" => "54", "data2" => "43212" },
#      { "date" => "1455336016", "data0" => "555", "data1" => "25", "data2" => "55525" } ]

Of course, this can be condensed some, which I've done in TL;DR above.

Hi again Jordan! Thank you for your continued help on my Ruby-related initiatives. This is very concise and I really appreciate your breaking out each step to explain how it relates to the whole picture. I think I understand what this is doing and appreciate the cleanliness & brevity of it. I will try your suggestion tomorrow and let you know the result. Again, really appreciate this! group_by(&:shift) is really neat. — Kurt W, Feb 17 '16 at 07:57
Hi Jordan, I tried out the code you provided and although it didn't throw any errors, I got back an empty hash with "date", "data0", "data1", and "data2" blank. I then tried to look at each step to make sure data was being populated but even trying to puts result1 is empty. Here is the exact representation of my data -- I hope I explained it properly in my question. http://pasted.co/b2c20b23 -- thank you in advance for your help!! — Kurt W, Feb 17 '16 at 18:16
@KurtW Thanks for the clarification. I've edited my answer, which now uses a significantly different approach, but one which I think is much simpler. Re: your Chef question I'm not sure what you mean. Chef cookbooks and recipes are all written in Ruby, and much of our software is Ruby as well. I'm glad to hear you're a customer! — Jordan Running, Feb 17 '16 at 20:07
We are just beginning to get our hands on it--glad to hear it's *mostly* Ruby -- the better I get at all of this, the more equipped I'll be when we bring it in. Thanks again for your help and going back to write out such a detailed response. Much appreciated! — Kurt W, Feb 17 '16 at 23:09
Hey, since I have you here, do you know of a way to round UNIX timestamps in Ruby to the nearest month (or week)? The results that came back from your code are fantastic, but a bit too granular and I can't control what comes out of my tool at that level. Wondering if there is a way to convert 1454817699 (02/07/2016) down to 02/01/2016 (always the first of the month). If so, I will loop through all of the dates before running your code so the list is much smaller. I won't change the question and no need to change the answer, just curious if you knew a way :-). Thanks either way !! — Kurt W, Feb 17 '16 at 23:25
That's beyond the scope of a comment. I suggest posting a new question, or checking out Ruby's Time and DateTime classes. — Jordan Running, Feb 18 '16 at 04:46
I'll try to figure this one out without help. At some point, I gotta sink or swim and that seems like one I might be able to figure out on my own. I'll give it a shot tomorrow. Thanks! — Kurt W, Feb 18 '16 at 08:11
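For readers with the same follow-up question, here is one possible sketch using Ruby's Time class, as suggested in the comment above. The helper name floor_to_month is hypothetical, and the sketch assumes the timestamps should be interpreted in UTC. It floors to the first of the month rather than rounding to the nearest month.

```ruby
# Hypothetical helper: floor a UNIX timestamp (Integer or numeric String)
# to midnight on the first day of its month, interpreted in UTC (an assumption).
def floor_to_month(timestamp)
  t = Time.at(Integer(timestamp)).utc
  Time.utc(t.year, t.month, 1).to_i
end

floored = floor_to_month("1454817699")
  #=> 1454284800

formatted = Time.at(floored).utc.strftime("%m/%d/%Y")
  #=> "02/01/2016"
```

If the dates should follow a local time zone instead, the month boundary would need to be computed in that zone.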
