
Given a CSV file in the following format:

Day,User,Requests,Page Views,Browse Time,Total Bytes,Bytes Received,Bytes Sent
"Jul 25, 2012","abc123",3,0,0,13855,3287,10568
"Jul 25, 2012","abc230",1,0,0,1192,331,861
"Jul 25, 2012",,7,0,0,10990,2288,8702
"Jul 24, 2012","123456",3,0,0,3530,770,2760
"Jul 24, 2012","abc123",19,1,30,85879,67791,18088

I want to load the entire dataset (1000 users over 30 days = 30,000 records) into a hash such that key 1 (the day) may repeat and key 2 (the user) may repeat, but the combination of key 1 and key 2 is unique.

Example using the first data row above:

report_hash = {"Jul 25, 2012" => {"abc123" => {"PageRequest" => 3, "PageViews" => 0, "BrowseTime" => 0, "TotalBytes" => 13855, "BytesReceived" => 3287, "BytesSent" => 10568}}}

require 'csv'

def hashing(file)
  # Read the CSV file into an array of rows
  report_arr = CSV.read(file)
  # Drop the header row. Array#drop returns a new array and leaves the
  # original untouched, so the result has to be assigned back.
  report_arr = report_arr.drop(1)
  # Create an empty hash to save the data to
  report_hash = {}
  report_arr.each { |row|
    # If the first element of the row is not a key in the hash, make one
    if report_hash[row[0]].nil?
      report_hash[row[0]] = Hash.new
    # If the key exists, does the 2nd key exist? If not, make one
    elsif report_hash[row[0]][row[1]].nil?
      report_hash[row[0]][row[1]] = Hash.new
    end
    # Store all the other data under the two keys
    report_hash[row[0]][row[1]] = {
      "PageRequest" => row[2].to_i,
      "PageViews" => row[3].to_i,
      "BrowseTime" => row[4].to_i,
      "TotalBytes" => row[5].to_i,
      "BytesReceived" => row[6].to_i,
      "BytesSent" => row[7].to_i
    }
  }
  return report_hash
end

I spent several hours learning about hashes to get this far, but I feel there must be a much more efficient way to do this. Any suggestions on the proper or more efficient way to create a nested hash whose first two keys are the first two elements of each row, so that together they form a "composite" unique key?

arserbin3
Neobane

1 Answer


You could use the array [day, user] as the hash key.

report_hash = {
  ["Jul 25, 2012","abc123"] =>
    {
      "PageRequest" => 3,
      "PageViews" => 0,
      "BrowseTime" => 0,
      "TotalBytes" => 13855,
      "BytesReceived" => 3287,
      "BytesSent" => 10568
    }
}
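
To show how such a hash could be filled from the CSV, here is a minimal sketch. It assumes the column names from the question's header row; SAMPLE_CSV and composite_report are made-up names for illustration, and the sample text stands in for the real file.

```ruby
require 'csv'

# Hypothetical sample data standing in for the real report file.
SAMPLE_CSV = <<~DATA
  Day,User,Requests,Page Views,Browse Time,Total Bytes,Bytes Received,Bytes Sent
  "Jul 25, 2012","abc123",3,0,0,13855,3287,10568
  "Jul 25, 2012",,7,0,0,10990,2288,8702
  "Jul 24, 2012","abc123",19,1,30,85879,67791,18088
DATA

# Builds a flat hash keyed by the array [day, user] from CSV text.
# With headers: true the header row is consumed, so nothing has to be dropped.
def composite_report(csv_text)
  CSV.parse(csv_text, headers: true).each_with_object({}) do |row, report|
    report[[row["Day"], row["User"]]] = {
      "PageRequest"   => row["Requests"].to_i,
      "PageViews"     => row["Page Views"].to_i,
      "BrowseTime"    => row["Browse Time"].to_i,
      "TotalBytes"    => row["Total Bytes"].to_i,
      "BytesReceived" => row["Bytes Received"].to_i,
      "BytesSent"     => row["Bytes Sent"].to_i
    }
  end
end

report_hash = composite_report(SAMPLE_CSV)
report_hash[["Jul 25, 2012", "abc123"]]["TotalBytes"]  # => 13855
```

An empty user field comes back from the CSV parser as nil, so the second sample row ends up under the key ["Jul 25, 2012", nil].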

You just have to make sure the date and user always appear the same. If your date (for example) appears in a different format sometimes, you'll have to normalize it before using it to read or write the hash.
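
One way to normalize the date, sketched under the assumption that every day looks like "Jul 25, 2012" apart from stray whitespace. normalize_day is a made-up helper name, and the ISO 8601 string is just one possible canonical form.

```ruby
require 'date'

# Hypothetical helper: parses a day such as "Jul 25, 2012" and returns it
# as an ISO 8601 string, so every spelling of the same day gives one key.
def normalize_day(text)
  Date.strptime(text.strip, "%b %d, %Y").iso8601
end

report_hash = {}
report_hash[[normalize_day("Jul 25, 2012"), "abc123"]] = { "TotalBytes" => 13855 }

# The same day with extra whitespace still finds the entry.
report_hash[[normalize_day(" Jul 25, 2012 "), "abc123"]]  # => {"TotalBytes"=>13855}
```

Run the same helper on every read and every write, otherwise the keys will not match.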

A similar way would be to convert the day + user into a string, using some delimiter between them. But you have to be more careful that the delimiter doesn't appear in the day or the user.
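
A rough sketch of the string-key variant. The "|" delimiter and the string_key helper are assumptions for illustration; the check simply refuses any part that already contains the delimiter.

```ruby
# Assumed delimiter; pick one that cannot occur in a day or a user name.
DELIMITER = "|"

# Hypothetical helper: joins day and user into one string key and raises
# if either part contains the delimiter, since the key would be ambiguous.
def string_key(day, user)
  parts = [day, user].map(&:to_s)
  if parts.any? { |part| part.include?(DELIMITER) }
    raise ArgumentError, "key part contains delimiter #{DELIMITER.inspect}"
  end
  parts.join(DELIMITER)
end

report_hash = {}
report_hash[string_key("Jul 25, 2012", "abc123")] = { "TotalBytes" => 13855 }
report_hash["Jul 25, 2012|abc123"]  # => {"TotalBytes"=>13855}
```

Note that a comma would be a poor delimiter here, because the day itself contains one.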

EDIT:

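Also make sure you don't modify the hash keys once they are in the hash. Ruby computes a key's hash value when the entry is stored, so changing the array in place can leave the entry unreachable until you call rehash on the hash. Using arrays as keys makes this a very easy mistake to make. If you need a changed key, modify a copy made with dup, like this: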

new_key = report_hash.keys.first.dup
new_key[1] = 'another_user'
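
If you want a guard against the accidental case, one option (my suggestion, not something the hash requires) is to freeze the key arrays before storing them. The sketch below shows that, plus a made-up rename_user helper that moves an entry to a new key by working on a dup.

```ruby
# Hypothetical helper: moves an entry to a key with a different user
# without touching the key object that is stored in the hash.
def rename_user(report, old_key, new_user)
  new_key = old_key.dup            # dup returns an unfrozen copy
  new_key[1] = new_user
  report[new_key.freeze] = report.delete(old_key)
  report
end

key = ["Jul 25, 2012", "abc123"].freeze
report_hash = { key => { "TotalBytes" => 13855 } }

begin
  # Modifying the stored key in place now raises instead of silently
  # leaving the entry under a stale hash value.
  report_hash.keys.first[1] = "another_user"
rescue => error
  puts "refused: #{error.class}"
end

rename_user(report_hash, key, "another_user")
report_hash.key?(["Jul 25, 2012", "another_user"])  # => true
```

Freezing the array does not freeze the strings inside it, so this only protects against replacing or reordering elements.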
Kelvin