It's my understand you wish to read a CSV file, perform some operations on the data and then write the result to a new CSV file. You could do that as follows.
Code
require 'csv'
def convert(csv_file_in, csv_file_out, group_field, aggregate_field)
csv = CSV.read(FNameIn, headers: true)
headers = csv.headers
arr = csv.group_by { |row| row[group_field] }.
map do |_,a|
headers.map { |h| h==aggregate_field ?
(a.map { |row| row[aggregate_field] }.join('/')) : a.first[h] }
end
CSV.open(FNameOut, "wb") do |csv|
csv << headers
arr.each { |row| csv << row }
end
end
Example
Let's create a CSV file with the following data:
s =<<_
SKU,Title,Category,Price
001,Soap,Bathroom,0.5
001,Soap,Kitchen,0.5
002,Water,Kitchen,0.4
002,Water,Garage,0.4
003,Juice,Kitchen,0.8
_
FNameIn = 'testin.csv'
FNameOut = 'testout.csv'
IO.write(FNameIn, s)
#=> 135
Now execute the method with these values:
convert(FNameIn, FNameOut, "SKU", "Category")
and confirm FNameOut
was written correctly:
puts IO.read(FNameOut)
SKU,Title,Category,Price
001,Soap,Bathroom/Kitchen,0.5
002,Water,Kitchen/Garage,0.4
003,Juice,Kitchen,0.8
Explanation
The steps are as follows:
csv_file_in = FNameIn
csv_file_out = FNameOut
group_field = "SKU"
aggregate_field = "Category"
csv = CSV.read(FNameIn, headers: true)
See CSV::read.
headers = csv.headers
#=> ["SKU", "Title", "Category", "Price"]
h = csv.group_by { |row| row[group_field] }
#=> {"001"=>[
#<CSV::Row "SKU":"001" "Title":"Soap" "Category":"Bathroom" "Price":"0.5">,
# #<CSV::Row "SKU":"001" "Title":"Soap" "Category":"Kitchen" "Price":"0.5">
# ],
# "002"=>[
# #<CSV::Row "SKU":"002" "Title":"Water" "Category":"Kitchen" "Price":"0.4">,
# #<CSV::Row "SKU":"002" "Title":"Water" "Category":"Garage" "Price":"0.4">
# ],
# "003"=>[
# #<CSV::Row "SKU":"003" "Title":"Juice" "Category":"Kitchen" "Price":"0.8">
# ]
# }
arr = h.map do |_,a|
headers.map { |h| h==aggregate_field ?
(a.map { |row| row[aggregate_field] }.join('/')) : a.first[h] }
end
#=> [["001", "Soap", "Bathroom/Kitchen", "0.5"],
# ["002", "Water", "Kitchen/Garage", "0.4"],
# ["003", "Juice", "Kitchen", "0.8"]]
See CSV#headers and Enumerable#group_by, an oft-used method. Lastly, write the output file:
CSV.open(FNameOut, "wb") do |csv|
csv << headers
arr.each { |row| csv << row }
end
See CSV::open. Now let's return to the calculation of arr
. This is most easily explained by inserting some puts
statements and executing the code.
arr = h.map do |_,a|
puts " _=#{_}"
puts " a=#{a}"
headers.map do |h|
puts " header=#{h}"
if h==aggregate_field
a.map { |row| row[aggregate_field] }.join('/')
else
a.first[h]
end.
tap { |s| puts " mapped to #{s}" }
end
end
See Object#tap. The following is displayed.
_=001
a=[#<CSV::Row "SKU":"001" "Title":"Soap" "Category":"Bathroom" "Price":"0.5">,
#<CSV::Row "SKU":"001" "Title":"Soap" "Category":"Kitchen" "Price":"0.5">]
header=SKU
mapped to 001
header=Title
mapped to Soap
header=Category
mapped to Bathroom/Kitchen
header=Price
mapped to 0.5
_=002
a=[#<CSV::Row "SKU":"002" "Title":"Water" "Category":"Kitchen" "Price":"0.4">,
#<CSV::Row "SKU":"002" "Title":"Water" "Category":"Garage" "Price":"0.4">]
header=SKU
mapped to 002
header=Title
mapped to Water
header=Category
mapped to Kitchen/Garage
header=Price
mapped to 0.4
_=003
a=[#<CSV::Row "SKU":"003" "Title":"Juice" "Category":"Kitchen" "Price":"0.8">]
header=SKU
mapped to 003
header=Title
mapped to Juice
header=Category
mapped to Kitchen
header=Price
mapped to 0.8