Ruby - Merge CSV duplicate columns with same SKU

Question

I have created a CSV file about my eshop that contains multiple items with different SKUs. Some SKUs appear more than once because they can be in more than one category (but the Title and Price will always be the same for a given SKU). Example:

SKU,Title,Category,Price    
001,Soap,Bathroom,0.5    
001,Soap,Kitchen,0.5
002,Water,Kitchen,0.4
002,Water,Garage,0.4
003,Juice,Kitchen,0.8

I now wish to create from that file another CSV file that has no duplicate SKU's and aggregates the "Category" attributes as follows:

SKU,Title,Category,Price
001,Soap,Bathroom/Kitchen,0.5    
002,Water,Kitchen/Garage,0.4
003,Juice,Kitchen,0.8

How can I do that?

what have you tried so far? Do you have any code you can add to your question? see https://stackoverflow.com/help/how-to-ask — lacostenycoder, Aug 05 '19 at 20:47
i tried code from this question but i couldnt make it work for me, https://stackoverflow.com/questions/10973182/merge-rows-csv-by-id-ruby — Antonis Iliadis, Aug 05 '19 at 20:50
You example before the edits contained several instances of commas preceded and/or followed by one or more spaces. There should be none. If, for example, the first line were, `"SKU , Title,Category,Price"` the first two fields would be read as `"SKU "` and `" Title"`. The alternative is to remove the spaces after the line is parsed, which should not be required. — Cary Swoveland, Aug 06 '19 at 00:51

Cary Swoveland · Answer 1 · 2019-08-07T03:33:04.800

It's my understand you wish to read a CSV file, perform some operations on the data and then write the result to a new CSV file. You could do that as follows.

Code

require 'csv'

def convert(csv_file_in, csv_file_out, group_field, aggregate_field)
  csv = CSV.read(FNameIn, headers: true)
  headers = csv.headers
  arr = csv.group_by { |row| row[group_field] }.
            map do |_,a|
              headers.map { |h| h==aggregate_field ?
                (a.map { |row| row[aggregate_field] }.join('/')) : a.first[h] }
            end
  CSV.open(FNameOut, "wb") do |csv|
    csv << headers
    arr.each { |row| csv << row }
  end
end

Example

Let's create a CSV file with the following data:

s =<<_
SKU,Title,Category,Price
001,Soap,Bathroom,0.5
001,Soap,Kitchen,0.5
002,Water,Kitchen,0.4
002,Water,Garage,0.4
003,Juice,Kitchen,0.8
_

FNameIn  = 'testin.csv'
FNameOut = 'testout.csv'

IO.write(FNameIn, s)
  #=> 135

Now execute the method with these values:

convert(FNameIn, FNameOut, "SKU", "Category")

and confirm FNameOut was written correctly:

puts IO.read(FNameOut)
SKU,Title,Category,Price
001,Soap,Bathroom/Kitchen,0.5
002,Water,Kitchen/Garage,0.4
003,Juice,Kitchen,0.8

Explanation

The steps are as follows:

csv_file_in = FNameIn
csv_file_out = FNameOut
group_field = "SKU"
aggregate_field = "Category"
csv = CSV.read(FNameIn, headers: true)

See CSV::read.

headers = csv.headers
  #=> ["SKU", "Title", "Category", "Price"] 
h = csv.group_by { |row| row[group_field] }
  #=> {"001"=>[
         #<CSV::Row "SKU":"001" "Title":"Soap" "Category":"Bathroom" "Price":"0.5">,
  #      #<CSV::Row "SKU":"001" "Title":"Soap" "Category":"Kitchen" "Price":"0.5">
  #    ],
  #    "002"=>[
  #      #<CSV::Row "SKU":"002" "Title":"Water" "Category":"Kitchen" "Price":"0.4">,
  #      #<CSV::Row "SKU":"002" "Title":"Water" "Category":"Garage" "Price":"0.4">
  #    ],
  #    "003"=>[
  #      #<CSV::Row "SKU":"003" "Title":"Juice" "Category":"Kitchen" "Price":"0.8">
  #    ]
  #   } 
arr = h.map do |_,a|
        headers.map { |h| h==aggregate_field ?
          (a.map { |row| row[aggregate_field] }.join('/')) : a.first[h] }
      end
   #=> [["001", "Soap", "Bathroom/Kitchen", "0.5"],
   #    ["002", "Water", "Kitchen/Garage", "0.4"],
   #    ["003", "Juice", "Kitchen", "0.8"]]

See CSV#headers and Enumerable#group_by, an oft-used method. Lastly, write the output file:

CSV.open(FNameOut, "wb") do |csv|
  csv << headers
  arr.each { |row| csv << row }
end

See CSV::open. Now let's return to the calculation of arr. This is most easily explained by inserting some puts statements and executing the code.

arr = h.map do |_,a|
          puts "  _=#{_}"
          puts "  a=#{a}"
          headers.map do |h|
            puts "    header=#{h}"
            if h==aggregate_field
              a.map { |row| row[aggregate_field] }.join('/')
            else
              a.first[h]
            end.
            tap { |s| puts "    mapped to #{s}" }
          end
        end

See Object#tap. The following is displayed.

  _=001
  a=[#<CSV::Row "SKU":"001" "Title":"Soap" "Category":"Bathroom" "Price":"0.5">,
     #<CSV::Row "SKU":"001" "Title":"Soap" "Category":"Kitchen" "Price":"0.5">]
    header=SKU
    mapped to 001
    header=Title
    mapped to Soap
    header=Category
    mapped to Bathroom/Kitchen
    header=Price
    mapped to 0.5

  _=002
  a=[#<CSV::Row "SKU":"002" "Title":"Water" "Category":"Kitchen" "Price":"0.4">,
     #<CSV::Row "SKU":"002" "Title":"Water" "Category":"Garage" "Price":"0.4">]
    header=SKU
    mapped to 002
    header=Title
    mapped to Water
    header=Category
    mapped to Kitchen/Garage
    header=Price
    mapped to 0.4

  _=003
  a=[#<CSV::Row "SKU":"003" "Title":"Juice" "Category":"Kitchen" "Price":"0.8">]
    header=SKU
    mapped to 003
    header=Title
    mapped to Juice
    header=Category
    mapped to Kitchen
    header=Price
    mapped to 0.8

score 0 · Answer 2 · answered Aug 05 '19 at 22:26

It seems that in order for this to be correct, we must assume the SKU number and the price are always the same. Since you know the only key you want to merge data between is Category here is how you can do it.

Assuming this is your test.csv in the same path as the ruby script:

# test.csv
SKU,Title,Category,Price
001,Soap,Bathroom,0.5
001,Soap,Kitchen,0.5
002,Water,Kitchen,0.4
002,Water,Garage,0.4
003,Juice,Kitchen,0.8

Ruby script in same directory as your test.csv file

# fix_csv.rb
require 'csv'
rows = CSV.read 'test.csv', :headers => true
skews = rows.group_by{|row| row['SKU']}.keys.uniq
values = rows.group_by{|row| row['SKU']}

merged = skews.map do |key|
  group = values.select{|k,v| k == key}.values.flatten.map(&:to_h)
  category = group.map{|k,v| k['Category']}.join('/')
  new_data = group[0]
  new_data['Category'] = category
  new_data
end

CSV.open('merged_data.csv', 'w') do |csv|
  csv << merged.first.keys # writes the header row
  merged.each do |hash|
    csv << hash.values
  end
end

puts 'see contents of merged_data.csv'

Ruby - Merge CSV duplicate columns with same SKU

2 Answers2