I want to convert this CSV file content:

itemId,url,name,type
1|urlA|nameA|typeA
2|urlB|nameB|typeB
3|urlC,urlD|nameC|typeC
4|urlE|nameE|typeE

into an array:

[itemId,url,name,type]
[1,urlA,nameA,typeA]
[2,urlB,nameB,typeB]
[**3**,**urlC**,nameC,typeC]
[**3**,**urlD**,nameC,typeC]
[4,urlE,nameE,typeE]

Could anybody show me how to do this? Ultimately, I want to download the files (.jpg) at those URLs.

sawa
shnog
  • Can you show your attempt? – Jagdeep Singh May 30 '18 at 07:21
  • To solve this particular example, you can iterate over each row of the input CSV and, in each iteration, add as many rows to the new (output) CSV as there are values in the second column. – Jagdeep Singh May 30 '18 at 07:22
  • Better to fix your column separators first; you have both `|` and `,`. Then you can use the [CSV class](https://ruby-doc.org/stdlib-2.5.0/libdoc/csv/rdoc/CSV.html) – iGian May 30 '18 at 07:30
  • That does not look like CSV. Nor is your array actually an array. – sawa May 30 '18 at 07:57

2 Answers

The header row has a different separator than the data. That's a problem. You need to change the header row to use | instead of ,. Then:

require 'csv'
require 'pp'

array = Array.new
CSV.foreach("test.csv", col_sep: '|', headers: true) do |row|
  if row['url'][/,/]
    row['url'].split(',').each do |url|
      row['url'] = url
      array.push row.to_h.values
    end
  else
    array.push row.to_h.values
  end
end

pp array

=> [["1", "urlA", "nameA", "typeA"],
    ["2", "urlB", "nameB", "typeB"],
    ["3", "urlC", "nameC", "typeC"],
    ["3", "urlD", "nameC", "typeC"],
    ["4", "urlE", "nameE", "typeE"]]
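
For trying this out without a file on disk, here is a minimal sketch of the same idea that parses an inline string. It assumes the header row has already been changed to use `|`; `expand_urls` is a hypothetical helper name, not something the CSV library provides.

```ruby
require 'csv'

# Sample data inlined for illustration; the header uses '|' like the data rows.
text = <<~TEXT
  itemId|url|name|type
  1|urlA|nameA|typeA
  2|urlB|nameB|typeB
  3|urlC,urlD|nameC|typeC
  4|urlE|nameE|typeE
TEXT

# Hypothetical helper: returns one output row per URL found in the url field.
def expand_urls(text)
  CSV.parse(text, col_sep: '|', headers: true).flat_map do |row|
    # split(',') returns a one-element array when there is no comma,
    # so rows with a single URL need no special case.
    row['url'].split(',').map do |url|
      [row['itemId'], url, row['name'], row['type']]
    end
  end
end

rows = expand_urls(text)
rows.each { |r| p r }
```

Because every row goes through `split(',')`, the `if`/`else` branch is not needed in this variant. Note that with `headers: true` the header row itself is not part of the result.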
Tom
  • I only used "|" to illustrate the CSV contents; I know how to write a CSV table with ",". Thanks. Also, I don't understand the 8th line yet: `row['url'] = url`. How should I read this? – shnog Jun 01 '18 at 11:41
  • You *have* to use one delimiter between fields and a different one between the values of a field that is itself a list. You can't do `3,urlC,urlD,nameC,typeC` and `4,urlE,nameE,typeE` in the same file -- that is not machine parsable. – Tom Jun 04 '18 at 21:05
  • As for the syntax `row['url'] = url`: at the point where this executes, `row` is an object of type `CSV::Row`, which has a `[]=` method. So you are calling `CSV::Row#[]=` with the header `'url'` and the value of `url`. See the Ruby standard library docs on CSV::Row: http://ruby-doc.org/stdlib-2.5.1/libdoc/csv/rdoc/CSV/Row.html#method-i-5B-5D. – Tom Jun 04 '18 at 21:12
  • I built the download tool I needed thanks to your advice. Thank you so much. – shnog Jun 07 '18 at 07:41
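
To make the `row['url'] = url` discussion concrete, here is a small sketch that builds a `CSV::Row` by hand and overwrites one field. The sample values are taken from the question; building the row manually (rather than reading it from a file) is only for illustration.

```ruby
require 'csv'

# Build a row by hand: headers first, then the matching field values.
row = CSV::Row.new(%w[itemId url name type], ['3', 'urlC,urlD', 'nameC', 'typeC'])

# row['url'] reads the field by header name.
first_url = row['url'].split(',').first

# row['url'] = ... calls CSV::Row#[]= and overwrites the field in place.
row['url'] = first_url

p row.to_h.values
```

Since the assignment mutates the row itself, the answer's loop takes a snapshot with `row.to_h.values` after each assignment, before the next URL overwrites the field again.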
You'll need to test for a fifth column to see how the line should be parsed. If there is a fifth element (row[4]), output the line twice, replacing the url column each time. This assumes the file is comma-separated throughout:

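require 'csv'

array = Array.new
CSV.foreach("test.csv") do |row|
  if row[4]
    # Five fields means the row carried two URLs: emit one row per URL.
    array << [row[0..1], row[3..4]].flatten   # itemId, first URL, name, type
    array << [[row[0]], row[2..4]].flatten    # itemId, second URL, name, type
  else
    array << row
  end
end
p array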

In your example you had asterisks but I'm assuming that was just to emphasise the lines for which you want special handling. If you do want asterisks, you can modify the two array shovel commands appropriately.
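
The code above handles exactly two URLs per line. As a hedged variation, here is a sketch that handles any number of URLs, under the assumption that the first field is always the item id and the last two fields are always name and type, so everything in between is a URL. The inline sample data is made up to match the question.

```ruby
require 'csv'

# Fully comma-separated sample data, inlined for illustration.
text = <<~TEXT
  itemId,url,name,type
  1,urlA,nameA,typeA
  3,urlC,urlD,nameC,typeC
  4,urlE,nameE,typeE
TEXT

expanded = CSV.parse(text).flat_map do |row|
  item_id = row.first
  name, type = row.last(2)
  # Assumption: every field between the id and the last two is a URL.
  urls = row[1..-3]
  urls.map { |url| [item_id, url, name, type] }
end

expanded.each { |r| p r }
```

This only works while none of the other fields can contain a comma; otherwise the file is ambiguous and a separate delimiter for the URL list is the safer choice.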

SteveTurczyn