
I have a CSV file that looks like this:

"url","id","role","url","deadline","availability","location","my_type","keywords","source","external_id","area","area (1)"
"https://myurl.com","123456","This a string","https://myurl.com?source=5&param=1","31-01-2020","1","Location´s Place","another_string, my_string","key1, key2, key3","anotherString","145129","Place in Earth",""

It has 13 columns.

The issue is that every field comes back wrapped in escaped quotes (\"), which I don't want. Also, the read returns 16 columns instead of 13.

This is what I have done:

csv = CSV.new(File.open('myfile.csv'), quote_char:"\x00", force_quotes:false)
csv.read[1]

Output:

["\"https://myurl.com\"", "\"123456\"", "\"This a string\"", "\"https://myurl.com?source=5&param=1\"", "\"31-01-2020\"", "\"1\"", "\"Location´s Place\"", "\"another_string", " my_string\"", "\"key1", " key2", " key3\"", "\"anotherString\"", "\"145129\"", "\"Place in Earth\"", "\"\""]
David
  • Not sure if it's the best approach, but can't you replace it? – Alexander Santos Dec 06 '19 at 14:13
  • @AlexanderSantos I thought about that. However, I am afraid of removing them from a string that they are part of. – David Dec 06 '19 at 14:35
  • What about replacing only the leading \" and trailing \"? That way, you can guarantee that you won't replace any \" inside the value, unless some values are supposed to start or end with those quotes. – Alexander Santos Dec 08 '19 at 14:57

1 Answer


The file you showed is a standard CSV file, so nothing special is needed to read it. Just delete all those unnecessary arguments:

csv = CSV.new(File.open('myfile.csv'))
csv.read[1]
#=> [
#      "https://myurl.com", 
#      "123456", 
#      "This a string", 
#      "https://myurl.com?source=5&param=1", 
#      "31-01-2020", 
#      "1", 
#      "Location´s Place", 
#      "another_string, my_string", 
#      "key1, key2, key3", 
#      "anotherString", 
#      "145129", 
#      "Place in Earth", 
#      ""
#   ]
  • force_quotes doesn't do anything in your code, because it controls whether or not the CSV library will quote all fields when writing CSV. You are reading, not writing, so this argument is useless.
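  • quote_char: "\x00" is wrong, since the quote character in the example you posted is " and not NUL. With NUL as the quote character, the double quotes are treated as ordinary data, which is why they show up in every field, and the commas inside "another_string, my_string" and "key1, key2, key3" are treated as separators, which is why you get 16 columns instead of 13.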
  • quote_char: '"' would be correct, but is not necessary, since it is the default.
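
To illustrate the effect of quote_char described in the points above, here is a minimal sketch that parses a single line from a string instead of a file. The sample line is a shortened, hypothetical version of the row in the question, and the variable names are only for illustration:

```ruby
require 'csv'

# Three quoted fields, two of which contain commas (shortened sample row).
line = %("another_string, my_string","key1, key2, key3","")

# Default quote character ("): commas inside quoted fields stay in the field.
default_fields = CSV.parse_line(line)

# NUL as the quote character: the double quotes become ordinary data,
# so every comma splits the line, including the commas inside the fields.
nul_fields = CSV.parse_line(line, quote_char: "\x00")

puts default_fields.length
puts nul_fields.length
```

The first count matches the number of fields in the line; the second is larger because each embedded comma produces an extra field.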
Jörg W Mittag
  • Hey thanks for your quick reply. I get this: CSV::MalformedCSVError (Illegal quoting in line 1.) – David Dec 06 '19 at 14:30
  • I see no illegal quoting in the file you posted, and I tested my code with the file you posted. I cannot reproduce this. – Jörg W Mittag Dec 06 '19 at 16:12
  • @DavidCabeza maybe your actual file contains a UTF-8 BOM. – Stefan Dec 06 '19 at 16:47
  • It might be worth adding the `headers: true` argument for this scenario. – 3limin4t0r Dec 06 '19 at 17:35
  • @Stefan Yes, it turns out that my CSV file contains a UTF-8 BOM. I am trying to read the file with the foreach method, as [this page](https://mattboldt.com/importing-massive-data-into-rails/) suggests, but when I pass the encoding parameter 'r: bom | utf-8' I get these warnings: /home/david/.rbenv/versions/2.4.2/lib/ruby/2.4.0/csv.rb:1282: warning: Unsupported encoding r ignored /home/david/.rbenv/versions/2.4.2/lib/ruby/2.4.0/csv.rb:1282: warning: Unsupported encoding bom|utf-8 ignored – David Dec 11 '19 at 10:05
  • I have fixed the encoding with [this](https://stackoverflow.com/questions/22571400/illegal-quoting-in-line-1-csvmalformedcsverror/26181812). Now I am trying to find out how to use headers and encoding at the same time. The options={} hash doesn't work. – David Dec 11 '19 at 11:29
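
For the last two comments, one possible way to combine headers and BOM handling is to pass both as keyword options to CSV.foreach. This is only a sketch under assumptions: it writes a small temporary file with a UTF-8 BOM (the file contents and variable names are hypothetical), and it uses the encoding value 'bom|utf-8' without spaces and without the "r:" prefix, since that prefix belongs to a file mode string rather than to the encoding option:

```ruby
require 'csv'
require 'tempfile'

# Build a small sample file that starts with a UTF-8 BOM (hypothetical data).
sample = Tempfile.new(['bom_sample', '.csv'])
sample.binmode
sample.write("\xEF\xBB\xBF".b)
sample.write(%("url","id"\n"https://myurl.com","123456"\n))
sample.close

rows = []
# encoding: 'bom|utf-8' strips the BOM while reading; headers: true makes
# each row a CSV::Row keyed by the values of the header line.
CSV.foreach(sample.path, encoding: 'bom|utf-8', headers: true) do |row|
  rows << row.to_h
end

puts rows.first['url']
sample.unlink
```

With headers: true the header line is not returned as a data row, so fields can be looked up by column name instead of by position.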