3

Is there any way to tell the CSV object that a line break between quotes is not a row delimiter?

My CSV file is:

"a","b","c"
1,"some
text with line break",21
2,"blah",4

My code is:

CSV.foreach(file_path, headers: true) do |row|
  puts row
end

I want it to return only two rows, but it returns three.

the Tin Man
K. Drab
  • If line break is not the row delimiter, then what is? – tolgap Oct 04 '15 at 09:09
  • A line break is the row delimiter, but when a newline occurs between quotes, e.g. "asd \n asdf", it shouldn't be treated as two rows. – K. Drab Oct 04 '15 at 09:16
  • It returns exactly three rows, and that is the way it should work. It could return four if your comment weren't true. Otherwise all of your rows would be treated as one row... wouldn't they? – okliv Oct 04 '15 at 09:23
  • You can pre-process each text part inside quotes with a regexp and delete any internal "\n". – okliv Oct 04 '15 at 09:28
  • Excluding the header, it returns three rows: [["1","some"],["text with line break","21"],["2","blah","4"]]. I want it to return two rows: [["1","some\ntext with line break","21"],["2","blah","4"]] – K. Drab Oct 04 '15 at 09:40

3 Answers

5

You're (wrongly) judging the number of rows by the number of printed lines. It returns two rows. See for yourself:

[4] pry(main)> CSV.foreach('example.csv', headers: true).to_a
=> [
 #<CSV::Row "a":"1" "b":"some\ntext with line break" "c":"21">,
 #<CSV::Row "a":"2" "b":"blah" "c":"4">
]

Your code outputs three lines because you're printing the rows, and the line break inside the quoted field is printed as-is. That makes it look as if one row became two. By the same logic, your source CSV would contain 4 (four!) rows, which isn't true either.
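
To make the difference between printed lines and parsed rows concrete, here is a minimal sketch. It inlines the question's data as a string instead of reading a file, and the variable names (csv_text, physical_lines, record_count) are only illustrative:

```ruby
require 'csv'

# Same data as in the question, inlined so the sketch is self-contained.
csv_text = <<~CSV
  "a","b","c"
  1,"some
  text with line break",21
  2,"blah",4
CSV

rows = CSV.parse(csv_text, headers: true)

physical_lines = csv_text.lines.count  # lines in the raw text, header included
record_count   = rows.count            # records the parser actually produced

puts "physical lines: #{physical_lines}"
puts "parsed rows:    #{record_count}"

# p uses inspect, which escapes the embedded line break,
# so each record stays on one printed line.
rows.each { |row| p row.fields }
```

Counting the parsed rows, or printing them with p instead of puts, avoids mistaking the embedded line break for a row boundary.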

D-side
1

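You can set headers: true and print each row with row.to_hash. The hash is printed in its inspected form, so the line break inside the field shows up as an escaped \n instead of splitting the output. Example: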

CSV.foreach("/home/akbar/text.csv", headers: true) do |row|
  puts row.to_hash
end

The result is:

1.9.3p194 :034 > CSV.foreach("/home/akbar/text.csv", headers: true) do |x|
1.9.3p194 :035 >     puts x.to_hash
1.9.3p194 :036?>   end
{"a"=>"1", "b"=>"some\ntext with line break", "c"=>"21"}
{"a"=>"2", "b"=>"blah", "c"=>"4"}

For more information see "ruby-on-rails-import-data-from-a-csv-file".

akbarbin
  • You need to provide backing proof that this will solve the question the OP is asking. Instead of throwing a piece of code, explain why it's the right answer. – the Tin Man Oct 04 '15 at 11:41
  • @theTinMan, thanks for your suggestion. I need to make my answer clearer. – akbarbin Oct 04 '15 at 14:24
  • Thank you. That helps. While we can answer with code only, that's discouraged. Code with no explanation solves the initial question but does not teach so that the situation can be avoided in the future. It's akin to giving the user a fish versus teaching them how to fish. Also, it isn't necessary to add "update" or "updated" when making a change. We can see when and what changed by looking at the revision history. – the Tin Man Oct 04 '15 at 16:14
1

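For those who have trouble reading a CSV file that contains a line break inside a field, read it with row_sep: "\r\n" (double quotes, so that Ruby interprets the escapes). This assumes every row ends with "\r\n" and only the breaks inside fields are a bare "\n".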

data = CSV.read('your_file.csv', row_sep: "\r\n")
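
A small self-contained sketch of the same idea, using CSV.parse on an inline string instead of a file. The sample data is made up for illustration and assumes rows end with CRLF while a header and a field contain a bare LF:

```ruby
require 'csv'

# Hypothetical data: rows end with "\r\n"; one header and one field contain a bare "\n".
csv_text = "\"first\nname\",\"note\"\r\n" \
           "\"Ann\",\"line one\nline two\"\r\n" \
           "\"Bob\",\"plain\"\r\n"

# Double quotes matter: '\r\n' is four literal characters, "\r\n" is CR + LF.
puts '\r\n'.length
puts "\r\n".length

rows = CSV.parse(csv_text, row_sep: "\r\n")

puts "parsed rows: #{rows.size}"
rows.each { |row| p row }
```

If the file uses a different row terminator, the row_sep value has to match it.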
Hank Phung
  • Thank you so much. It fixes my issue exactly. In my CSV file, a header or field value can contain a line break like `"abc\ndef "`, but every row ends with "\r\n". `row_sep: "\r\n"` solves the issue perfectly. – new2cpp Dec 04 '20 at 08:19