4

My Rails3 app parses user-uploaded CSV files.
As can be expected, users upload tab-separated AND comma-separated files.
I want to support both.

My code:

input = CSV.read(uploaded_io.tempfile, encoding: "UTF-8", col_sep: "\t")

Question: How can I change it to support commas too?

FasterCSV's documentation describes col_sep as "The String placed between each field", so :col_sep => ",\t" won't work.

Note: All data inside are integers or identifiers, so the probability of someone using \t or , within the content (not a delimiter) is zero. So usage of the two different delimiters in the same file is not something I expressly want to prevent.

Nicolas Raoul

3 Answers

4

Solution 1:

One simple way to do it is to let the user select with a drop-down which separator they use in their CSV file, and then you just set that value in the CSV.read() call. But I guess you want it automatic. :-)
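A minimal sketch of Solution 1, assuming the form submits a string such as "comma" or "tab". The SEPARATORS table and the read_with_choice helper are hypothetical names, not part of CSV or Rails:

```ruby
require 'csv'

# Hypothetical mapping from drop-down values to the separator they stand for.
SEPARATORS = { "comma" => ",", "tab" => "\t" }.freeze

# Reads the file at `path` using the separator the user picked.
# Raises ArgumentError for any value that is not in the table.
def read_with_choice(path, choice)
  col_sep = SEPARATORS.fetch(choice) do
    raise ArgumentError, "unknown separator: #{choice.inspect}"
  end
  CSV.read(path, encoding: "UTF-8", col_sep: col_sep)
end
```

Looking the value up in a fixed table, rather than passing the submitted string straight to col_sep, keeps unexpected input out of the parser.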

Solution 2:

You can read just the first line of the CSV file with plain file I/O (for example File.open(path, &:readline); File.read() would load the whole file) and match it against /,/ and then against /\t/. Depending on which RegExp matches, you pass the corresponding single separator to the CSV.read() call: CSV.read(..., :col_sep => single_separator).
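The sniffing step could be sketched as follows. This is only one way to do it: detect_col_sep and read_delimited are hypothetical helper names, and instead of testing the two regexps in order it counts each candidate in the first line and picks the more frequent one (a tie falls back to the comma).

```ruby
require 'csv'

# Hypothetical helper: returns the candidate separator that occurs most
# often in the first line of the file. On a tie, the first candidate wins.
def detect_col_sep(path, candidates = [",", "\t"])
  first_line = File.open(path, "r:UTF-8") { |f| f.gets } || ""
  candidates.max_by { |sep| first_line.scan(sep).length }
end

# Hypothetical helper: parses the whole file with the detected separator.
def read_delimited(path)
  CSV.read(path, encoding: "UTF-8", col_sep: detect_col_sep(path))
end
```

In the upload action this would be called with the path of the uploaded tempfile, for example read_delimited(uploaded_io.tempfile.path).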

But Beware:

At first it looks nice and elegant to want to use ",\t" as the separator in the method call to allow both -- but please note this would introduce a possible nasty bug!

If a CSV file contained both tabs and commas by accident or by chance, what would you do then? Separate on both? How can you be sure? I think that would be a mistake, because CSV separators don't appear "mixed" like this in regular CSV files -- it's always either ',' or "\t".

So I think you should not use ",\t" -- it could cause serious problems, and that's probably the reason why they did not implement the col_sep option to accept a RegExp.

Tilo
  • See the note I added: the problems you mention are understandable but do not apply here. – Nicolas Raoul Oct 20 '11 at 07:27
  • OK, understood.. that means that Solution 2 should work nicely.. correct? :) – Tilo Oct 20 '11 at 07:31
  • +1 Yes, that's a valid solution, thanks! Before accepting I will wait a day or so, maybe someone knows a forgotten FasterCSV option that does this even more elegantly :-) – Nicolas Raoul Oct 20 '11 at 07:38
  • thank you! Looking at the FasterCSV API #new call, it doesn't look like it :) http://fastercsv.rubyforge.org/classes/FasterCSV.html#M000021 – Tilo Oct 20 '11 at 07:42
0

If the data does not contain escaping quotes and such, just splitting on a regex would do it.

# Split each line on tabs or commas; only safe when no field is quoted.
res = File.readlines("some_file.csv").map { |line| line.chomp.split(/[\t,]/) }
steenslag
0

Brutal solution:

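require 'csv'
# CSV.new expects the data itself (a String or IO), not a file name.
csv = CSV.new(File.read("some_file"))
# Relies on a private instance variable, so it may break between CSV versions.
csv.instance_variable_set(:@col_sep, /[\t,]/)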
steenslag