LINUX: Using cat to remove columns in CSV - some have commas in the data

Question

I need to remove some columns from a CSV. Easy. The problem is that two of the columns contain full text that has commas in it as part of the data. My columns are enclosed in quotes, but cat is counting the commas inside the text as column separators. How can I do this so that commas enclosed in quotes are ignored?

Example:

"first", "last", "dob", "some long sentence, it has commas in it,", "some data", "foo"

I want to print only columns 1-4 and 6.

Just exactly how are you doing this with `cat`? AFAIK `cat` has no editing capabilities. This looks like a job for `sed`. — , Feb 13 '14 at 04:40
@mikew As much as I love sed, this is a job for a csv parser. — Kevin, Feb 13 '14 at 04:54
@kevin Each to their own. Either way, it can't be done with `cat` alone. — , Feb 13 '14 at 04:59

score 2 · Answer 1 · edited Apr 06 '14 at 20:20

You will save yourself a lot of aggravation by writing a short Perl script that uses Parse::CSV (http://metacpan.org/pod/Parse::CSV).

I am sure there is a Python way of doing this too; Python's standard library ships a csv module.
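As a hedged sketch of that Python route, driven from the shell: the function below (keep_cols is a hypothetical name) pipes the data through Python 3's standard csv module and keeps the 1-based columns you list. It assumes python3 is on the PATH. The skipinitialspace=True option matters for the question's data, because each field after the first is written as a comma, a space, and then the opening quote; without it the reader would not recognise that quote and would split on the inner commas.

```shell
# keep_cols: print the given 1-based columns (e.g. "1,2,3,4,6") of CSV on stdin.
# Relies on Python 3's standard csv module (python3 assumed to be installed).
keep_cols() {
  python3 -c '
import csv, sys

# Convert the 1-based column list from the command line to 0-based indexes.
wanted = [int(n) - 1 for n in sys.argv[1].split(",")]

# skipinitialspace=True skips the blank after each comma, so the reader
# sees the opening quote and keeps the commas inside the quotes.
reader = csv.reader(sys.stdin, skipinitialspace=True)
writer = csv.writer(sys.stdout, quoting=csv.QUOTE_ALL, lineterminator="\n")

for row in reader:
    # Columns missing from a short row are written as empty strings.
    writer.writerow([row[i] if i < len(row) else "" for i in wanted])
' "$1"
}
```

Usage: `keep_cols 1,2,3,4,6 < file.csv`. Every output field is re-quoted, so fields that contain commas stay intact for whatever reads the result next.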

edited Apr 06 '14 at 20:20

szabgab

answered Feb 13 '14 at 04:41

Red Cricket

You can also use Text::CSV and Text::CSV_XS. – Andy Lester Feb 13 '14 at 04:53

score 1 · Answer 2 · answered Feb 13 '14 at 04:49

cat file | sed -e 's|^"||;s|"$||' | awk 'BEGIN {FS="[\"], ?[\"]"}{print $2}'

Example: http://ideone.com/g2gZmx

How it works: take this line:

"a,b","c,d","e,f"

We know that each field is enclosed in double quotes, so we can split the line on "," (a quote, a comma, an optional space, and another quote):

cat file | awk 'BEGIN {FS="[\"], ?[\"]"}{print $2}'

and the fields will be:

"a,b   c,d   e,f"

But the first field still starts with a stray " and the last one still ends with one, so we strip them with sed:

cat file | sed -e 's|^"||;s|"$||' | awk 'BEGIN {FS="[\"], ?[\"]"}{print $2}'

And the fields will be:

a,b   c,d   e,f

Then awk '{print $2}' simply takes the second field.

Read about regexp field splitting in awk: http://www.gnu.org/software/gawk/manual/html_node/Regexp-Field-Splitting.html
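Applied to the question's column list, the same FS trick could look like the sketch below. keep_quoted_cols is a hypothetical name; like the answer, it assumes every field is wrapped in double quotes and separated by "," or ", ", so it does not handle unquoted fields. Setting OFS to "," and adding a quote at each end keeps the output valid quoted CSV.

```shell
# keep_quoted_cols: the sed/awk approach above, printing columns 1-4 and 6.
# Assumes every field is double-quoted and fields are separated by "," or ", ".
# Reads the files given as arguments, or stdin when there are none.
keep_quoted_cols() {
  sed -e 's|^"||;s|"$||' "$@" |
    awk 'BEGIN { FS = "\", ?\""; OFS = "\",\"" }   # split on ", ?" and rejoin with ","
         { print "\"" $1, $2, $3, $4, $6 "\"" }'   # re-add the outer quotes
}
```

Usage: `keep_quoted_cols file.csv`.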

Great post! The `cat file` is not necessary. One could do `sed -e 's|^"||;s|"$||' file ...` — Red Cricket, Feb 13 '14 at 04:55
Not really; sometimes a CSV file has a format like `1997,Ford,E350,"Super, luxurious truck"`, where not every field is quoted. See http://en.wikipedia.org/wiki/Comma-separated_values — BMW, Feb 13 '14 at 05:47
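To cope with a mix of quoted and unquoted fields like that, one option is to scan each line character by character and treat a comma as a separator only when it is outside quotes. The sketch below does this in portable awk; csv_cut is a hypothetical name, and it assumes no newlines inside quoted fields, treats "" inside quotes as an escaped quote, and drops blanks directly after a separator. Every output field is re-quoted.

```shell
# csv_cut: print selected 1-based columns ($1, e.g. "1,2,3,4,6") of CSV on stdin.
# Assumptions: no newline inside a quoted field, "" inside quotes is an
# escaped quote, and blanks right after a separating comma are dropped.
csv_cut() {
  awk -v cols="$1" '
  BEGIN { n = split(cols, want, ",") }
  {
    split("", f)                      # clear fields from the previous record
    nf = 0; field = ""; inq = 0; len = length($0)
    for (i = 1; i <= len; i++) {
      c = substr($0, i, 1)
      if (inq) {                      # inside a quoted field
        if (c != "\"") field = field c
        else if (substr($0, i + 1, 1) == "\"") { field = field c; i++ }  # "" becomes one quote
        else inq = 0                  # closing quote
      } else if (c == "\"") {
        inq = 1                       # opening quote
      } else if (c == ",") {
        f[++nf] = field; field = ""   # comma outside quotes ends the field
      } else if (!(c == " " && field == "")) {
        field = field c               # keep the char unless it is a leading blank
      }
    }
    f[++nf] = field
    out = ""
    for (j = 1; j <= n; j++) {
      v = f[want[j] + 0]
      gsub(/"/, "\"\"", v)            # escape quotes again for the output
      out = out (j > 1 ? "," : "") "\"" v "\""
    }
    print out
  }'
}
```

Usage: `csv_cut 1,2,3,4,6 < file.csv`. For anything beyond these assumptions (newlines inside fields, other quoting rules), a real CSV parser as suggested in the first answer remains the safer choice.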
