0

I need to remove some columns from a CSV. Easy. The problem is I have two columns with full text that actually has commas in them as a part of the data. My cols are enclosed with quotes and the cat is counting the commas in the text as columns. How can I do this so the commas enclosed with quotes are ignored?

example:

"first", "last", "dob", "some long sentence, it has commas in it,", "some data", "foo"

i want to print only rows 1-4, 6

BMW
  • 42,880
  • 12
  • 99
  • 116
karnival8
  • 693
  • 1
  • 5
  • 7

2 Answers2

2

You will save yourself a lot of aggravation by writing a short Perl script that uses Parse::CSV http://metacpan.org/pod/Parse::CSV

I am sure there is a Python way of doing this too.

szabgab
  • 6,202
  • 11
  • 50
  • 64
Red Cricket
  • 9,762
  • 21
  • 81
  • 166
1
cat file | sed -e 's|^"||;s|"$||' | awk 'BEGIN {FS="[\"], ?[\"]"}{print $2}'

Example: http://ideone.com/g2gZmx

How it works: Look at line:

"a,b","c,d","e,f"

We know that each row is surrounded by "". So we can split this line by ",":

cat file | awk 'BEGIN {FS="[\"], ?[\"]"}{print $2}'

and rows will be:

"a,b   c,d   e,f"

But we have annoying " in the start and end of line. So we remove it with sed:

cat file | sed -e 's|^"||;s|"$||' | awk 'BEGIN {FS="[\"], ?[\"]"}{print $2}'

And rows will be

a,b   c,d   e,f

Then we can simply take second row by awk '{print $2}.

Read about regexp field splitting in awk: http://www.gnu.org/software/gawk/manual/html_node/Regexp-Field-Splitting.html

Vladimir Lagunov
  • 1,895
  • 15
  • 15
  • Great post! The `cat file` is not necessary. One could do `sed -e'|^"||;s|"$||' file ... ` – Red Cricket Feb 13 '14 at 04:55
  • not really, sometime cvs has format as `1997,Ford,E350,"Super, luxurious truck"`, refer http://en.wikipedia.org/wiki/Comma-separated_values – BMW Feb 13 '14 at 05:47