Bash How to wrap values of the first line of a csv file with quotations, if they do not exist

Question

The other day I asked how to wrap values of the first line of a csv file with quotations. I was given this reply which worked great.

$ cat file.csv  
word1,word2,word3,word4,word5  
12345,12346,12347,12348,12349

To put quotes around the items in the first line only:

$ sed '1 { s/^/"/; s/,/","/g; s/$/"/ }' file.csv  
"word1","word2","word3","word4","word5"  
12345,12346,12347,12348,12349

I now need to test if the quotes exist around the values to eliminate chances of double quoting values.

score 3 · Answer 1 · answered Sep 18 '17 at 20:23

3

Change each of the substitutions to include optional quotes:

sed -E '1 { s/^"?/"/; s/"?,"?/","/g; s/"?$/"/ }' file.csv

I have added -E to enable extended mode, so that ? is understood to mean "0 or 1 match".

You could also keep on using basic mode (no -E) and replace each ? with either \{0,1\} (again, 0 or 1 match) or * (which matches 0 or more).

answered Sep 18 '17 at 20:23

Tom Fenech

72,334
12
107
141

Tom this command suits me best. If the values were missing the quotes and if some of them had it, they were skipped so first lines like `word1,"word2","word3","word4",word5` or `word1,word2,word3,word4,word5` always became `"word1","word2","word3","word4","word5"`. You guys were very helpful. I surely appreciate all the help! – Vituvo Sep 18 '17 at 23:55
Alas, this fails if a quoted field has a comma. Eg. `one,"two,two",three` becomes `"one","two","two","three"`. – Heath Raftery Mar 10 '20 at 04:48
You're right, @Heath, and there are probably many other inputs that would cause it to fail, after all it's just a series of substitutions that aren't context-aware. The best way to process a CSV file is to use a tool that supports them, such as Python with its [`csv` module](https://docs.python.org/3/library/csv.html). – Tom Fenech Mar 10 '20 at 07:46
Funny, using Python's `csv` module (from within a shell script) is indeed what I ended up doing. I'll add my answer. – Heath Raftery Mar 10 '20 at 13:03

Heath Raftery · Answer 2 · 2020-03-10T13:59:05.333

Regular expressions with sed and awk are subject to a seemingly never-ending series of edge cases that fail. Leveraging a csv library instead provides a great deal more robustness.

I found Python's library was the best choice because it's:

widely available without onerous dependencies, with the exception of Python itself;
not particular sensitive to the version of Python you use;
lends itself to being embedded within a shell script; and
is quite compact (a one-liner will do!).

Thus, my solution is along the lines of:

QUOTE_CSV_PY='import sys; import csv; csv.writer(sys.stdout, quoting=csv.QUOTE_ALL).writerows(csv.reader(sys.stdin))'
head -1 file.csv | python -c "$QUOTE_CSV_PY"; tail -n +2 file.csv

To break it down:

QUOTE_CSV_PY is a shell variable containing the Python one-liner commands
The Python commands simply import the standard sys and csv modules. It then creates a csv writer that writes to stdout with QUOTE_ALL set so all fields get quoted. It is fed a csv reader that reads from stdin.
head -1 sends the first line to the python interpreter for processing.
; tail -n +2 waits until the processing is done and then just dumps out every line from number two onwards.

Nice, the only thing I would say is that `"$QUOTE_CSV_PY"` should be quoted when you use it (it has no nasty side-effects here, but could with other scripts). — Tom Fenech, Mar 10 '20 at 13:34
Quite right. I errantly dropped the quotes while transcribing. Fixed. — Heath Raftery, Mar 10 '20 at 14:00
nice - as you said - this is much more robust, and can be easily extended too :D ! — Brad Parks, Mar 11 '23 at 12:10

score 1 · Accepted Answer · answered Sep 18 '17 at 20:30

1

This problem suits awk more than sed due to row/column processing:

awk 'BEGIN{FS=OFS=","} NR==1 {
   for (i=1; i<=NF; i++) {gsub(/^"|"$/, "", $i); $i = "\"" $i "\""}
} 1' file

"word1","word2","word3","word4","word5"
12345,12346,12347,12348,12349

Using gsub function we remove leading or trailing double quote, if it exists
Then we can safely wrap each cell in double quotes

answered Sep 18 '17 at 20:30

anubhava

761,203
64
569
643

1

This one worked great. I even placed quotes at random values like so: `word1,word2,"word3",word4,"word5"` and the final output was always `"word1","word2","word3","word4","word5"` – Vituvo Sep 18 '17 at 23:49

score 0 · Answer 4 · answered Sep 18 '17 at 20:39

0

Keep your existing working sed command, by removing all possible double quotes first:

sed '1 { s/"//g; s/^/"/; s/,/","/g; s/$/"/ }' file.csv

answered Sep 18 '17 at 20:39

George Vasiliou

6,130
2
20
27

This was close. If the values had quotes already, everything was left the same except the last value which had a space before the closing quotation like so: `"word1","word2","word3","word4","word5 "` – Vituvo Sep 18 '17 at 23:38

score 0 · Answer 5 · answered Sep 19 '17 at 00:05

To test each answer I created three files:

file.csv

word1,word2,word3,word4,word5  
12345,12346,12347,12348,12349

file2.csv

"word1","word2","word3","word4","word5"  
12345,12346,12347,12348,12349

file3.csv

"word1",word2,word3,"word4",word5  
12345,12346,12347,12348,12349

Then I created a bash script

#!/bin/bash  

sed -E '1 { s/^"?/"/; s/"?,"?/","/g; s/"?$/"/ }' file.csv > final.csv  
sed -E '1 { s/^"?/"/; s/"?,"?/","/g; s/"?$/"/ }' file2.csv > final2.csv  
sed -E '1 { s/^"?/"/; s/"?,"?/","/g; s/"?$/"/ }' file3.csv > final3.csv

Then I looked at the final files and the first lines were perfect.

# cat final*.csv  

"word1","word2","word3","word4","word5"  
12345,12346,12347,12348,12349  
"word1","word2","word3","word4","word5"  
12345,12346,12347,12348,12349  
"word1","word2","word3","word4","word5"  
12345,12346,12347,12348,12349

Bash How to wrap values of the first line of a csv file with quotations, if they do not exist

5 Answers5

Linked