Removing spaces for all the columns of a CSV file in bash/unix

Question

I have a CSV file in which every column contains unnecessary extra spaces added to it before the actual value. I want to create a new CSV file by removing all the spaces.

For example

One line in input CSV file

 123, ste hen, 456, out put

Expected output CSV file

123,ste hen,456,out put

I tried using awk to trim each column but it didn't work.

so can you show what you tried? – fedorqui Jul 27 '16 at 09:57 — fedorqui, Jul 27 '16 at 09:57

anubhava · Answer 1 · 2016-07-27T12:17:39.163

8

This sed should work:

sed -i.bak -E 's/(^|,)[[:blank:]]+/\1/g; s/[[:blank:]]+(,|$)/\1/g' file.csv

This will remove leading spaes, trailing spaces and spaces around comma.

Update: Here is an awk command to do the same:

awk -F '[[:blank:]]*,[[:blank:]]*' -v OFS=, '{
  gsub(/^[[:blank:]]+|[[:blank:]]+$/, ""); $1=$1} 1' file

edited Jul 27 '16 at 12:17

answered Jul 27 '16 at 09:58

anubhava

761,203
64
569
643

1

The `[[:blank:]]` character class is handy as it accommodate tabs – sjsam Jul 27 '16 at 10:13
1

Yes it match Space or Tab but not newlines. – anubhava Jul 27 '16 at 10:16
You may consider removing blank lines in the `sed` solution. – sjsam Jul 27 '16 at 12:33
1

@sjsam why would you want to remove blank lines? I don't see any hint in the question that that'd be desirable. – Ed Morton Jul 27 '16 at 13:07
1

@EdMorton : right ! Adding such an option is nbd anyway. – sjsam Jul 28 '16 at 07:25
When I tried this the line `"Smith ","","Mr ","123"," ","Main","St "," "," "," ","123 Main St",""` appears to be completely unchanged with all it's trailing spaces. I ran `head -10 sed9EdJiw`against the temp file while it's still in the process of writing. I'm new to linux. Is the content being fed the old file? – Altimus Prime Sep 30 '17 at 04:37
It is unchanged because this solution removes spaces after comma only. – anubhava Sep 30 '17 at 18:29

sjsam · Answer 2 · 2016-07-27T11:44:46.547

awk is your friend.

Input

$ cat 38609590.txt
Ted Winter, Evelyn Salt, Peabody
  Ulrich, Ethan Hunt, Wallace
James Bond, Q,  M
(blank line)

Script

$ awk '/^$/{next}{sub(/^[[:blank:]]*/,"");gsub(/[[:blank:]]*,[[:blank:]]*/,",")}1' 38609590.txt

Output

Ted Winter,Evelyn Salt,Peabody
Ulrich,Ethan Hunt,Wallace
James Bond,Q,M

Note

This one removes the blank lines too - /^$/{next}.
See the [ awk ] manual for more information.

Ed Morton · Answer 3 · 2016-07-27T13:09:56.630

To remove leading blank chars with sed:

$ sed -E 's/(^|,) +/\1/g' file
123,ste hen,456,out put

With GNU awk:

$ awk '{$0=gensub(/(^|,) +/,"\\1","g")}1' file
123,ste hen,456,out put

With other awks:

$ awk '{sub(/^ +/,""); gsub(/, +/,",")}1' file
123,ste hen,456,out put

To remove blank chars before and after the values with sed:

$ sed -E 's/ *(^|,|$) */\1/g' file
123,ste hen,456,out put

With GNU awk:

$ awk '{$0=gensub(/ *(^|,|$) */,"\\1","g")}1' file
123,ste hen,456,out put

With other awks:

$ awk '{gsub(/^ +| +$/,""); gsub(/ *, */,",")}1' file
123,ste hen,456,out put

Change (a single blank char) to [[:blank:]] if you can have tabs as well as blank chars.

Thanks Ed for your help! I have learned how to use sed and awk now. — stephenjacob, Jul 28 '16 at 05:56

Claes Wikner · Accepted Answer · 2016-07-27T16:49:58.783

2

echo " 123, ste hen, 456, out put" | awk '{sub(/^ +/,""); gsub(/, /,",")}1'
123,ste hen,456,out put

edited Jul 27 '16 at 16:49

answered Jul 27 '16 at 10:29

Claes Wikner

1,457
1
9
8

1

But it can't handle _extra space**S**_ OP stated in the question at present form. – James Brown Jul 27 '16 at 10:54
This will do it: `awk '{sub(/ +/,""); gsub(/ *, */,",")}1' test.in` . First the `sub` removes all the leading space on the record and the `gsub` handles spaces around commas. Record trailing space would need another `gsub` that could replace the first `sub` something like this: `gsub(/^ *| *$/,"")`. – James Brown Jul 27 '16 at 11:13
Thanks for your advice! – Claes Wikner Jul 27 '16 at 11:37
Thanks Claies/James this one worked for me. Thank you so much guys! – stephenjacob Jul 28 '16 at 05:51
1

@stephenjacob This won't strip all unwanted spaces. Test with `echo " 123 , ste hen, 456, out put " | awk '{sub(/^ +/,""); gsub(/, /,",")}1' | cat -vte` to see remaining spaces. Then test same with my suggested `sed` i.e. `echo " 123 , ste hen, 456, out put " | sed -E 's/(^|,)[[:blank:]]+/\1/g; s/[[:blank:]]+(,|$)/\1/g' | cat -vte` – anubhava Jul 28 '16 at 06:13

Inian · Answer 5 · 2016-07-27T11:08:38.497

Another way to do with awk to remove multiple leading white-spaces is as below:-

$ awk 'BEGIN{FS=OFS=","} {s = ""; for (i = 1; i <= NF; i++) gsub(/^[ \t]+/,"",$i);} 1' <<< "123, ste hen, 456, out put"
123,ste hen,456,out put

FS=OFS="," sets the input and output field separator to ,
s = ""; for (i = 1; i <= NF; i++) loops across each column entry up to the end (i.e. from $1,$2...NF) and the gsub(/^[ \t]+/,"",$i) trims only the leading white-space and not anywhere else (one ore more white-space, note the +) from each column.

If you are want to do this action for an entire file, suggest using a simple script like below

#!/bin/bash
# Output written to the file 'output.csv' in the same path    

while IFS= read -r line || [[ -n "$line" ]]; do   # Not setting IFS here, all done in 'awk', || condition for handling empty lines
   awk 'BEGIN{FS=OFS=","} {s = ""; for (i = 1; i <= NF; i++) gsub(/^[ \t]+/,"",$i);} 1' <<< "$line" >> output.csv
done <input.csv

There are shorter approaches than this, nice try though. When it comes to awk the general trend is to appreciate elegant idiomatic approaches. — Мона_Сах, Jul 27 '16 at 11:11

James Brown · Answer 6 · 2016-07-27T11:18:45.233

2

$ cat > test.in
 123, ste hen, 456, out put
$ awk -F',' -v OFS=',' '{for (i=1;i<=NF;i++) gsub(/^ +| +$/,"",$i); print $0}' test.in
123,ste hen,456,out put

or written out loud:

BEGIN {
  FS=","                  # set the input field separator
  OFS=","                 # and the output field separator
}
{
  for (i=1;i<=NF;i++)     # loop thru every field on record
    gsub(/^ +| +$/,"",$i) # remove leading and trailing spaces
  print $0                # print out the trimmed record
}

Run with:

$ awk -f test.awk test.in

edited Jul 27 '16 at 11:18

answered Jul 27 '16 at 10:45

James Brown

36,089
7
43
59

1

If you use `-F' *, *'` then you don't need the loop, just 1 `gsub(/^ +| +$/,"")` and a field assignment `$1=$1`. – Ed Morton Jul 27 '16 at 13:18

score 1 · Answer 7 · answered Oct 29 '20 at 11:40

1

awk -F' *, *' '$1=$1' OFS=, file_path

answered Oct 29 '20 at 11:40

hxysayhi

1,888
18
25

score -3 · Answer 8 · answered Jul 27 '16 at 10:05

-3

You could try :

your file : ~/path/file.csv

cat ~/path/file.csv | tr -d "\ " sed "s/, /,/g" ~/path/file.csv

answered Jul 27 '16 at 10:05

lolotux

9
1

Removing spaces for all the columns of a CSV file in bash/unix

8 Answers8

Linked