2

I have a CSV file in which every column contains unnecessary extra spaces added to it before the actual value. I want to create a new CSV file by removing all the spaces.

For example

One line in input CSV file

 123, ste hen, 456, out put

Expected output CSV file

123,ste hen,456,out put

I tried using awk to trim each column but it didn't work.

anubhava
  • 761,203
  • 64
  • 569
  • 643
stephenjacob
  • 141
  • 2
  • 3
  • 14

8 Answers8

8

This sed should work:

sed -i.bak -E 's/(^|,)[[:blank:]]+/\1/g; s/[[:blank:]]+(,|$)/\1/g' file.csv

This will remove leading spaes, trailing spaces and spaces around comma.

Update: Here is an awk command to do the same:

awk -F '[[:blank:]]*,[[:blank:]]*' -v OFS=, '{
  gsub(/^[[:blank:]]+|[[:blank:]]+$/, ""); $1=$1} 1' file
anubhava
  • 761,203
  • 64
  • 569
  • 643
  • 1
    The `[[:blank:]]` character class is handy as it accommodate tabs – sjsam Jul 27 '16 at 10:13
  • 1
    Yes it match Space or Tab but not newlines. – anubhava Jul 27 '16 at 10:16
  • You may consider removing blank lines in the `sed` solution. – sjsam Jul 27 '16 at 12:33
  • 1
    @sjsam why would you want to remove blank lines? I don't see any hint in the question that that'd be desirable. – Ed Morton Jul 27 '16 at 13:07
  • 1
    @EdMorton : right ! Adding such an option is nbd anyway. – sjsam Jul 28 '16 at 07:25
  • When I tried this the line `"Smith ","","Mr ","123"," ","Main","St "," "," "," ","123 Main St",""` appears to be completely unchanged with all it's trailing spaces. I ran `head -10 sed9EdJiw`against the temp file while it's still in the process of writing. I'm new to linux. Is the content being fed the old file? – Altimus Prime Sep 30 '17 at 04:37
  • It is unchanged because this solution removes spaces after comma only. – anubhava Sep 30 '17 at 18:29
5

awk is your friend.

Input

$ cat 38609590.txt
Ted Winter, Evelyn Salt, Peabody
  Ulrich, Ethan Hunt, Wallace
James Bond, Q,  M
(blank line)

Script

$ awk '/^$/{next}{sub(/^[[:blank:]]*/,"");gsub(/[[:blank:]]*,[[:blank:]]*/,",")}1' 38609590.txt

Output

Ted Winter,Evelyn Salt,Peabody
Ulrich,Ethan Hunt,Wallace
James Bond,Q,M

Note

  • This one removes the blank lines too - /^$/{next}.
  • See the [ awk ] manual for more information.
sjsam
  • 21,411
  • 5
  • 55
  • 102
3

To remove leading blank chars with sed:

$ sed -E 's/(^|,) +/\1/g' file
123,ste hen,456,out put

With GNU awk:

$ awk '{$0=gensub(/(^|,) +/,"\\1","g")}1' file
123,ste hen,456,out put

With other awks:

$ awk '{sub(/^ +/,""); gsub(/, +/,",")}1' file
123,ste hen,456,out put

To remove blank chars before and after the values with sed:

$ sed -E 's/ *(^|,|$) */\1/g' file
123,ste hen,456,out put

With GNU awk:

$ awk '{$0=gensub(/ *(^|,|$) */,"\\1","g")}1' file
123,ste hen,456,out put

With other awks:

$ awk '{gsub(/^ +| +$/,""); gsub(/ *, */,",")}1' file
123,ste hen,456,out put

Change (a single blank char) to [[:blank:]] if you can have tabs as well as blank chars.

Ed Morton
  • 188,023
  • 17
  • 78
  • 185
2
echo " 123, ste hen, 456, out put" | awk '{sub(/^ +/,""); gsub(/, /,",")}1'
123,ste hen,456,out put
Claes Wikner
  • 1,457
  • 1
  • 9
  • 8
  • 1
    But it can't handle _extra space**S**_ OP stated in the question at present form. – James Brown Jul 27 '16 at 10:54
  • This will do it: `awk '{sub(/ +/,""); gsub(/ *, */,",")}1' test.in` . First the `sub` removes all the leading space on the record and the `gsub` handles spaces around commas. Record trailing space would need another `gsub` that could replace the first `sub` something like this: `gsub(/^ *| *$/,"")`. – James Brown Jul 27 '16 at 11:13
  • Thanks for your advice! – Claes Wikner Jul 27 '16 at 11:37
  • Thanks Claies/James this one worked for me. Thank you so much guys! – stephenjacob Jul 28 '16 at 05:51
  • 1
    @stephenjacob This won't strip all unwanted spaces. Test with `echo " 123 , ste hen, 456, out put " | awk '{sub(/^ +/,""); gsub(/, /,",")}1' | cat -vte` to see remaining spaces. Then test same with my suggested `sed` i.e. `echo " 123 , ste hen, 456, out put " | sed -E 's/(^|,)[[:blank:]]+/\1/g; s/[[:blank:]]+(,|$)/\1/g' | cat -vte` – anubhava Jul 28 '16 at 06:13
2

Another way to do with awk to remove multiple leading white-spaces is as below:-

$ awk 'BEGIN{FS=OFS=","} {s = ""; for (i = 1; i <= NF; i++) gsub(/^[ \t]+/,"",$i);} 1' <<< "123, ste hen, 456, out put"
123,ste hen,456,out put
  • FS=OFS="," sets the input and output field separator to ,
  • s = ""; for (i = 1; i <= NF; i++) loops across each column entry up to the end (i.e. from $1,$2...NF) and the gsub(/^[ \t]+/,"",$i) trims only the leading white-space and not anywhere else (one ore more white-space, note the +) from each column.

If you are want to do this action for an entire file, suggest using a simple script like below

#!/bin/bash
# Output written to the file 'output.csv' in the same path    

while IFS= read -r line || [[ -n "$line" ]]; do   # Not setting IFS here, all done in 'awk', || condition for handling empty lines
   awk 'BEGIN{FS=OFS=","} {s = ""; for (i = 1; i <= NF; i++) gsub(/^[ \t]+/,"",$i);} 1' <<< "$line" >> output.csv
done <input.csv
Inian
  • 80,270
  • 14
  • 142
  • 161
2
$ cat > test.in
 123, ste hen, 456, out put
$ awk -F',' -v OFS=',' '{for (i=1;i<=NF;i++) gsub(/^ +| +$/,"",$i); print $0}' test.in
123,ste hen,456,out put

or written out loud:

BEGIN {
  FS=","                  # set the input field separator
  OFS=","                 # and the output field separator
}
{
  for (i=1;i<=NF;i++)     # loop thru every field on record
    gsub(/^ +| +$/,"",$i) # remove leading and trailing spaces
  print $0                # print out the trimmed record
}

Run with:

$ awk -f test.awk test.in
James Brown
  • 36,089
  • 7
  • 43
  • 59
  • 1
    If you use `-F' *, *'` then you don't need the loop, just 1 `gsub(/^ +| +$/,"")` and a field assignment `$1=$1`. – Ed Morton Jul 27 '16 at 13:18
1
awk -F' *, *' '$1=$1' OFS=, file_path
hxysayhi
  • 1,888
  • 18
  • 25
-3

You could try :

  • your file : ~/path/file.csv

cat ~/path/file.csv | tr -d "\ " sed "s/, /,/g" ~/path/file.csv

lolotux
  • 9
  • 1