
I am receiving a comma-delimited file in which string and date fields are wrapped in double quotes. Some string columns also contain stray " characters and line feeds, as shown below.

"1234","asdf","with"doublequotes","new line
feed","withmultiple""doublequotes"

I want output like this:

"1234","asdf","withdoublequotes","new linefeed","withmultipledoublequotes"

I have tried:

sed 's/\([^",]\)"\([^",]\)/\1\2/g;s/\([^",]\)""/\1"/g;s/""\([^",]\)/"\1/g' < infile > outfile

It removes the double quotes inside the strings, but it also removes the last double quote and leaves the line feed in place, as shown below:

"1234","asdf","withdoublequotes","new line
feed","withmultiple"doublequotes

Is there a way to remove the " characters and line feeds that occur between a ", and the following ,"?

RavinderSingh13
vishnu

1 Answer


Your substitutions for two consecutive quotes didn't work because they are placed after the substitution for a single quote: they reduce the pair to one quote, and by then nothing is left to remove it.

We can remove the " characters by repeating one substitution until it no longer matches (otherwise a quote left behind by a substitution would stay), and remove the line feeds by joining the next input line whenever the current line does not end in a quote:

sed ':1;/[^"]$/{;N;s/\n//;b1;};:0;s/\([^,]\)"\([^,]\)/\1\2/g;t0' <infile >outfile
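For readability, the same commands can be spread over several lines with comments. This is only a sketch: the function name clean_csv is made up for illustration, and it assumes GNU sed and Unix (LF) line endings. With CRLF input every line ends in a carriage return rather than a quote, so the join loop would keep appending lines.

```shell
#!/bin/sh
# clean_csv is a hypothetical helper name wrapping the sed commands above.
# Assumptions: GNU sed, LF line endings, every record ends in a quote.
clean_csv() {
  sed '
    # Join loop: while the line does not end in a quote, append the next
    # input line and delete the embedded newline.
    :1
    /[^"]$/ {
      N
      s/\n//
      b1
    }
    # Strip loop: delete any quote that has a non-comma character on both
    # sides, and repeat (t0) until no substitution succeeds.
    :0
    s/\([^,]\)"\([^,]\)/\1\2/g
    t0
  ' "$@"
}

# Sample input from the question, fed on standard input.
printf '%s\n' \
  '"1234","asdf","with"doublequotes","new line' \
  'feed","withmultiple""doublequotes"' |
  clean_csv
```

Run on the sample from the question, this should print the single line asked for there. Because the function passes its arguments on to sed, it can also be called as clean_csv infile >outfile.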
Armali