6

I've got nearly the following problem and didn't find the solution. This could be my CSV file structure:

1223;"B630521 ("L" fixed bracket)";"2" width";"length: 5"";2;alternate A
1224;"B630522 ("L" fixed bracket)";"3" width";"length: 6"";2;alternate B

As you can see there are some " written for inch and "L" in the enclosing ".

Now I'm looking for a UNIX shell script to replace the " (inch) and "L" double quotes with 2 single quotes, like the following example:

sed "s/$OLD/$NEW/g" $QFILE > $TFILE && mv $TFILE $QFILE

Can anyone help me?

John Kugelman
  • 349,597
  • 67
  • 533
  • 578
Karsten
  • 61
  • 1
  • have you tried s/\"/\'/g ? Or you just want to replace the doublequotes of "L" and inches? – Jav_Rock May 12 '11 at 14:53
  • I just want to replace the additional doublequotes. The result must be readable by sqlloader and some fields may contain a ; as a part of the text :( – Karsten May 12 '11 at 18:01

3 Answers3

3

Maybe this is what you want:

sed "s/\([0-9]\)\"\([^;]\)/\1''\2/g"

I.e.: Find double quotes (") following a number ([0-9]) but not followed by a semicolon ([^;]) and replace it with two single quotes.

Edit: I can extend my command (it's becoming quite long now):

sed "s/\([0-9]\)\"\([^;]\)/\1''\2/g;s/\([^;]\)\"\([^;]\)/\1\'\2/g;s/\([^;]\)\"\([^;]\)/\1\'\2/g"

As you are using SunOS I guess you cannot use extended regular expressions (sed -r)? Therefore I did it that way: The first s command replaces all inch " with '', the second and the third s are the same. They substitute all " that are not a direct neighbor of a ; with a single '. I have to do it twice to be able to substitute the second " of e.g. "L" because there's only one character between both " and this character is already matched by \([^;]\). This way you would also substitute "" with ''. If you have """ or """" etc. you have to put one more (but only one more) s.

bmk
  • 13,849
  • 5
  • 37
  • 46
  • 1
    This looks nearly good. All inches are replaced. But there's still my problem with the "L" and in some cases i've found a "Rest" between the field-delimiters also. :( – Karsten May 12 '11 at 18:04
  • What do you mean with "Rest"? Can you give an example? – bmk May 12 '11 at 19:09
  • There are other words between the additional doubleqoutes because it's a free-text/free-comment-field in the source-database. So it can be the "L" or "Rest" or "fippel-pippel" or anything else. – Karsten May 12 '11 at 19:55
  • 1
    @Karsten: I added a somewhat longer version of the command that should work now. – bmk May 12 '11 at 20:21
  • Thx bmk. But the "L" and all other (like "fippel-pippel") are still doublequoted. As there's no error on the console i'm afraid there's an error in your sed. I foudn this page http://compute.cnr.berkeley.edu/cgi-bin/man-cgi?sed+1 and it seems there's a -r option available. Does this help to simplify your statement? – Karsten May 13 '11 at 07:21
  • @Karsten: I made a change to my command that will now work also with SunOS` (at least 5.8) `sed`. It seems that the pipe (`|`) operator doesn't work there. – bmk May 13 '11 at 08:27
3

Update (Using perl it easy since you get full lookahead features)

perl -pe 's/(?<!^)(?<!;)"(?!(;|$))/'"'"'/g' file

Output

1223;"B630521 ('L' fixed bracket)";"2' width";"length: 5'";2;alternate A
1224;"B630522 ('L' fixed bracket)";"3' width";"length: 6'";2;alternate B

Using sed, grep only

Just by using grep, sed (and not perl, php, python etc) a not so elegant solution can be:

grep -o '[^;]*' file | sed  's/"/`/; s/"$/`/; s/"/'"'"'/g; s/`/"/g' 

Output - for your input file it gives:

1223
"B630521 ('L' fixed bracket)"
"2' width"
"length: 5'"
2
alternate A
1224
"B630522 ('L' fixed bracket)"
"3' width"
"length: 6'"
2
alternate B
  • grep -o is basically splitting the input by ;
  • sed first replaces " at start of line by `
  • then it replaces " at end of line by another `
  • it then replaces all remaining double quotes " by single quite '
  • finally it puts back all " at the start and end
anubhava
  • 761,203
  • 64
  • 569
  • 643
2

For the "L" try this:

 sed "s/\"L\"/'L'/g"

For inches you can try:

sed "s/\([0-9]\)\"\"/\1''\"/g" 

I am not sure it is the best option, but I have tried and it works. I hope this is helpful.

Jav_Rock
  • 22,059
  • 20
  • 123
  • 164
  • 2
    This looks good two. But in some cases i've found a "Rest" between the field-delimiters now. :( – Karsten May 12 '11 at 18:07