-1

I receive text from some writers that has a string like: string "string "string.

I want it to read string "string" string.

I've tried various sed tricks but none work.

Here is one failed attempt:

sed 's/.* "/.*"/g'
tripleee

3 Answers

1

Your attempt fails for multiple reasons.

The wildcard .* is greedy: it consumes as much of the line as it can, so the pattern only ever matches once per line, ending at the final space and double quote. Even with the g flag, that allows just a single substitution.

You cannot use .* in the substitution part -- what you substitute with is just a string, not a regular expression. The way to handle "whatever (part of) the regex matched" is through backreferences.
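
Both points are easy to see on the sample line. A minimal sketch, assuming any POSIX sed; the 'foo bar' string is made up purely for illustration:

```shell
#!/bin/sh
# The original attempt: greedy .* runs to the LAST ' "' on the line, and the
# replacement text .*" is inserted literally, not treated as a regex.
greedy=$(printf '%s\n' 'string "string "string.' | sed 's/.* "/.*"/g')
printf '%s\n' "$greedy"
# -> .*"string.

# A backreference: \1 and \2 stand for whatever the first and second
# \(...\) groups matched.
swapped=$(printf '%s\n' 'foo bar' | sed 's/\(foo\) \(bar\)/\2 \1/')
printf '%s\n' "$swapped"
# -> bar foo
```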

So here is a slightly less broken attempt:

sed 's/"\([^"]*\) "/"\1"/g' file

This will find a double quote, then find and capture anything which is not a double quote, then find a space and a double quote; and substitute the entire match with a double quote, the first captured expression (aka back reference, or backref), and another double quote. This should fix strings where the only problem is a single space inside the closing double quote (the capture is greedy, so only the last space before the quote is dropped), but not missing whitespace after the closing double quote, nor strings with leading spaces inside the double quotes or unpaired double quotes.

The missing space after the closing double quote is easy to add:

sed 's/"\([^"]*\) " */"\1" /g;s/ $//' file

This will add a space after every closing double quote, and finally trim any space at end of line to fix up this corner case.
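
To see what that command does on a few inputs, here is a small sketch. The wrapper name fix_trailing and the second and third sample lines are my own, not from the question, and it assumes a sed that accepts semicolons between commands:

```shell
#!/bin/sh
# fix_trailing is a hypothetical wrapper name, used only for this illustration.
fix_trailing() {
    sed 's/"\([^"]*\) " */"\1" /g;s/ $//'
}

# The sample line from the question.
printf '%s\n' 'string "string "string.' | fix_trailing
# -> string "string" string.

# Quoted phrase at end of line: the added space is trimmed again by s/ $//.
printf '%s\n' 'say "hello "' | fix_trailing
# -> say "hello"

# Two well-formed phrases: the regex pairs the closing quote of the first
# phrase with the opening quote of the second, so the line gets damaged.
printf '%s\n' 'a "b" c "d"' | fix_trailing
# -> a "b" c" d"
```

The last case suggests running this only over lines that are known to be broken, or at least diffing the result before committing to it.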

Now, you could either try to update the regex for leading spaces, or just do another pass with a similar regex for those. I would go with the latter approach, even though the former is feasible as well (but will require a much more complex regex, and the corner cases are harder to keep in your head).

sed 's/"\([^"]*\) " */"\1" /g;s/ $//;
     s/ *" \([^"]*\)"/ "\1"/g;s/^ //' file

This will still fail for inputs with unbalanced double quotes, which are darn near impossible to handle completely automatically anyway (how do you postulate where to add a missing double quote?)
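
Since those lines cannot be repaired automatically, it may help to at least list them for manual review. A rough sketch, assuming that "unbalanced" simply means an odd number of double quotes on a line; list_unbalanced is a made-up helper name:

```shell
#!/bin/sh
# list_unbalanced is a hypothetical helper name for this sketch.
# gsub() returns how many replacements it made; replacing each quote with
# itself ("&") on a copy of the line counts the quotes without changing $0.
list_unbalanced() {
    awk '{ line = $0; n = gsub(/"/, "&", line) } n % 2 { print NR ": " $0 }'
}

printf '%s\n' 'ok "quoted" text' 'bad "quoted text' | list_unbalanced
# -> 2: bad "quoted text
```

An even count does not prove the quotes are placed correctly, so this only catches the obvious cases.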

tripleee
  • Note that some `sed` dialects require the capturing parentheses to be backslashed, while others disallow this. If this doesn't work, try removing the backslashes before the opening and closing parentheses. – tripleee Aug 11 '14 at 06:05
  • The second example above works. Thanks so much for the detailed explanation. I am a newbie, so it won't let me vote it up. – user2883704 Aug 11 '14 at 06:10
  • Also, some `sed` dialects are unhappy about semicolons as command separators, but will happily accept newlines between commands. Many dialects also have an `-e` option which allows you to compose the script as a sequence of `-e` parameters. – tripleee Aug 11 '14 at 06:11
  • Some things just cannot be fixed without manual intervention but all three cases above will be valuable additions to my script. Thanks so much. – user2883704 Aug 11 '14 at 06:19
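
For reference, the dialect notes in the comments above could look like this in practice. This is only a sketch: it assumes a sed that supports -e and, for the second form, the -E option for extended regular expressions (GNU and BSD sed both have it). The sample line with a leading space inside the quotes is made up:

```shell
#!/bin/sh
sample='string " string "string.'

# Same four commands as the combined script, one -e option per command
# instead of semicolons.
with_e=$(printf '%s\n' "$sample" | sed \
    -e 's/"\([^"]*\) " */"\1" /g' \
    -e 's/ $//' \
    -e 's/ *" \([^"]*\)"/ "\1"/g' \
    -e 's/^ //')

# The same again with extended regular expressions (-E): the capturing
# parentheses are written without backslashes.
with_E=$(printf '%s\n' "$sample" | sed -E \
    -e 's/"([^"]*) " */"\1" /g' \
    -e 's/ $//' \
    -e 's/ *" ([^"]*)"/ "\1"/g' \
    -e 's/^ //')

printf '%s\n%s\n' "$with_e" "$with_E"
# -> string "string" string.
# -> string "string" string.
```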
0

This may work for some cases but may fail with unbalanced quotes:

sed 's/"\([^"]*\S\)\s\s*"/"\1"/g'

To also add a space after a quoted phrase when one is missing:

sed -e 's/"\([^"]*\S\)\s\s*"/"\1"/g' -e 's/\("[^"]*"\)\([^"]\)/\1 \2/g'
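
Note that \s and \S are GNU extensions. If your sed lacks them, something along these lines should behave the same way, with the POSIX bracket expressions [[:space:]] and [^[:space:]] swapped in. The function name fix_quotes is just for illustration:

```shell
#!/bin/sh
# fix_quotes is a hypothetical wrapper name.
# [[:space:]] stands in for \s and [^[:space:]] for \S.
fix_quotes() {
    sed -e 's/"\([^"]*[^[:space:]]\)[[:space:]][[:space:]]*"/"\1"/g' \
        -e 's/\("[^"]*"\)\([^"]\)/\1 \2/g'
}

printf '%s\n' 'string "string "string.' | fix_quotes
# -> string "string" string.
```

One caveat, in either spelling: the second expression adds a space after the closing quote even when one is already there, so a line like a "b " c ends up with two spaces after "b".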
perreal
  • Unfortunately that solution produces string "string"string. I should also add that I am willing to use multiple sed commands. I am trying to create a sed script to fix various string problems found in submitted text. – user2883704 Aug 11 '14 at 05:59
  • Yes it does. I have used sed for years, but only for trivial cases. – user2883704 Aug 11 '14 at 06:16
0

Here is an awk solution:

echo 'string "string "string.' | awk -F' "' '{for (i=1;i<=NF;i++) printf (i%2==0?"\"":"")"%s"(i%2==0?"\"":"")(i!=NF?" ":""),$i;print ""}'
string "string" string.

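It splits each line on the two-character sequence space plus double quote, then wraps every second field in double quotes and joins the fields with single spaces, so every second quote ends up directly behind the text.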
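
The same idea spread over several lines may be easier to follow. This is a sketch of the approach rather than a drop-in replacement; the variable names out and field and the function name requote are my own:

```shell
#!/bin/sh
# requote is a hypothetical wrapper name for this sketch.
requote() {
    awk -F' "' '{
        out = ""
        for (i = 1; i <= NF; i++) {
            # Even-numbered fields sit between two separators: quote them.
            field = (i % 2 == 0) ? "\"" $i "\"" : $i
            # Join fields with a single space.
            out = out (i > 1 ? " " : "") field
        }
        print out
    }'
}

printf '%s\n' 'string "string "string.' | requote
# -> string "string" string.
```

Like the one-liner, this assumes every quote on the line is preceded by a space, which holds for the sample input but not for text that is already correctly quoted.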

Jotne