I receive text from some writers that has a string like: string "string "string.
I want it to read string "string" string.
I've tried various sed tricks but none work.
Here is one failed attempt:
sed 's/.* "/.*"/g'
Your attempt fails for multiple reasons.
The wildcard .* will consume as much as it can in the string, meaning it will only ever allow a single substitution to happen (the final double quote in the string).
You cannot use .* in the substitution part -- what you substitute with is just a string, not a regular expression. The way to handle "whatever (part of) the regex matched" is through backreferences.
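To see both problems in action, here is a small sketch that runs the original attempt on the sample string from the question (the variable names input and failed are only there for illustration):

```shell
# Sample input taken from the question.
input='string "string "string.'

# .* greedily swallows everything up to the last space-plus-quote,
# and the .* in the replacement is inserted as literal text.
failed=$(printf '%s\n' "$input" | sed 's/.* "/.*"/g')
printf '%s\n' "$failed"   # prints: .*"string.
```

Everything before the last quote is replaced by the literal characters .*" in one go, which is why no amount of tweaking the g flag helps.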
So here is a slightly less broken attempt:
sed 's/"\([^"]*\) "/"\1"/g' file
This will find a double quote, then find and capture anything which is not a double quote, then find a space and a double quote; and substitute the entire match with a double quote, the first captured expression (referred to through the back reference, or backref, \1), and another double quote. This should fix strings where the only problem is a single space just inside the closing double quote (only the last space is removed, since the capture group greedily takes any earlier ones), but not missing whitespace after the closing double quote, nor strings with leading spaces inside the double quotes or unpaired double quotes.
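As a quick check, this sketch feeds the sample string from the question through that command on standard input instead of a file (the variable names are made up for the example):

```shell
input='string "string "string.'

# \1 re-inserts whatever the \( ... \) group captured.
step1=$(printf '%s\n' "$input" | sed 's/"\([^"]*\) "/"\1"/g')
printf '%s\n' "$step1"   # prints: string "string"string.
```

The stray space inside the quotes is gone, but the closing quote is now glued to the following word.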
The missing space after the closing double quote is easy to add:
sed 's/"\([^"]*\) " */"\1" /g;s/ $//' file
This will add a space after every closing double quote, and finally trim any space at the end of the line, to fix up the corner case where the closing double quote is the last thing on the line.
Now, you could either try to update the regex for leading spaces, or just do another pass with a similar regex for those. I would go with the latter approach, even though the former is feasible as well (but will require a much more complex regex, and the corner cases are harder to keep in your head).
sed 's/"\([^"]*\) " */"\1" /g;s/ $//;
s/ *" \([^"]*\)"/ "\1"/g;s/^ //' file
This will still fail for inputs with unbalanced double quotes, which are darn near impossible to handle completely automatically anyway (how do you postulate where to add a missing double quote?).
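For repeated use, the two passes can be wrapped in a small shell function. This is only a sketch: the name fix_quotes is made up, it reads standard input rather than a file, and I have only checked it on lines with a single, balanced quoted phrase. With several quoted phrases on one line, the regexes can pair a closing quote with the next opening quote.

```shell
# fix_quotes: hypothetical helper, filters stdin to stdout.
# First two expressions: drop a space before the closing quote,
# put one space after it, and trim a trailing space.
# Last two expressions: drop a space after the opening quote,
# put one space before it, and trim a leading space.
fix_quotes() {
    sed -e 's/"\([^"]*\) " */"\1" /g' -e 's/ $//' \
        -e 's/ *" \([^"]*\)"/ "\1"/g' -e 's/^ //'
}

printf '%s\n' 'string "string "string.' | fix_quotes    # string "string" string.
printf '%s\n' 'string " string" string.' | fix_quotes   # string "string" string.
printf '%s\n' 'say "hi "' | fix_quotes                  # say "hi"
```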
This may work for some cases but may fail with unbalanced quotes:
sed 's/"\([^"]*\S\)\s\s*"/"\1"/g'
To also add a space after a quoted phrase (note that this inserts a space after the closing quote even when one is already there):
sed -e 's/"\([^"]*\S\)\s\s*"/"\1"/g' -e 's/\("[^"]*"\)\([^"]\)/\1 \2/g'
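Note that \s and \S are GNU sed extensions and may not work in other sed implementations. A roughly equivalent sketch using POSIX bracket expressions follows; it is my own translation rather than part of the answer above, and the function name tidy_quotes is made up:

```shell
# tidy_quotes: hypothetical helper, filters stdin to stdout.
# [[:space:]] stands in for \s, and [^"[:space:]] for \S
# (slightly stricter, since it also refuses a double quote).
tidy_quotes() {
    sed -e 's/"\([^"]*[^"[:space:]]\)[[:space:]][[:space:]]*"/"\1"/g' \
        -e 's/\("[^"]*"\)\([^"]\)/\1 \2/g'
}

printf '%s\n' 'string "string   "string.' | tidy_quotes   # string "string" string.
```

Unlike the earlier single-space pattern, this one strips a whole run of whitespace before the closing quote.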
Here is an awk solution:
echo 'string "string "string.' | awk -F' "' '{for (i=1;i<=NF;i++) printf (i%2==0?"\"":"")"%s"(i%2==0?"\"":"")(i!=NF?" ":""),$i;print ""}'
string "string" string.
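It splits each line on the space-plus-quote sequence, so the text that was between the quotes ends up in every second field. Those fields are wrapped in double quotes again, which puts each closing quote directly after its text.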
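The one-liner is easier to follow when spread over several lines. Here is a sketch of the same idea wrapped in a shell function (the name fix_quotes_awk is made up). Like the original, it assumes every double quote in the input is preceded by a space, so it only suits the malformed pattern shown in the question; a line whose quotes are already correct would be split in the wrong places.

```shell
# fix_quotes_awk: hypothetical helper, filters stdin to stdout.
fix_quotes_awk() {
    awk -F' "' '{
        line = ""
        for (i = 1; i <= NF; i++) {
            # Even-numbered fields were found between two quotes.
            field = (i % 2 == 0) ? "\"" $i "\"" : $i
            # Fields are joined with a single space, none after the last.
            sep = (i != NF) ? " " : ""
            line = line field sep
        }
        print line
    }'
}

printf '%s\n' 'string "string "string.' | fix_quotes_awk   # string "string" string.
```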