1

I am using ultraedit with regex. I would like to find (and replace) and embedded double quotes found withing a string that starts/ends with a double quote. This is a text file with pipe | as the delimeter.

How do I find the embedded double quotes:

"This string is ok."|"This is example with a "C" double quoted grade in middle."|"Next line"

I eventually need to replace the double quotes in "C" to just have C.

Luís Cruz
  • 14,780
  • 16
  • 68
  • 100
john walker
  • 11
  • 1
  • 2

5 Answers5

4

The big trade off in CSV is correct parsing in every case versus simplicity. This is a resonably moderated approach. If you have really wily strings with quotes next to pipes in them, you better use something like PERL and Text::CSV.

There is a bother with a regex that requires a non-pipe character on each side of the quote (such as [^|]) in that the parser will absorb the C and then won't find the other quote next to the C.

This example will work pretty well as long as you don't have pipes and quotes next to each other in your actual CSV strings. The lookaheads and behinds are zero-width, so they do not remove any additional characters besides the quote.

   1     2       3    4
(?<!^)(?<!\|)"(?!\|)(?!$)
  1. Don't match quotes at the beginning of the line.
  2. Don't match quotes with a pipe in front.
  3. Don't match quotes with a pipe afterwards.
  4. Don't match quotes at the end of a string.

Every quote thus matched can be removed. Don't forget to specify global replacement to get all of the quotes.

kovacsbv
  • 351
  • 4
  • 11
  • Yes, this Perl regular expression with 2 look-behind and 2 look-ahead work very well to find double quotes within a value. Therefore it is very good for a manual inspection. I would not use it for a Replace All operation as it finds also escaped double quotes. But it is very good for manual search and replace. `(?<!^)(?<!["|])"(?!["|])(?!$)` would ignore a double quote if there is one more double quote on left or right side. But that results in ignoring also a single double quote at beginning or end of a value being already quoted like in `xx|"I replied: "Thanks!""|yy`. – Mofi Jan 26 '15 at 18:50
  • When I ran this replacement in Ultra Edit 21.30, it produced what I believe is the correct result: xx|"I replied: Thanks!"|yy – kovacsbv Jan 27 '15 at 18:52
  • Your regular expression reformats `xx|"I replied: "Thanks!""|yy` to `xx|"I replied: Thanks!"|yy` which is of course right if deletion of double quotes in value is wanted. But it reformats also `xx|"I replied: ""Thanks!"""|yy` to `xx|"I replied: Thanks!"|yy` although the double quotes in value are already correct escaped with double quotes and should be therefore not deleted. My macro works for both CSV lines by inserting additional double quotes on first example and keeping already correct formatted second example. Your expression works best if goal is to delete all double quotes in values. – Mofi Jan 28 '15 at 06:42
  • Look at the original question: I eventually need to replace the double quotes in "C" to just have C. Sounds to me like what was wanted. You're moderating in the "perfect parse every time" direction, which while a valid pursuit is not what the question was asking. – kovacsbv Jan 29 '15 at 13:41
  • 1
    I know that the questioner just asked for find (and replace) double quotes inside a value enclosed already in double quotes. This is the reason why I wrote that your expression works here and why I upvoted your answer. But other visitors are perhaps interested in just finding double quotes in values not being escaped by double quotes which is the reason for my comments. Those comments are no criticisms on your solution. They are just add-ons explaining what happens with correct escaped double quotes within a value and what could be done to avoid it (partly). – Mofi Jan 29 '15 at 15:02
1

Try this find:

(["][^"]*)["]C["]([^"]*["])

and replace:

\1C\2

Turn on Regular Expressions in Perl mode.

Screen shot of find replace regex in UltraEdit

Screen shot of

UltraEdit Professional Text/HEX Editor 
Version 21.30.0.1005

Trying it out.

Start with:

"This string is ok."|"This is example with a "C" double quoted grade in middle."|"Next line"
"This string is ok."|"This is example with a C double quoted grade in middle."|"Next line"

Ends with:

"This string is ok."|"This is example with a C double quoted grade in middle."|"Next line"
"This string is ok."|"This is example with a C double quoted grade in middle."|"Next line"

Breakdown of the regex FIND.

First part.

(["][^"]*)
from (["][^"]*)["]C["]([^"]*["])

This looks for a sequence of:

  1. Double quote: ["].
  2. Any number of characters that are not double quotes: [^"]*
  3. The brackets that surround ["][^"]* indicate that the regex engine should store this sequence of characters so that the REPLACE part can refer back to it (as back references).
  4. Note that this is repeated at the start and end - meaning that there are two sequences stored.

Second part.

["]C["]
from (["][^"]*)["]C["]([^"]*["])

This looks for a sequence of:

  1. Double quote: ["].
  2. The capital letter C (which may or may not stand for Cookies).
  3. Double quote: ["].

Breakdown of the regex REPLACE.

\1C\2
  1. \1 is a back reference that means replace this with the first sequence saved.
  2. The capital letter C (which may or may not stand for Cookies).
  3. \2 is a back reference that means replace this with the second sequence saved.
Robert Mark Bram
  • 8,104
  • 8
  • 52
  • 73
0

For the example you gave just "\w" works as the regex to find "C"

Try it here

The replacing mechanism is probably built into ultraedit

Jay
  • 2,656
  • 1
  • 16
  • 24
0

You really don't want to do this with regex. You should use a csv parser that can understand pipe delimiters. If I were to this with just regex, I would use multiple replacements like this:

Find and replace the good quotes with placeholder to text. Start/end quote:

 s/(^"|"$)/QUOTE/g

Quotes near pipe delimiters:

s/"\|"/DELIMITER/g

Now only embedded double quotes remain. To delete all of them:

s/"//g

Now put the good quotes back:

s/QUOTE|DELIMITER/"/g
nanny
  • 1,098
  • 9
  • 19
0

nanny posted a good solution, but for a Perl script, not for usage in a text editor like UltraEdit.

In general it is possible to have double quotes within a field value. But each double quote must be escaped with one more double quote. This is explained for example in Wikipedia article about comma-separated values.

This very simple escaping algorithm makes reading in a CSV file character by character coded in a programming language very easy. But double quotes, separators and line breaks included in a double quoted value are a nightmare for a regular expression find and replace in a CSV file.

I have recorded several replaces into an UltraEdit macro

InsertMode
ColumnModeOff
Top
PerlReOn
Find MatchCase RegExp "^"|"$"
Replace All "QuOtE"
Find MatchCase ""|"
Replace All "QuOtE|"
Find MatchCase "|""
Replace All "|QuOtE"
Find MatchCase """"
Replace All "QuOtEQuOtE"
Find MatchCase """
Replace All """"
Find MatchCase "QuOtE"
Replace All """

The first replace is a Perl regular expression replace. Each double quote at beginning or end of a line is replaced by the string QuOtE by this replace. I'm quite sure that QuOtE does not exist in the CSV file.

Each double quote before and after the pipe character is also replaced by QuOtE by the next 2 non regular expression replaces.

Escaped double quotes "" in the CSV file are replaced next by QuOtEQuOtE with a non regular expression replace.

Now the remaining single double quotes are replaced by two double quotes to make them valid in CSV file. You could of course also remove those single double quotes.

Finally, all QuOtE are replaced back to double quotes.

Note: This is not the ultimate solution. Those replaces could produce nevertheless a wrong result, for example for an already valid CSV line like this one

"first value with separator ""|"" included"|second value|"third value again with separator|"|fourth value contains ""Hello!"""|fifth value

as the result is

"first value with separator """|""" included"|second value|"third value again with separator|"|fourth value contains ""Hello!"""|fifth value

PS: The valid example line above should be displayed in a spreadsheet application as

first value with separator "|" included    second value    third value again with separator|    fourth value contains "Hello!"    fifth value
Community
  • 1
  • 1
Mofi
  • 46,139
  • 17
  • 80
  • 143