Regex: substitute several special characters with other special characters in TextWrangler

Question

The character ̈ (unicode 0x308) cannot be represented in the “Western (ISO Latin 9)” encoding.

I need to replace several (3) of these special characters in many txt files. Ideally this would be one single regex command for the TextWrangler editor that I run on my Mac, so that I can use it in TextWrangler's find & replace function (similar to BBEdit).

Here are the 3 special chars:

ä into ä
ö into ö
ü into ü

(Please note that the first letter consists of two characters, e.g. the a followed by the combining diaeresis, Unicode 0x308, and is therefore not compatible with Western ISO Latin.)
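To make the two-character point concrete, here is a minimal Python sketch (run outside TextWrangler, purely as an illustration). The variable names and the `can_encode` helper are hypothetical; `iso8859_15` is Python's codec name for ISO Latin 9.

```python
import unicodedata

decomposed = "a\u0308"   # 'a' followed by COMBINING DIAERESIS (U+0308)
precomposed = "\u00e4"   # LATIN SMALL LETTER A WITH DIAERESIS, a single code point

print(len(decomposed), len(precomposed))   # 2 1
print(decomposed == precomposed)           # False: they only look identical

# NFC normalization composes the pair into the single precomposed character.
composed = unicodedata.normalize("NFC", decomposed)
print(composed == precomposed)             # True

def can_encode(text: str, codec: str = "iso8859_15") -> bool:
    """Return True if text can be represented in the given encoding."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

print(can_encode(precomposed), can_encode(decomposed))   # True False
```

The last line shows why the editor complains: only the precomposed form has a slot in ISO Latin 9.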

I tried regex groups but was not successful. In TextWrangler I use the find & replace function (with the grep, i.e. regex, option enabled):

FIND: (ä|ö|ü)+

REPLACE: \1ä , \2ö , \3ü

Any ideas?

You can't do it with one regex command. That would be equivalent to create conditional replaces which is not supported in regex. As of yet. At least AFAIK — Jorge Campos, Nov 15 '17 at 17:06
The only possible way to do this with a single regex is to append all the characters you want to use as a replacement to the end of your file and then match it, something like this: `ä([\s\S]*)(ä)` with replace of `$2$1`. Combining multiple of these into one, you'd get `ä([\s\S]*)(ä)|ö([\s\S]*)(ö)` with replace of `$2$1$4$3`. Ideally, you'd want to use a branch reset though so that you could have `(?|ä([\s\S]*)(ä)|ö([\s\S]*)(ö))` with replace of `$2$1`. That's the only method I'm aware of to have conditional replacements. Otherwise you'll have to use separate regular expressions. — ctwheels, Nov 15 '17 at 17:15

ctwheels · Answer 1 · 2017-11-15T17:46:40.613

Brief

I've just tested this with Notepad++, but I'm not sure whether it will work in any Mac text editor.

This method is a conditional replacement using a dictionary in regex. It's more of a hack, but it does work, provided the text editor's regex engine supports a backreference and a capture group inside a lookahead. Once you're done, remove the dictionary from the bottom of the file.

Code


(ä|ö|ü)(?=[\s\S]*Dictionary:[\s\S]*\1=([^\s=:]+))

Replacement

\2

Results

Input

ä into a
ö into o
ü into u

Input - Modified

This input includes the dictionary at the end

ä into a
ö into o
ü into u

Dictionary:
ä=a
ö=o
ü=u

Output

a into a
o into o
u into u

Dictionary:
ä=a
ö=o
ü=u

Explanation

(ä|ö|ü) Capture either character in the group into capture group 1
(?=[\s\S]*Dictionary:[\s\S]*\1=([^\s=:]+)) Positive lookahead ensuring what follows matches
- [\s\S]* Match any character any number of times
- Dictionary: Match Dictionary: literally (this can be changed to anything, but you should make sure this is a unique string that won't be present anywhere else in your input)
- [\s\S]* Match any character any number of times
- \1 Match the same text as most recently matched by the first capture group
- = Match the equal sign character = literally
- ([^\s=:]+) Capture one or more of any character not present in the set (not whitespace, = or :) into capture group 2
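For readers who want to try the pattern outside an editor, the following is a minimal sketch in Python, whose `re` module accepts a backreference and a capture group inside a lookahead. It is only an illustration under that assumption, not a statement about what TextWrangler's grep engine supports. The names `PATTERN`, `apply_dictionary` and `sample` are made up for this example, and the umlauts are written as `\u` escapes so that the file's own encoding cannot change them.

```python
import re

# The pattern from the answer. \u00e4, \u00f6 and \u00fc are the precomposed
# characters ä, ö and ü. Group 1 is the character to replace; group 2 is the
# value found after "<character>=" in the dictionary at the end of the text.
PATTERN = re.compile(
    r"(\u00e4|\u00f6|\u00fc)"
    r"(?=[\s\S]*Dictionary:[\s\S]*\1=([^\s=:]+))"
)

def apply_dictionary(text: str) -> str:
    """Replace each matched character with its dictionary value (group 2)."""
    return PATTERN.sub(r"\2", text)

# The answer's "Input - Modified" sample, including the trailing dictionary.
sample = (
    "\u00e4 into a\n"
    "\u00f6 into o\n"
    "\u00fc into u\n"
    "\n"
    "Dictionary:\n"
    "\u00e4=a\n"
    "\u00f6=o\n"
    "\u00fc=u\n"
)

result = apply_dictionary(sample)
print(result)
```

The dictionary entries themselves are left untouched because no further `Dictionary:` marker follows them, so the lookahead fails there. The same approach should carry over to the decomposed input from the question by listing the two-code-point sequences (for example `a\u0308`) as the alternatives and as the dictionary keys.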