
The character ̈ (unicode 0x308) cannot be represented in the “Western (ISO Latin 9)” encoding.

I need to replace three of these special characters in many .txt files. Ideally I would like one single regex command that I can use in the find & replace function of TextWrangler (similar to BBEdit), the editor I run on my Mac.

Here are the 3 special chars:

  1. ä into ä
  2. ö into ö
  3. ü into ü

Please note that the first letter in each pair consists of two characters (e.g. the a followed by the combining diaeresis, Unicode U+0308) and is therefore not compatible with Western ISO Latin 9.
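
To make the difference between the two forms concrete, here is a minimal sketch in Python (not TextWrangler). It assumes that "ISO Latin 9" corresponds to Python's `iso8859_15` codec; `fits_latin9` is a hypothetical helper name used only for illustration.

```python
import unicodedata

# Decomposed form: base letter followed by U+0308 COMBINING DIAERESIS.
decomposed = "a\u0308"
# Precomposed form: the single code point U+00E4.
precomposed = "\u00e4"


def fits_latin9(text: str) -> bool:
    """Return True if text can be encoded as ISO-8859-15 (assumed to be ISO Latin 9)."""
    try:
        text.encode("iso8859_15")
        return True
    except UnicodeEncodeError:
        return False


# The two strings look alike on screen but differ in length.
print(len(decomposed), len(precomposed))
# Only the precomposed form can be encoded.
print(fits_latin9(decomposed), fits_latin9(precomposed))
# NFC normalization composes the pair into the single code point.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

If a script is an option, NFC normalization as in the last line would handle all three letters without any regex.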

I tried regex groups but was not successful. In TextWrangler I used the find & replace function with the grep (regex) option enabled:

FIND: (ä|ö|ü)+

REPLACE: \1ä , \2ö , \3ü

Any ideas?

mfuerli
  • You can't do it with one regex command. That would amount to a conditional replacement, which regex does not support, at least as far as I know. – Jorge Campos Nov 15 '17 at 17:06
  • The only possible way to do this with a single regex is to append all the characters you want to use as a replacement to the end of your file and then match it, something like this: `ä([\s\S]*)(ä)` with replace of `$2$1`. Combining multiple of these into one, you'd get `ä([\s\S]*)(ä)|ö([\s\S]*)(ö)` with replace of `$2$1$4$3`. Ideally, you'd want to use a branch reset though so that you could have `(?|ä([\s\S]*)(ä)|ö([\s\S]*)(ö))` with replace of `$2$1`. That's the only method I'm aware of to have conditional replacements. Otherwise you'll have to use separate regular expressions. – ctwheels Nov 15 '17 at 17:15
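
Outside an editor's find & replace dialog, the per-character choice discussed in these comments can be made with a replacement callback instead of a single pattern. A minimal sketch in Python, which is a different technique from the editor-only approaches above; the `REPLACEMENTS` table and the `compose_umlauts` name are assumptions made for illustration:

```python
import re

# Hypothetical mapping for illustration: decomposed sequence -> precomposed letter.
REPLACEMENTS = {
    "a\u0308": "\u00e4",  # a + combining diaeresis -> ä
    "o\u0308": "\u00f6",  # o + combining diaeresis -> ö
    "u\u0308": "\u00fc",  # u + combining diaeresis -> ü
}

# One alternation that matches any key of the table.
PATTERN = re.compile("|".join(re.escape(key) for key in REPLACEMENTS))


def compose_umlauts(text: str) -> str:
    """Replace each matched sequence with its entry in REPLACEMENTS."""
    return PATTERN.sub(lambda match: REPLACEMENTS[match.group(0)], text)


print(compose_umlauts("Ma\u0308dchen, scho\u0308n, u\u0308ber"))
```

The callback looks up each match in the table, so one pass handles all three letters.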

1 Answer


Brief

I've only tested this in Notepad++, so I'm not sure whether it will work in Mac text editors.

This method is a conditional replacement using a dictionary in regex. It's more of a hack, but it does work, assuming the text editor's regex engine supports lookaheads and backreferences. Once you're done, remove the dictionary from the bottom of the file.


Code


(ä|ö|ü)(?=[\s\S]*Dictionary:[\s\S]*\1=([^\s=:]+))

Replacement

\2

Results

Input

ä into a
ö into o
ü into u

Input - Modified

This input includes the dictionary at the end

ä into a
ö into o
ü into u

Dictionary:
ä=a
ö=o
ü=u

Output

a into a
o into o
u into u

Dictionary:
ä=a
ö=o
ü=u

Explanation

  • (ä|ö|ü) Capture any one of the three characters into capture group 1
  • (?=[\s\S]*Dictionary:[\s\S]*\1=([^\s=:]+)) Positive lookahead ensuring that what follows matches the sub-pattern below
    • [\s\S]* Match any character any number of times
    • Dictionary: Match Dictionary: literally (this can be changed to anything, but you should make sure this is a unique string that won't be present anywhere else in your input)
    • [\s\S]* Match any character any number of times
    • \1 Match the same text as most recently matched by the first capture group
    • = Match the equal sign character = literally
    • ([^\s=:]+) Capture one or more of any character not present in the set (not whitespace, = or :) into capture group 2
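
For anyone who wants to check the pattern outside an editor, here is a minimal sketch using Python's `re` module. That is an assumption on my part, since the pattern above was only tested in Notepad++. The sample text mirrors the example above; `DICTIONARY` and `replace_with_dictionary` are hypothetical names introduced for illustration.

```python
import re

# Same pattern as above: group 1 is the character to replace, group 2 is
# its replacement, looked up after the "Dictionary:" marker.
PATTERN = re.compile(r"(ä|ö|ü)(?=[\s\S]*Dictionary:[\s\S]*\1=([^\s=:]+))")

# The dictionary block that gets appended to the end of the text.
DICTIONARY = "\nDictionary:\nä=a\nö=o\nü=u\n"


def replace_with_dictionary(text: str) -> str:
    """Append the dictionary, run the replacement, then strip the dictionary."""
    with_dict = text + DICTIONARY
    replaced = PATTERN.sub(r"\2", with_dict)
    # Everything from the marker onward is the dictionary, so drop it.
    return replaced[: replaced.rindex("\nDictionary:")]


print(replace_with_dictionary("ä into a\nö into o\nü into u\n"))
```

The characters inside the dictionary itself are left alone because no "Dictionary:" marker follows them, so the lookahead fails there.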
ctwheels