A search field should only allow regular alphanumeric characters plus the German umlauts ä, ö, ü, Ä, Ö, Ü and the letter ß.

My regex looks like:

/(<([^>]+)>)|[^a-zA-Z0-9äöüÄÖÜß\s]/ig

The replace:

phrase.replace(regex, "")

Before the replace:

Ärzte

After the replace:

rzte

Unfortunately, the umlauts are removed by the replace. Any suggestions for keeping these characters are appreciated.

Thanks in advance.

Bernhard Kraus
  • I cannot reproduce the error ([demo](https://regex101.com/r/hR0vR9/1)) – Wiktor Stribiżew Feb 29 '16 at 08:35
  • Ensure that your source code (file) has UTF-8 charset – hindmost Feb 29 '16 at 08:38
  • Ditto, it works perfectly for me. – Aaron Feb 29 '16 at 08:38
  • Indeed, I have no possibility of converting the source file to UTF-8. I solved it along the lines of Aaron's solution below, knowing that hex values are just a workaround and not perfect. /(<([^>]+)>)|[^a-zA-Z0-9\xE4\xF6\xFC\xC4\xD6\xDC\xDF\s]/ig – Bernhard Kraus Feb 29 '16 at 09:13
  • @BernhardKraus It really is not a workaround, to be honest. Some special characters have to be escaped in a regex, and using a hex code is similar to doing that. Also, as I stated in my post, there are a lot of regex scripts out there that try to validate email addresses etc., and these scripts almost always use hex codes for the accented characters. I am sorry that georg put down my answer and made you question it, but it really is the best practice to do it the way I suggested, at least in JavaScript for the Internet. Feel confident about using hex codes, there's nothing wrong with it! :) – Aaron Feb 29 '16 at 11:01

1 Answer

The issue is most likely that the source file's charset is not UTF-8, so the literal umlauts in the regex are not read as the characters you typed. You should fix the charset, but a more robust practice may be to use hex escapes in the regex instead of the literal characters, and to leave a comment next to it so you remember which characters the hex codes stand for.

Check whether this works:

// \xE4=ä \xF6=ö \xFC=ü \xC4=Ä \xD6=Ö \xDC=Ü \xDF=ß
phrase.replace(/(<([^>]+)>)|[^a-z0-9\xE4\xF6\xFC\xC4\xD6\xDC\xDF\s]/ig, "")

You can find other hex escapes here: http://www.javascripter.net/faq/accentedcharacters.htm
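
As a minimal sketch of the approach above, the escaped pattern can be wrapped in a small helper. The names DISALLOWED and sanitizeSearchPhrase are hypothetical (they do not appear in the question), and the empty replacement string is taken from the question's own replace call. Because both the pattern and the test input are written with escapes, the file stays pure ASCII and its charset should not affect the result.

```javascript
// Hypothetical helper, not part of the original post.
// \xE4=ä \xF6=ö \xFC=ü \xC4=Ä \xD6=Ö \xDC=Ü \xDF=ß
var DISALLOWED = /(<([^>]+)>)|[^a-z0-9\xE4\xF6\xFC\xC4\xD6\xDC\xDF\s]/gi;

function sanitizeSearchPhrase(phrase) {
  // Remove HTML-like tags and every character outside the allowed set.
  return phrase.replace(DISALLOWED, "");
}

// "\xC4rzte!" is "Ärzte!" written with an escape; only the "!" is removed.
console.log(sanitizeSearchPhrase("\xC4rzte!") === "\xC4rzte"); // true
```

The `i` flag already covers upper-case ASCII letters, which is why the character class lists only a-z.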

Wiktor Stribiżew
Aaron
  • Bad advice. They need to have their encoding fixed, not to create workarounds that hurt readability. – georg Feb 29 '16 at 08:45
  • Explain to me, then, why official standards use hex escapes, and why most JS-based regexes around the Internet that attempt email verification use hex escapes: http://www.regular-expressions.info/email.html (scroll down to RFC 5322). Why not do it the most functional way and then leave a comment right after the regex so the reader can see which characters the hex codes are for? – Aaron Feb 29 '16 at 08:48
  • I find it weird that a working and fully explained answer is dismissed as "bad advice", when it's a completely practical solution to the problem (and, judging by the conversation above, should be the accepted answer). – Chris Lear Feb 29 '16 at 10:09
  • Thank you @ChrisLear. I just don't like the stackoverflow down voting system. Seems to favour opinions instead of facts. – Aaron Feb 29 '16 at 11:02
  • Thanks, Aaron, for your comment above. Your solution has been upvoted and marked as the right answer. In the end, it seems to be a matter of opinion whether a developer sees bad advice or the right approach to solving the problem. Thanks for helping. – Bernhard Kraus Feb 29 '16 at 11:19