
I found a regex property escape that matches any letter in any alphabet: \p{L}

So I wrote a regex to remove every character that is not a letter, a digit, or an underscore: /[^\p{L}\d_]/gimu

Unfortunately, it does not work with Hindi text such as #फ्रांस, which comes out as फरस

See for yourself here: https://regex101.com/r/dnXDK0/1

And please help me :-)

Martin Ratinaud

1 Answer


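You forgot about combining marks (diacritics, vowel signs, the virama). You need to add \p{M} to the negated character class. \p{Mn} on its own covers only non-spacing marks, so it would still strip the spacing vowel sign ा in this word: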

/[^\p{L}\p{M}\d_]/gu

See the regex demo.
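As a quick check outside regex101, here is a minimal sketch you can run in Node or a browser console. The helper name `stripWith` and the variable names are mine, not part of any API; the input is written with `\u` escapes only so the code points are unambiguous.

```javascript
// "#फ्रांस" spelled out code point by code point:
// # + फ (letter) + ् (virama, Mn) + र (letter) + ा (vowel sign, Mc) + ं (anusvara, Mn) + स (letter)
const input = "#\u092B\u094D\u0930\u093E\u0902\u0938";

const lettersOnly = /[^\p{L}\d_]/gu;            // class from the question
const nonSpacingMarks = /[^\p{L}\p{Mn}\d_]/gu;  // keeps only non-spacing marks
const allMarks = /[^\p{L}\p{M}\d_]/gu;          // class from this answer

// Hypothetical helper: remove everything the pattern matches.
function stripWith(pattern, text) {
  return text.replace(pattern, "");
}

console.log(stripWith(lettersOnly, input));     // फरस    (all marks lost)
console.log(stripWith(nonSpacingMarks, input)); // फ्रंस   (the ा is still lost)
console.log(stripWith(allMarks, input));        // फ्रांस  (word intact, only # removed)
```

The middle case is why \p{M} is the safer choice for Indic scripts: several vowel signs are spacing marks (Mc), which \p{Mn} does not cover.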

Note you do not need the i and m flags here. m redefines the behavior of the ^ and $ anchors, but your regex contains neither (the ^ inside the brackets is a negation, not an anchor). i makes cased letters match case-insensitively, but \p{L} already matches all letters, both upper- and lowercase.
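If you want to convince yourself that the extra flags change nothing for this pattern, a small sketch like the following compares both variants on a made-up sample string (the sample and variable names are just for illustration):

```javascript
// Multi-line sample with mixed case, accents, digits, punctuation and Devanagari.
const sample = "Été_2024\n#फ्रांस ABC-abc";

const withExtraFlags = sample.replace(/[^\p{L}\p{M}\d_]/gimu, "");
const withoutExtraFlags = sample.replace(/[^\p{L}\p{M}\d_]/gu, "");

// Both variants strip the newline, "#", the space and "-" and keep everything else.
console.log(withExtraFlags === withoutExtraFlags); // true
```

The g flag is still needed to replace every match, and u is required for \p{...} to be recognized at all.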

The fourth bird
Wiktor Stribiżew