
I am using a Ruby regex to filter user input so that only digits and letters of any language are kept. But for some words, the spelling changes after the regex is applied. For example:

text = 'कंप्यूटर'
regex = /[^(\p{Alpha})]/
filter_text = text.gsub(regex, '') # => कंपयूटर

As you can see, the output differs from the input. How can I fix this?

Chris
Praveenkumar

1 Answer

You can use

regex = /[^\p{L}\p{Nd}\p{M}]+/

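It matches one or more characters that are not Unicode letters, decimal digits, or combining marks.

\p{L} matches all Unicode letters, \p{Nd} matches all Unicode characters in the 'Number, Decimal Digit' category, and \p{M} matches any combining mark (diacritics, vowel signs, and the like). Your original pattern dropped the virama ् (U+094D): it is a combining mark but does not have the Alphabetic property, so \p{Alpha} does not cover it. Note also that the parentheses inside your character class are literal, so the original pattern let ( and ) through as well.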

See the Ruby demo:

text = 'कंप्यूटर'
regex = /[^\p{L}\p{Nd}\p{M}]+/
filter_text = text.gsub(regex, '')
puts filter_text
# => कंप्यूटर
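
To see why the original pattern changed the spelling, it can help to list each code point and check which property classes match it. The following is only a minimal sketch; the `report` and `stripped` names are made up for illustration, and it assumes a UTF-8 source file and Ruby 2.4 or later (for `String#match?`):

```ruby
text = 'कंप्यूटर'

# For each code point, record whether \p{Alpha} and \p{M} match it.
report = text.each_char.map do |ch|
  {
    codepoint: format('U+%04X', ch.ord),
    alpha: ch.match?(/\p{Alpha}/),
    mark: ch.match?(/\p{M}/)
  }
end

report.each { |row| puts row }

# Characters that the pattern from the question would remove.
stripped = text.scan(/[^(\p{Alpha})]/)
puts stripped.map { |ch| format('U+%04X', ch.ord) }.inspect
```

Any code point reported with `alpha: false` and `mark: true` is one that \p{Alpha} alone removes but \p{M} keeps.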
Wiktor Stribiżew