Problem Description

I have the string "Վիկտոր1 Ափոյան2". Using a regular expression, I want to get the first letter of each word, so the result should be "ՎԱ". Since the string is Unicode, I'm using the following regex:

"(\\p{L})\\p{L}*\\s(\\p{L})\\p{L}*"

This works fine if the string does not contain the digits "1" and "2". To handle them, I also tried the following regex:

"(\\p{L}\\p{N})\\p{L}\\p{N}*\\s(\\p{L}\\p{N})\\p{L}\\p{N}*"

but this does not work correctly.

Is there something like "\\p{LN}" that checks for Unicode letters and numbers at the same time, or does anyone know how I can solve this?

Community

1 Answer

Is there something like "\p{LN}" which will check for Unicode letters and numbers at the same time?

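Use a character class, [\p{L}\p{N}], which matches any single Unicode letter or number. Note that \p{N} covers all numeric characters (decimal digits, letter numbers, and other numerics); use \p{Nd} if only decimal digits should match.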

Alternatively, use \p{Alnum} with the Pattern.UNICODE_CHARACTER_CLASS flag (or prepend (?U) to the pattern).
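
As a rough illustration, both suggestions could be applied to the asker's string along these lines. This is only a sketch: the class and method names (InitialsDemo, initialsWithClass, initialsWithAlnum) are made up for the example, \s+ is used instead of \s to tolerate repeated spaces, and the sample string "Վիկտոր1 Ափոյան2" is written with Unicode escapes so the source file stays ASCII.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InitialsDemo {
    // Option 1: an explicit character class of Unicode letters and numbers.
    private static final Pattern WITH_CLASS =
            Pattern.compile("(\\p{L})[\\p{L}\\p{N}]*\\s+(\\p{L})[\\p{L}\\p{N}]*");

    // Option 2: the (?U) inline flag makes POSIX classes such as Alnum Unicode-aware.
    private static final Pattern WITH_ALNUM =
            Pattern.compile("(?U)(\\p{L})\\p{Alnum}*\\s+(\\p{L})\\p{Alnum}*");

    // Returns the first letters of the first two words, or "" if nothing matches.
    private static String initials(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group(1) + m.group(2) : "";
    }

    static String initialsWithClass(String input) {
        return initials(WITH_CLASS, input);
    }

    static String initialsWithAlnum(String input) {
        return initials(WITH_ALNUM, input);
    }

    public static void main(String[] args) {
        // The sample string from the question, spelled with Unicode escapes.
        String input = "\u054E\u056B\u056F\u057F\u0578\u05801 "
                + "\u0531\u0583\u0578\u0575\u0561\u05762";
        System.out.println(initialsWithClass(input));
        System.out.println(initialsWithAlnum(input));
    }
}
```

Both calls should print "ՎԱ" for the sample input. The original second attempt failed because (\p{L}\p{N}) requires a letter followed by a number as a two-character sequence, whereas the bracketed class accepts either one at each position.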

Wiktor Stribiżew
  • What would be the regex to replace every character that is *not* unicode or numeric? – Avatar Mar 03 '22 at 12:06
  • @Avatar Every char is Unicode. To match any non-digit, use `\P{N}` or `\D`. – Wiktor Stribiżew Mar 03 '22 at 12:09
  • Oh, I meant how to allow only alphabetic characters with UTF-8 and numbers. Actually coming from https://stackoverflow.com/q/11989482/1066234 and found your answer here. `preg_replace` removes them, so we need a regex that catches all others. – Avatar Mar 03 '22 at 12:11
  • @Avatar `[\p{L}\p{N}]` is a positive character class, it matches any chars that are defined inside it: letters and digits. To match anything but those chars, make the character class a negated character class, `[^\p{L}\p{N}]`. – Wiktor Stribiżew Mar 03 '22 at 12:16
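
The comment thread concerns PHP's `preg_replace`, but the same negated class can be sketched in Java, the language of the answer above. The class and method names here (StripDemo, keepLettersAndNumbers) are hypothetical, chosen only for the example.

```java
class StripDemo {
    // Removes every character that is neither a Unicode letter nor a Unicode number.
    static String keepLettersAndNumbers(String input) {
        return input.replaceAll("[^\\p{L}\\p{N}]", "");
    }

    public static void main(String[] args) {
        System.out.println(keepLettersAndNumbers("Hello, World! 123"));
    }
}
```

For the input "Hello, World! 123" this should print "HelloWorld123"; spaces and punctuation are dropped because they fall outside both \p{L} and \p{N}.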