I have a string (it's in Polish): prawy p pęknięty p zderzak pęknięcie
I want to match every p except the "p" in the words "pęknięty" and "peknięcie".
I've tried something like \b(s*ps*)\b, but it doesn't work properly. Any ideas?
Maybe,
\bp(?=[a-z]+|\s|$)
or
(?!pęknięcie|pęknięty)\bp
might simply work fine.
If you wish to simplify, modify, or explore the expression, it is explained in the top right panel of regex101.com, where you can also see how it matches against some sample inputs.
jex.im can be used to visualize regular expressions.
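As a quick check of the two expressions above, here is a minimal sketch in Python. The language choice and the helper name match_starts are my own assumptions, not part of the question. Note that for str patterns Python's \b is Unicode-aware, so ę counts as a word character; other regex engines may treat it differently.

```python
import re

text = "prawy p pęknięty p zderzak pęknięcie"

# First suggestion: p at a word boundary, followed by ASCII letters,
# whitespace, or the end of the string (ę is not in [a-z]).
lookahead_pattern = re.compile(r"\bp(?=[a-z]+|\s|$)")

# Second suggestion: refuse to start a match where one of the two
# excluded words begins, then match p at a word boundary.
exclusion_pattern = re.compile(r"(?!pęknięcie|pęknięty)\bp")


def match_starts(pattern, s):
    """Return the start index of every match of pattern in s (hypothetical helper)."""
    return [m.start() for m in pattern.finditer(s)]


print(match_starts(lookahead_pattern, text))  # [0, 6, 17]
print(match_starts(exclusion_pattern, text))  # [0, 6, 17]
```

Both expressions also match the p at the start of "prawy" (index 0), because only the two listed words are excluded.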
You might use a negative lookahead and a character class:
\bp(?![eę]knię(?:cie|ty)\b)
In parts:
\bp - Match p preceded by a word boundary
(?! - Negative lookahead, assert that what is directly on the right is not
[eę]knię - Match e or ę followed by knię
(?:cie|ty)\b - Match cie or ty and a word boundary
) - Close negative lookahead
Using a character class might match an invalid variation of e or ę in the words.
To match the words exactly you could match them between word boundaries
\bp(?!ęknięty\b|ęknięcie\b)
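To illustrate the difference between the character-class version and the exact-word version, here is a small sketch in Python. The language, the helper name find_p, and the second test string are assumptions made for illustration only.

```python
import re

text = "prawy p pęknięty p zderzak pęknięcie"
# Hypothetical input containing the "e" spelling from the question.
variant = "p peknięcie pęknięcie"

# Character class: excludes both the e and the ę spelling.
class_pattern = re.compile(r"\bp(?![eę]knię(?:cie|ty)\b)")

# Exact words: excludes only pęknięty and pęknięcie.
exact_pattern = re.compile(r"\bp(?!ęknięty\b|ęknięcie\b)")


def find_p(pattern, s):
    """Return the start index of every matched p (hypothetical helper)."""
    return [m.start() for m in pattern.finditer(s)]


print(find_p(class_pattern, text))     # [0, 6, 17]
print(find_p(exact_pattern, text))     # [0, 6, 17]
print(find_p(class_pattern, variant))  # [0]
print(find_p(exact_pattern, variant))  # [0, 2]
```

On the original string both patterns agree. They differ only on the "peknięcie" spelling, which the character class excludes and the exact-word version does not.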