RegExp: last symbol missing

Question

I wrote a regular expression:

^(проезд|пл|пр-кт|пер|наб|линия|км|б-р|аллея|кв-л)\s(.+)[^\)]$

It splits the string into the street type and the street name. Some streets also have a location description, which I don't want to select.

Here it is: https://regex101.com/r/j3gF5b/2

It works, but the last character of every street name is missing. Why does this happen, and how can I fix it?

Could you clarify: do you not want the streets that have a location description, or do you not want the location description itself? – Fallenhero Nov 25 '16 at 11:48

Fallenhero · Accepted Answer · 2016-11-25T11:43:06.727

2

Your [^\)] matches exactly one character that is not ). Because it sits outside the second capture group, that character (the last letter of the street name) is consumed but not captured.

You could use this:

^(проезд|пл|пр-кт|пер|наб|линия|км|б-р|аллея|кв-л)\s([^\)]+?)$
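To see the difference, here is a minimal Python sketch comparing the pattern from the question with the one above. The sample string and the variable names are made up for illustration, and Python's `re` module is only an assumption about the engine; the thread does not say which one is in use.

```python
import re

# Street-type alternation shared by both patterns (copied from the question).
TYPES = r"(проезд|пл|пр-кт|пер|наб|линия|км|б-р|аллея|кв-л)"

# Pattern from the question: [^\)] sits outside group 2 and consumes one character.
original = re.compile(r"^" + TYPES + r"\s(.+)[^\)]$")
# Pattern from this answer: the negated class is inside group 2.
fixed = re.compile(r"^" + TYPES + r"\s([^\)]+?)$")

sample = "пл Ленина"  # hypothetical input line
orig_name = original.match(sample).group(2)
fixed_name = fixed.match(sample).group(2)

print(orig_name)   # Ленин  (last letter lost)
print(fixed_name)  # Ленина
```

Note that the fixed pattern rejects any line whose name part contains a ) anywhere, not only at the end.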

edited Nov 25 '16 at 11:43

answered Nov 25 '16 at 11:37

So, how can I skip lines with brackets without losing the last character? – Oleg Bizin Nov 25 '16 at 11:42

score 1 · Answer 2 · answered Nov 25 '16 at 11:41

The reason is that the negated character class still consumes the character it matches. Use a negative lookbehind (a zero-width assertion, i.e. a non-consuming construct) after asserting the end of string/line:

^(проезд|пл|пр-кт|пер|наб|линия|км|б-р|аллея|кв-л)\s(.+)$(?<!\))
                                                         ^^^^^^^

It will fail all the matches that end with ).

See the regex demo
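A quick way to check the lookbehind variant, sketched in Python (an assumed engine; its `re` module supports fixed-width lookbehinds). The helper name `street_name` and the sample lines are hypothetical.

```python
import re

TYPES = r"(проезд|пл|пр-кт|пер|наб|линия|км|б-р|аллея|кв-л)"

# (?<!\)) after $ checks the character just before the end without consuming it.
lookbehind = re.compile(r"^" + TYPES + r"\s(.+)$(?<!\))")

def street_name(line):
    """Return the captured street name, or None when the line is rejected."""
    m = lookbehind.match(line)
    return m.group(2) if m else None

print(street_name("пл Ленина"))          # Ленина
print(street_name("пл Ленина (центр)"))  # None
```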

Another way is to use a negative lookahead (useful if the regex engine doesn't support lookbehinds, as was the case in JavaScript before ES2018):

^(проезд|пл|пр-кт|пер|наб|линия|км|б-р|аллея|кв-л)(?!.*\)$)\s*(.+)$

See another demo
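The lookahead variant can be exercised the same way. This is again only a Python sketch with made-up sample lines and a hypothetical helper name.

```python
import re

TYPES = r"(проезд|пл|пр-кт|пер|наб|линия|км|б-р|аллея|кв-л)"

# (?!.*\)$) runs right after the street type and rejects the line
# if the rest of it ends with ")"; nothing is consumed by the check.
lookahead = re.compile(r"^" + TYPES + r"(?!.*\)$)\s*(.+)$")

def street_name(line):
    """Return the captured street name, or None when the line is rejected."""
    m = lookahead.match(line)
    return m.group(2) if m else None

print(street_name("наб Обводного канала"))  # Обводного канала
print(street_name("наб Мойки (центр)"))     # None
```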

Looks like my engine doesn't support this feature, but thanks. – Oleg Bizin Nov 25 '16 at 11:52
What do you mean? Which engine are you using? – Wiktor Stribiżew Nov 25 '16 at 11:53

Casimir et Hippolyte · Answer 3 · 2016-11-25T11:56:55.110

1

Put it inside the capture group (and optionally exclude newline characters as well):

^(проезд|пл|пр-кт|пер|наб|линия|км|б-р|аллея|кв-л)\s(.*[^)\r\n])$

demo

If you only want to discard the location description and keep the rest:

^(проезд|пл|пр-кт|пер|наб|линия|км|б-р|аллея|кв-л)\s([^(\s]*(?:\h+[^(\s]+)*)

demo
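Both patterns can be tried in Python with one adjustment: `\h` (horizontal whitespace) is PCRE syntax that Python's `re` does not accept, so the sketch below substitutes `[ \t]`, a narrower stand-in covering only space and tab. The variable names and sample lines are hypothetical.

```python
import re

TYPES = r"(проезд|пл|пр-кт|пер|наб|линия|км|б-р|аллея|кв-л)"

# First pattern: the "not )" check is the last character inside group 2.
no_paren_end = re.compile(r"^" + TYPES + r"\s(.*[^)\r\n])$")

# Second pattern, with \h swapped for [ \t] (assumption, see above):
# capture words up to, but not including, the first "(".
strip_description = re.compile(r"^" + TYPES + r"\s([^(\s]*(?:[ \t]+[^(\s]+)*)")

plain = "пр-кт Карла Маркса"
described = "пл Ленина (центр)"

print(no_paren_end.match(plain).group(2))           # Карла Маркса
print(no_paren_end.match(described))                # None
print(strip_description.match(described).group(2))  # Ленина
```

The first pattern drops lines that end with a description; the second keeps them and captures only the name.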

edited Nov 25 '16 at 11:56

answered Nov 25 '16 at 11:45
