1

I wrote a regular expression:

^(проезд|пл|пр-кт|пер|наб|линия|км|б-р|аллея|кв-л)\s(.+)[^\)]$

It splits a string into a street type and a street name. Some streets also have a location description in parentheses, which I don't want to match.

Here it is: https://regex101.com/r/j3gF5b/2

It works, but the last character of every street name is missing. Why does this happen, and how can I fix it?

Robin Mackenzie
Oleg Bizin
    Could you clarify: do you want to exclude the streets that have a location description, or just the location description itself? – Fallenhero Nov 25 '16 at 11:48

3 Answers

2

Your [^\)] matches exactly one character that is not ). It sits outside the second capture group, so the character it consumes is the letter missing from your street name.

You could use this:

^(проезд|пл|пр-кт|пер|наб|линия|км|б-р|аллея|кв-л)\s([^\)]+?)$
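To see the difference, here is a minimal Python sketch that runs the original pattern and the pattern above side by side. The sample strings and the helper `split_street` are made up for illustration; they are not taken from the regex101 demo.

```python
import re

# Street-type alternation copied from the question.
TYPES = r"проезд|пл|пр-кт|пер|наб|линия|км|б-р|аллея|кв-л"

# Original pattern: the trailing [^\)] is outside group 2 and consumes one character.
original = re.compile(rf"^({TYPES})\s(.+)[^\)]$")
# Pattern from this answer: group 2 itself may contain anything except ')'.
fixed = re.compile(rf"^({TYPES})\s([^\)]+?)$")


def split_street(pattern, line):
    """Return (street_type, street_name), or None if the line does not match."""
    m = pattern.match(line)
    return m.groups() if m else None


# Hypothetical sample data.
print(split_street(original, "пер Лесной"))      # ('пер', 'Лесно')  <- last letter lost
print(split_street(fixed, "пер Лесной"))         # ('пер', 'Лесной')
print(split_street(fixed, "пл Победы (центр)"))  # None, the line ends with ')'
```

Note that this pattern rejects any line containing a ) anywhere after the street type, not only lines that end with one.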
Fallenhero
1

The reason is that the negated character class [^\)] still consumes a character: it matches the last character of the street name, outside the capture group. Use a negative lookbehind (a zero-width, non-consuming assertion) after asserting the end of the string/line:

^(проезд|пл|пр-кт|пер|наб|линия|км|б-р|аллея|кв-л)\s(.+)$(?<!\))
                                                         ^^^^^^^

It rejects every line that ends with ).

See the regex demo

Another way is to use a negative lookahead (for regex engines that don't support lookbehinds, such as older JavaScript engines):

^(проезд|пл|пр-кт|пер|наб|линия|км|б-р|аллея|кв-л)(?!.*\)$)\s*(.+)$
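As a rough check, both lookaround variants can be tried in Python, whose `re` module supports fixed-width lookbehinds. This is only a sketch: the sample strings and the helper `groups_or_none` are invented for illustration.

```python
import re

# Street-type alternation copied from the question.
TYPES = r"проезд|пл|пр-кт|пер|наб|линия|км|б-р|аллея|кв-л"

# Lookbehind variant: after reaching the end, assert the previous character is not ')'.
lookbehind = re.compile(rf"^({TYPES})\s(.+)$(?<!\))")
# Lookahead variant: right after the street type, assert the line does not end with ')'.
lookahead = re.compile(rf"^({TYPES})(?!.*\)$)\s*(.+)$")


def groups_or_none(pattern, line):
    """Return the captured groups, or None if the line is rejected."""
    m = pattern.match(line)
    return m.groups() if m else None


# Hypothetical sample data.
plain = "наб Речная"
described = "наб Речная (левый берег)"

for pattern in (lookbehind, lookahead):
    print(groups_or_none(pattern, plain))      # ('наб', 'Речная')
    print(groups_or_none(pattern, described))  # None
```

Neither assertion consumes input, so group 2 keeps the full street name.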

See another demo

Wiktor Stribiżew
1

Put the negated character class inside the capture group (and, if needed, also exclude newline characters):

^(проезд|пл|пр-кт|пер|наб|линия|км|б-р|аллея|кв-л)\s(.*[^)\r\n])$

demo

If you only want to discard the location description and keep the rest:

^(проезд|пл|пр-кт|пер|наб|линия|км|б-р|аллея|кв-л)\s([^(\s]*(?:\h+[^(\s]+)*)
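The pattern above uses \h (horizontal whitespace), which PCRE supports but Python's `re` module does not. The sketch below is an adaptation, not the answer's exact pattern: it assumes `[^\S\r\n]` (whitespace other than line breaks) as a stand-in for \h. The sample strings and the helper `strip_description` are made up.

```python
import re

# Street-type alternation copied from the question.
TYPES = r"проезд|пл|пр-кт|пер|наб|линия|км|б-р|аллея|кв-л"

# Assumption: [^\S\r\n] approximates PCRE's \h for this purpose.
# Group 2 collects whitespace-separated words until one starts with '('.
pattern = re.compile(rf"^({TYPES})\s([^(\s]*(?:[^\S\r\n]+[^(\s]+)*)")


def strip_description(line):
    """Return (street_type, street_name) with any trailing '(...)' part left out."""
    m = pattern.match(line)
    return m.groups() if m else None


# Hypothetical sample data.
print(strip_description("аллея Кленовая Роща (у парка)"))  # ('аллея', 'Кленовая Роща')
print(strip_description("аллея Кленовая Роща"))            # ('аллея', 'Кленовая Роща')
```

Because the pattern has no end anchor, the match simply stops before the description instead of rejecting the line.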

demo

Casimir et Hippolyte