-3

I am looking for a solution to split a representative Swiss address into a street (and street number) and a zip code (and name of place).

Suppose I have the following address:

'Bahnhofstrasse 1, 8001 Zürich'

The result I am looking for is:

street: 'Bahnhofstrasse 1'
place: '8001 Zürich'

However, sometimes there is a comma between the two parts and sometimes not. The postal code, as far as I know, always consists of 4 digits.

I have used .split(',') so far, but that only works when a comma is present.
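For illustration, a minimal sketch of that comma-based approach, assuming Python's built-in str.split (the helper name split_on_comma is hypothetical). It shows why the second form of the address is not separated:

```python
def split_on_comma(address):
    # Split on commas and trim the whitespace around each part.
    return [part.strip() for part in address.split(",")]


# With a comma, the two parts come apart as wanted.
with_comma = split_on_comma("Bahnhofstrasse 1, 8001 Zürich")

# Without a comma there is nothing to split on, so one part comes back.
without_comma = split_on_comma("Bahnhofstrasse 1 8001 Zürich")

print(with_comma)
print(without_comma)
```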

Patrick Balada
  • Are the last two fields in your data always zip and city? I'd be wary of looking for 4-digit numbers, as eventually you'll hit a 4-digit address. – AlG Aug 10 '17 at 19:05
  • @AlG Thank you for your comment. Yes, exactly. The "street" could also be just a name, but the last two fields are always zip and city. Good point, although I am pretty sure there are no 4-digit street numbers. – Patrick Balada Aug 10 '17 at 19:07

2 Answers

4

I don't expect city names to have digits in them, so use this pattern: ^(.*?),?\s*(\d{4}\D+)$

^               # Start of string/line
(               # Capturing Group (1)
  .             # Any character except line break
  *?            # (zero or more)(lazy)
)               # End of Capturing Group (1)
,               # ","
?               # (zero or one)(greedy)
\s              # <whitespace character>
*               # (zero or more)(greedy)
(               # Capturing Group (2)
  \d            # <digit 0-9>
  {4}           # (repeated {4} times)
  \D            # <character that is not a digit>
  +             # (one or more)(greedy)
)               # End of Capturing Group (2)
$               # End of string/line
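A minimal sketch of how this pattern could be applied, assuming Python's re module (the question does not name a language, and the helper name split_address is hypothetical):

```python
import re

# The pattern from this answer: lazy street part, optional comma,
# optional whitespace, then 4 digits followed by non-digit characters.
ADDRESS_RE = re.compile(r"^(.*?),?\s*(\d{4}\D+)$")


def split_address(address):
    """Return (street, place), or None if the address does not match."""
    match = ADDRESS_RE.match(address.strip())
    if match is None:
        return None
    return match.groups()


print(split_address("Bahnhofstrasse 1, 8001 Zürich"))
print(split_address("Bahnhofstrasse 1 8001 Zürich"))
print(split_address("Hauptplatz, 8001 Zürich"))
```

All three calls should split at the 4-digit postal code, with or without the comma, including the case where the street is only a name. An address whose place name contains a digit would not match, in line with the assumption stated above.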
alpha bravo
0
(?P<street>.*?[0-9]+)(?P<place>.*?[0-9]+.*)

Explanation

Everything between round brackets () is a capture group. By adding ?P<street> we give the group the name street (the name is optional, but it makes the result easier to read).

[0-9]+ means one or more digits.

.*? means any characters (lazy): it matches between zero and unlimited times, as few times as possible, expanding as needed.

Combined, these pieces make a usable regex for this situation.
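A minimal sketch of using the named groups, assuming Python's re module (the (?P<name>...) syntax is the one Python uses; the helper name split_named is hypothetical). Because the place group starts right after the street number, it still contains the separator, so the sketch strips commas and spaces from it:

```python
import re

# The pattern from this answer, with its two named groups.
NAMED_RE = re.compile(r"(?P<street>.*?[0-9]+)(?P<place>.*?[0-9]+.*)")


def split_named(address):
    """Return a dict with 'street' and 'place', or None if nothing matches."""
    match = NAMED_RE.match(address)
    if match is None:
        return None
    return {
        "street": match.group("street").strip(),
        # The raw group begins with ", " or " ", so trim both characters.
        "place": match.group("place").strip(" ,"),
    }


print(split_named("Bahnhofstrasse 1, 8001 Zürich"))
print(split_named("Bahnhofstrasse 1 8001 Zürich"))
```

This pattern relies on the street containing a number. For a street that is only a name, such as the made-up input "Hauptplatz, 8001 Zürich", the first group consumes part of the postal code and the split comes out wrong.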


online Thomas