
I'm trying to create a regex pattern to find a German address (Street + Nr + ZIP + City). Example: "Hauptstr. 5 - 59969 Hallenberg"

I have an individual pattern for Street + Nr (StreetNr):

@"^\s*((?:(?:A[nm]|Auf|Vor|In|Im|Der|Dem|Die|Das|Bei|Alt|Zu[mr]?|Sk?t[.,]?[-_]?|Dr[.,]?[-_]?|Prof[.,]?[-_]?|De[-_]?)" +
@"(?:\s*(?>Sk?t[,.]?|und|von|der|dem|des|den|in\s?der))?\s?)?" +
@"(?>(?:[\w-[0-9]]{3}(?:[-_.,]|\s+))|(?:[\w-[0-9]](?:[/.,\w-[0-9]]){3,})|(?:[\w-[0-9]][/.,][-_]?\s?){1,4}\s*[\w-[0-9]]{3,})?\s?" +
@"(?>[-_]?\s?[/.\w-[0-9]]{2,}){0,5}\s*(?>[-_]?\s?str\.{0,1}|17\.Juni|strasse|straße|platz|hof|weg|ring){0,1})\s*." +           
@"(\d{1,4}[a-z]?(?>\s?[-_]\s?\d{1,4}[a-z]?)?)\s*$"

and another one for ZIP + City (ZIPCity):

@"^\s*(?:(?:D|DE)\s?[-_]?\s?){0,1}([0-9]{5})\s+([\w-[0-9]]+" +
@"(?>\s?[-_/]?\s?[\w-[0-9]]+){0,5})\s*$"

1- First, I need help with the ZIPCity pattern. When the string is "59969 Hallenberg", the pattern matches, but if there is any special character before the ZIP, it unfortunately doesn't match. For example, for " - 59969 Hallenberg", " -59969 Hallenberg", "- 59969 Hallenberg" or "59969 Hallenberg", I need to find "59969 Hallenberg". The "-" is just an example; it could be any non-alphanumeric character, and it should be optional.
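A minimal sketch of one way to do this, assuming "special character" means anything that is neither a letter nor a digit: the leading `\s*` of the ZIPCity pattern is swapped for `[\W_]*`, which also consumes whitespace. The variable name `zipCity` is only illustrative, and the rest of the pattern is unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumption: any run of non-alphanumeric characters may precede the ZIP.
// [\W_]* covers whitespace, punctuation and the underscore, and may be empty.
var zipCity = new Regex(
    @"^[\W_]*(?:(?:D|DE)\s?[-_]?\s?){0,1}([0-9]{5})\s+([\w-[0-9]]+" +
    @"(?>\s?[-_/]?\s?[\w-[0-9]]+){0,5})\s*$");

// The four inputs from the question.
string[] inputs =
{
    " - 59969 Hallenberg",
    " -59969 Hallenberg",
    "- 59969 Hallenberg",
    "59969 Hallenberg",
};

foreach (string input in inputs)
{
    Match m = zipCity.Match(input);
    // Group 1 is the ZIP, group 2 is the city, exactly as in the original pattern.
    Console.WriteLine(m.Success
        ? $"{m.Groups[1].Value} {m.Groups[2].Value}"
        : "no match");
}
```

The pattern stays anchored with `^` and `$`, so it still expects the ZIP and city to be the only content of the string.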

2- I need a combination of the two patterns to find the entire address "Hauptstr. 5 - 59969 Hallenberg". It should only match if the address is complete (the non-alphanumeric separator, "-" in the example, should always be optional).
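One possible way to combine them, as a sketch rather than a definitive solution: drop the inner `^\s*` and `\s*$` anchors from both patterns and join the two cores with an optional run of non-alphanumeric characters. The names `streetNrCore`, `zipCityCore`, `separator` and `address` are made up for this example, and `[\W_]*` as the separator is an assumption about which characters are allowed between the house number and the ZIP.

```csharp
using System;
using System.Text.RegularExpressions;

// StreetNr pattern from the question without its leading ^\s* and trailing \s*$.
string streetNrCore =
    @"((?:(?:A[nm]|Auf|Vor|In|Im|Der|Dem|Die|Das|Bei|Alt|Zu[mr]?|Sk?t[.,]?[-_]?|Dr[.,]?[-_]?|Prof[.,]?[-_]?|De[-_]?)" +
    @"(?:\s*(?>Sk?t[,.]?|und|von|der|dem|des|den|in\s?der))?\s?)?" +
    @"(?>(?:[\w-[0-9]]{3}(?:[-_.,]|\s+))|(?:[\w-[0-9]](?:[/.,\w-[0-9]]){3,})|(?:[\w-[0-9]][/.,][-_]?\s?){1,4}\s*[\w-[0-9]]{3,})?\s?" +
    @"(?>[-_]?\s?[/.\w-[0-9]]{2,}){0,5}\s*(?>[-_]?\s?str\.{0,1}|17\.Juni|strasse|straße|platz|hof|weg|ring){0,1})\s*." +
    @"(\d{1,4}[a-z]?(?>\s?[-_]\s?\d{1,4}[a-z]?)?)";

// ZIPCity pattern from the question, also without its anchors.
string zipCityCore =
    @"(?:(?:D|DE)\s?[-_]?\s?){0,1}([0-9]{5})\s+([\w-[0-9]]+" +
    @"(?>\s?[-_/]?\s?[\w-[0-9]]+){0,5})";

// Assumption: any non-alphanumeric characters (or none at all) may separate the parts.
string separator = @"[\W_]*";

var address = new Regex(@"^\s*" + streetNrCore + separator + zipCityCore + @"\s*$");

Match m = address.Match("Hauptstr. 5 - 59969 Hallenberg");
if (m.Success)
{
    // Groups: 1 = street, 2 = house number, 3 = ZIP, 4 = city.
    Console.WriteLine($"{m.Groups[1].Value} | {m.Groups[2].Value} | {m.Groups[3].Value} | {m.Groups[4].Value}");
}

// Incomplete addresses are rejected because both halves are mandatory.
Console.WriteLine(address.IsMatch("Hauptstr. 5"));
Console.WriteLine(address.IsMatch("59969 Hallenberg"));
```

Because the house-number part can itself contain a range such as "5 - 7", the engine first tries to read " - 5996" as the end of a range and only then backtracks to treat " - " as the separator. That works for this example but is worth keeping in mind for inputs where the house number and ZIP are hard to tell apart.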

Nizar Belhiba
  • For one, use a negated class such as [^!@#$%&*]. For two: "[\w\.]+ \d+\s([^!@#$%&*]\s)?\d{5}\s\w+" – jdweng Apr 16 '21 at 11:46
  • In [Mannheim](https://www.google.de/maps/@49.48826,8.4674482,16z?hl=en), there are addresses like "B6, 26". Yes, that's a street address (used to be a building of the Uni campus). The inner city is designed as squares (think "chess board"). Just saying. Regex is _not_ the tool you want to use here. – Fildor Apr 16 '21 at 11:55
  • I'd do (and have done): 1. Grab a list of _all_ Streetnames in Germany (yes, you get it online - the internet is _great_) - then match all of them against your text, note candidate positions. 2. Grab a list of all valid postal codes - again search and note. 3. Grab a list of all valid community names - search and note position. 4. now you can run a set of rules on your data to find reasonable candidates. For example: A streetname should be followed by a number. Postalcode candidates should be followed by a communityname candidate _and_ they should match each other in the official list ... – Fildor Apr 16 '21 at 12:10
  • ... (account for typos!). Then street-candidates should be "in proximity" to zip/town candidates, i.e. right to left (with maybe a separation char or some few, even) or above ... of course it gets easier, if you search for _specific_ town/zip or street address. – Fildor Apr 16 '21 at 12:13
  • Have you seen: https://stackoverflow.com/questions/9863630/regex-for-splitting-a-german-address-into-its-parts ? – Christoph Lütjen Apr 16 '21 at 12:17
  • In general: You simply can't find addresses in text with regular expressions if you need high quality results. You don't know where the city ends ("Hamburg", "Baden-Baden", "Rotenburg ob der Tauber", "Zell a. Hammersbach") and you don't know where the street starts. You can find text parts that may contain an address and continue processing them using known sets, or you may have additional information (e.g. you know things like "Rechnungsadresse: address goes here..." or it's typically behind a company name...) – Christoph Lütjen Apr 16 '21 at 12:27
  • If you have some knowledge about the input text itself, your matching can be more confident, of course. It makes a huge difference if you _know_ for a fact, that for example "on page 1, there _must_ be an address somewhere in the upper right quadrant" ... – Fildor Apr 16 '21 at 12:48
  • "Foo Str. 23a-42 - 12345 Rotenburg ob der Tauber Deutschland" Yeah, addresses are fun, even more fun if added from different sources/users! – nilsK Apr 16 '21 at 13:10
  • @Fildor Unfortunately, there is no fixed position for the address; it can be anywhere. – Nizar Belhiba Apr 16 '21 at 13:33
  • @jdweng Unfortunately, that didn't work. @Christoph Lütjen: thanks, it's a good approach. – Nizar Belhiba Apr 16 '21 at 13:43
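The list-based idea from the comments (find postal code candidates, then confirm them against known community names) could look roughly like this. It is only a sketch: `communitiesByZip` and `FindZipCity` are hypothetical names, the dictionary is a stand-in for an official list and contains just the one pair from the question, and typo tolerance and street matching are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical stand-in for an official postal code list;
// only the pair from the question is included here.
var communitiesByZip = new Dictionary<string, string[]>
{
    ["59969"] = new[] { "Hallenberg" },
};

// Returns every "ZIP community" pair in the text that the list confirms.
List<string> FindZipCity(string text)
{
    var hits = new List<string>();
    // The lookarounds keep a candidate from being part of a longer number.
    foreach (Match m in Regex.Matches(text, @"(?<!\d)\d{5}(?!\d)"))
    {
        if (!communitiesByZip.TryGetValue(m.Value, out var names))
            continue; // not a known postal code

        // The community name must follow the ZIP directly (ignoring whitespace).
        string rest = text.Substring(m.Index + m.Length).TrimStart();
        foreach (string name in names)
        {
            if (rest.StartsWith(name, StringComparison.OrdinalIgnoreCase))
                hits.Add($"{m.Value} {name}");
        }
    }
    return hits;
}

Console.WriteLine(string.Join("; ",
    FindZipCity("Rechnungsadresse: Hauptstr. 5 - 59969 Hallenberg, Deutschland")));
```

Unlike the anchored patterns, this does not depend on where in the text the address appears, and multi-word names such as "Rotenburg ob der Tauber" are not a problem because the city boundary comes from the list rather than from the regex.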

0 Answers