
I am working on a regex to parse full addresses by street number, street name, city, state, and zip code.

I came up with a pretty good regex that works for most cases; however, there are a couple of scenarios where it fails, and I need help improving it. Here is what I have currently:

Pattern pattern = Pattern.compile("^([\\d-]{0,}[\\s-]{0,}[\\d/]+)[\\s]{0,}");

This works fine when the address is well formed, i.e. it starts with a street number that has no letters attached to it. For example:

  • 123 Street Address, CA, 55555 works fine.
  • However, 123 4th Street Address, CA, 55555 results in:

      1234 => street number
      th Street => street name
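
For reference, here is a minimal sketch that reproduces both cases with the pattern above. The class name `StreetNumberDemo` and the helper `streetNumber` are made up for this illustration. Note that the capture group itself contains the space ("123 4"); it only becomes 1234 if whitespace is stripped afterwards, which is an assumption here since that step is not shown.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StreetNumberDemo {

    // The pattern from the question, unchanged.
    public static final Pattern PATTERN =
            Pattern.compile("^([\\d-]{0,}[\\s-]{0,}[\\d/]+)[\\s]{0,}");

    /** Returns capture group 1 (the "street number" part), or null if nothing matches. */
    public static String streetNumber(String address) {
        Matcher m = PATTERN.matcher(address);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Well-formed case: only the leading digits are captured.
        System.out.println(streetNumber("123 Street Address, CA, 55555"));     // 123
        // Failing case: "[\\s-]{0,}[\\d/]+" also consumes the space and the "4" of "4th".
        System.out.println(streetNumber("123 4th Street Address, CA, 55555")); // 123 4
    }
}
```

The second result shows where the match goes wrong: the optional separator followed by `[\\d/]+` lets the number run on into the first digit of the street name.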
    

I have done a lot of research on parsing addresses, and this is just about the simplest solution I've found. It just needs a few more tweaks. Thanks in advance.

vinitius
portfoliobuilder
  • Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. – Dusan Jul 24 '15 at 21:38
  • In Europe we name streets like 'Britxon 2201' or '1945 victory 5' (5 is house number). Answering your question: you don't. – agilob Jul 24 '15 at 22:33
  • Because of the wide variance in address content and formatting, addresses aren't "regular"—an indispensable factor in using regular expressions to process information. This article elaborates: https://smartystreets.com/articles/regular-expressions-for-street-addresses – Neo Aug 01 '18 at 21:37

1 Answer


You shouldn't try to break down all street addresses with a single regular expression. You're better off using multiple regular expressions, each covering one family of formats, so that together they handle a wide range of scenarios, e.g.:

  • 123 Stackoverflow Way
  • 5000 5th Avenue
  • 1 Hacker Way Building 5
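
A rough sketch of what this could look like in Java, assuming the street line has already been separated from the city, state, and zip code. The class name `StreetLineParser`, the three patterns, and the list of unit keywords are all made up for this illustration and only cover the three examples above; real addresses will need more cases.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StreetLineParser {

    // Hypothetical patterns, tried in order; the first one that matches the whole line wins.
    private static final List<Pattern> PATTERNS = Arrays.asList(
            // Street line ending in a unit, e.g. "1 Hacker Way Building 5"
            Pattern.compile("^(\\d+)\\s+(.+?)\\s+((?:Building|Suite|Apt|Unit)\\s+\\S+)$"),
            // Ordinal street name, e.g. "5000 5th Avenue"
            Pattern.compile("^(\\d+)\\s+(\\d+(?:st|nd|rd|th)\\b.*)$"),
            // Plain street name, e.g. "123 Stackoverflow Way"
            Pattern.compile("^(\\d+)\\s+(\\D.*)$"));

    /** Returns {number, name, unit} (unit may be empty), or null when no pattern matches. */
    public static String[] parse(String streetLine) {
        String input = streetLine.trim();
        for (Pattern p : PATTERNS) {
            Matcher m = p.matcher(input);
            if (m.matches()) {
                // Only the first pattern has a third capture group for the unit.
                String unit = m.groupCount() >= 3 ? m.group(3) : "";
                return new String[] { m.group(1), m.group(2), unit };
            }
        }
        return null;
    }

    public static void main(String[] args) {
        for (String line : Arrays.asList(
                "123 Stackoverflow Way", "5000 5th Avenue", "1 Hacker Way Building 5")) {
            System.out.println(Arrays.toString(parse(line)));
        }
    }
}
```

Keeping each pattern small makes it easier to see which format it is meant for, and a new format can be supported by adding a pattern to the list rather than complicating an existing one. The order matters: more specific patterns go first.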
Al Wang