1

I have a bunch of unformatted docs.

I need regexes to capture street addresses, postal codes, states, phone numbers, emails, and similar common formats.

Juha Syrjälä

2 Answers

2

This site offers a searchable library of regexes, and this regular expression cookbook contains hundreds of examples of regex matching patterns.

ennuikiller
0

In the case of street addresses and, to a certain extent, postal codes, regexes can only go so far. In fact, matching a street address with a regex is essentially impossible because of the huge variety of address formats, even within the United States.

A regex that has worked rather well for strictly formatted US-based postal codes is: ^\d{5}([-+]?\d{4})?$

In the US, ZIP Codes are typically formatted as follows:

  • 12345
  • 123456789
  • 12345-6789
  • 12345+6789
  • 12345-67ND (yes, you read that right, sometimes the last two can be "ND")
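
As a quick check, the pattern quoted above can be run against the formats listed here. This is only a minimal sketch in Python; the `is_zip` helper and the sample list are illustrative names, not part of any library. Note that the pattern as written does not accept the "ND" variant, so that case would need separate handling.

```python
import re

# The pattern quoted in the answer: five digits, optionally followed by
# an optional "-" or "+" separator and four more digits.
ZIP_RE = re.compile(r"^\d{5}([-+]?\d{4})?$")


def is_zip(text):
    """Return True if text matches the strict ZIP / ZIP+4 pattern (illustrative helper)."""
    return ZIP_RE.match(text) is not None


samples = ["12345", "123456789", "12345-6789", "12345+6789", "12345-67ND", "1234"]

for sample in samples:
    print(f"{sample!r}: {is_zip(sample)}")
```

The first four samples match; "12345-67ND" and the four-digit "1234" do not.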

The other issue you'll run into is a zero-prefixed ZIP, such as one from New England, that has been run through Excel, which strips the leading zero and leaves a four-digit number. This is why a regex alone can't get the job done 100%, even for something as "simple" as a US ZIP Code.
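
One way to cope with the stripped-zero case is to normalize values before matching them. The sketch below is a heuristic, not a definitive fix: `restore_leading_zero` is a hypothetical helper, and it assumes that any four-digit (or eight-digit) value really was a ZIP that lost exactly one leading zero, which the data alone cannot confirm.

```python
import re


def restore_leading_zero(raw):
    """Left-pad a ZIP that lost one leading zero (hypothetical helper).

    Assumption: a 4-digit value was a 5-digit ZIP and an 8-digit value
    was a ZIP+4. Returns None for anything that does not fit.
    """
    # Drop the separators seen in ZIP+4 values, plus stray whitespace.
    digits = re.sub(r"[-+\s]", "", str(raw))
    if not re.fullmatch(r"[0-9]+", digits):
        return None
    if len(digits) in (4, 5):
        return digits.zfill(5)
    if len(digits) in (8, 9):
        digits = digits.zfill(9)
        return f"{digits[:5]}-{digits[5:]}"
    return None


print(restore_leading_zero(2101))          # integer as exported from a spreadsheet
print(restore_leading_zero("21011234"))    # ZIP+4 with the zero stripped
print(restore_leading_zero("12345-6789"))  # already well formed
```

Values that come back as None still need manual review or an address verification service.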

Depending upon the business needs, you'll want to investigate an address verification solution. Any online provider worth their salt can standardize and verify an address, which tells you whether the address is real and can help reduce fraud, returned shipments, etc.

In the interest of full disclosure, I'm the founder of SmartyStreets. We have an online address verification service which cleans, standardizes, and validates addresses. You're more than welcome to contact me personally for any questions you have.

Jonathan Oliver