16

I am writing a web application, that is US specific, so the format that other countries use for postal codes are not important. I have a list of us zip codes that i am trying to load into a database table that includes the

  • 5 digit us zip code
  • latitude
  • longitude
  • usps classification code
  • state code
  • city

the zip code is the primary key as it is what i will be querying against. i started using a medium int 5 but that truncates the zip codes that have leading zeros.

i considered using a char5 but am concerned about the performance hit of indexing against a char variable.

so my question is what is the best mysql datatype to store zip codes as?

Note: i have seen it in several other questions related to zip codes. I am only interested in US 5 digit zip codes. So there is no need to take other countries postal code formats into consideration.

OMG Ponies
  • 325,700
  • 82
  • 523
  • 502
gsueagle2008
  • 4,583
  • 10
  • 36
  • 46
  • 3
    What's the value add in keeping the leading zeros? So long as you display the leading zeroes (.PadLeft), I would keep your database optimized as far as possible. – JustLoren Sep 29 '09 at 17:30
  • 1
    The zeroes are important data in a zip code. Zip codes are *not* integers, even if they're made up entirely of numbers. – ceejayoz Sep 29 '09 at 17:32
  • thats a fair point. i did a few test cases querying for 00210 etc and it returned the proper record. i guess it is more of a gut feeling about concerns for data integrity. – gsueagle2008 Sep 29 '09 at 17:34
  • What about the newer Zip+4 codes? Ex: 91210-3889 – rlb.usa Sep 29 '09 at 17:34
  • @ceejayoz: So long as he displays zip codes with 5 digits, when will this ever pose a problem? In case the US expands the zip code system and integrates characters? – JustLoren Sep 29 '09 at 17:35
  • It poses a problem in having to remember to zero-pad it every time. It presents a problem in sorting. Ultimately, there are much better data types like `char(5)` to store this data with. – ceejayoz Sep 29 '09 at 17:36
  • 1
    @rlb.usa as far as i know the zip+4 format only provides a more specific location fix within that zip code, and being that i dont have that precise of dataset, and the fact that the precision of the current zip meets my requirements i am not worried about it. – gsueagle2008 Sep 29 '09 at 17:39
  • @ceejayoz, sorting a numeric always works the same as a zero padded number stored in a char, or am I missing something? – KM. Sep 29 '09 at 17:40
  • Does anyone know if there's a MySQL zipcode radius database that my client can purchase in order to do radial/locus searches? – Volomike Aug 05 '15 at 17:31

3 Answers3

30

char(5) is the correct way to go. String indexing is quite fast, particularly when it is such a small data set.

You are correct in that you should never use an integer for a zip code, since it isn't truly numeric data.

Edit to add: Check out this for good reasons why you don't use numbers for non-numerically important data: Is it a good idea to use an integer column for storing US ZIP codes in a database?

Community
  • 1
  • 1
Erich
  • 3,946
  • 1
  • 22
  • 21
  • 1
    Erich: Why not an integer? I would think that storing as an integer would help with type checking, you can add leading zeroes in the client, a character can be a non-digit ... Just wondering what point I'm missing. – John Sep 29 '09 at 17:34
  • The data set could be millions. Most likely there will be other addresses in the system where a zip is stored, and they should all be the same type – KM. Sep 29 '09 at 17:41
  • 2
    Zip Codes are not numeric data. Numeric data is data in which it makes sense to do mathematical operations on, which you would never do with a Zip Code. Otherwise, you are unfairly limiting your dataset. As far as the data set size, if the Zip code is truly the PK, there are only 99,999 values, a very small dataset relatively. – Erich Sep 29 '09 at 17:59
  • if you have 10 million customers with zip codes that is a big dataset – KM. Sep 29 '09 at 18:08
  • KM, 50 megabytes is not a big data set. An index with 10 million keys in it is relatively small in today's data management. – Walter Mitty Sep 29 '09 at 19:15
  • 2
    John: Integer should not be used, because the zipcode "00501" will be stored as "501" which may create problems. – Trevor Gehman Jun 19 '13 at 20:42
2

go with your medium INT(5) ZEROFILL, it should add the leading zeros for you. No need to impact the index and performance on a formatting issue.

KM.
  • 101,727
  • 34
  • 178
  • 212
  • 1
    as a char 5 it took .0007 seconds as medint5 zerofill it took .0006 seconds. i think both are valid solutions, but i think i am going to go with char5 and take the slight performance hit for a little more peace of mind on the data integrity front. – gsueagle2008 Sep 29 '09 at 17:57
1

If he makes it Char(6), then he can handle Canadian postal codes as well.

When you consider that there is a maximum of 100,000 5-digit Zip Code and how little space it would take up even if you made the entire table memory-resident, there's no reason not to.

David
  • 892
  • 4
  • 16
  • 41
  • 1
    I saw that from the original post. I just figured I'd mention it in case anyone else looked at this question in the future looking for advice but had a situation where foreign postal codes MIGHT make a difference. My main point was that, in the age of gigabyte memory sticks, a zip code table is pretty small. (I dealt with these when memory was measured in KILObytes) – David Sep 30 '09 at 12:09