
In my MySQL InnoDB database, I have dirty zip code data that I want to clean up.

The clean zip code data is when I have all 5 digits for a zip code (e.g. "90210").

But for some reason, I noticed in my database that for zipcodes that start with a "0", the 0 has been dropped.

So "Holtsville, New York" with zipcode "00544" is stored in my database as "544"

and

"Dedham, MA" with zipcode "02026" is stored in my database as "2026".

What SQL can I run to left-pad any zip code shorter than 5 digits with "0"? That is, if the zip code is 3 digits long, prepend "00"; if it is 4 digits long, prepend a single "0".
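The padding rule in the question is simple: prepend as many zeros as it takes to reach five characters. Here is a minimal Python sketch of that rule, only to pin down the expected results. `pad_zip` is a hypothetical name, and for digit-only strings it behaves like `str.zfill(5)`.

```python
def pad_zip(stored: str) -> str:
    """Left-pad a zip code that lost its leading zeros.

    '544' (3 digits) gets '00', '2026' (4 digits) gets '0';
    values that are already 5 characters long are returned unchanged.
    """
    missing = 5 - len(stored)
    return "0" * max(missing, 0) + stored


print(pad_zip("544"))    # 00544
print(pad_zip("2026"))   # 02026
print(pad_zip("90210"))  # 90210
```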

UPDATE:

I just changed the zipcode column's data type to VARCHAR(5).

TeddyR
  • It seems the table column for zipcode is of type Number and that is causing the problem. In that case, you shall have to change the data type to keep character data. – Kangkan Jul 08 '10 at 05:00
  • @Kangkan, you're correct. My data type was a number. I just converted the zipcode to be varchar(5). Now, how do I front-pad zip codes shorter than 5 digits with a "0"? – TeddyR Jul 08 '10 at 05:07
  • It's better to use CHAR instead of VARCHAR. It'll speed up queries by a lot when the table gets big (only if all your other columns have fixed size though) – quantumSoup Jul 08 '10 at 05:15
  • Also consider that postal codes from other countries are not always 5 characters. – Bill Karwin Jul 08 '10 at 07:09

8 Answers


Store your zipcodes as CHAR(5) instead of a numeric type, or have your application pad it with zeroes when you load it from the DB. A way to do it with PHP using sprintf():

echo sprintf("%05d", 205); // prints 00205
echo sprintf("%05d", 1492); // prints 01492
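For comparison only (the answer itself uses PHP), Python's `%` formatting accepts the same `%05d` conversion, so application-side padding looks nearly identical:

```python
# "%05d": format an integer in decimal, at least 5 wide, padded with zeros.
print("%05d" % 205)    # 00205
print("%05d" % 1492)   # 01492

# The same rule written as an f-string format spec.
print(f"{2026:05d}")   # 02026
```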

Or you could have MySQL pad it for you with LPAD():

SELECT LPAD(zip, 5, '0') as zipcode FROM `table`;

Here's a way to update and pad all rows:

ALTER TABLE `table` CHANGE `zip` `zip` CHAR(5); #changes type
UPDATE `table` SET `zip`=LPAD(`zip`, 5, '0'); #pads everything
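Before running the UPDATE, it may help to see what `LPAD(zip, 5, '0')` does to each kind of row. Below is a rough Python model. `mysql_lpad` is a hypothetical helper, not part of any library, and it only approximates MySQL's documented behavior: NULL stays NULL, short strings are padded on the left, and a string longer than the target length is cut down to its leftmost characters. That last rule means a 9-digit ZIP+4 value would silently lose its last four digits.

```python
from typing import Optional


def mysql_lpad(s: Optional[str], length: int, pad: str) -> Optional[str]:
    """Approximate MySQL's LPAD(str, len, padstr) for illustration.

    Assumes `pad` is non-empty.
    """
    if s is None:
        return None                      # NULL in, NULL out
    if len(s) >= length:
        return s[:length]                # too long: keep leftmost `length` chars
    needed = length - len(s)
    # Repeat padstr enough times, then trim to exactly `needed` characters.
    prefix = (pad * (needed // len(pad) + 1))[:needed]
    return prefix + s


rows = ["544", "2026", "90210", None]
print([mysql_lpad(z, 5, "0") for z in rows])
# ['00544', '02026', '90210', None]
```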
Pang
quantumSoup
  • I would like to actually clean up my data in the database itself. Do you know the equivalent to do this with SQL? – TeddyR Jul 08 '10 at 05:07
  • I ran the following code that made it work: "UPDATE tablename SET zip = LPAD(zip, 5, '0');" – TeddyR Jul 08 '10 at 05:16
  • I would argue that this 'accepted' answer is not as good as the `ZEROFILL` answers. – Rick James Jul 14 '16 at 00:36
  • A flaw in this answer. If the default `CHARACTER SET` is utf8, that `CHAR(5)` will unnecessarily take 15 bytes! – Rick James Jul 14 '16 at 00:36

You need to decide the length of the zip code (which I believe should be 5 characters long). Then you need to tell MySQL to zero-fill the numbers.

Let's suppose your table is called mytable and the field in question is zipcode, type smallint. You need to issue the following query:

ALTER TABLE mytable CHANGE `zipcode` `zipcode`
    MEDIUMINT( 5 ) UNSIGNED ZEROFILL NOT NULL;

The advantage of this method is that it leaves your data intact: there's no need for triggers during inserts or updates, no need for functions when you SELECT the data, and you can always remove the extra zeros or increase the field length should you change your mind.
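"Leaves your data intact" means the stored value stays an ordinary integer and the zeros appear only when the value is displayed. The toy Python model below makes that concrete. The class `ZerofillColumn` is hypothetical, not a MySQL API, and it only approximates the server's behavior. Also note that MySQL 8.0.17 deprecated ZEROFILL and integer display widths, so on newer servers padding at query time with LPAD may be the longer-lived option.

```python
from typing import List


class ZerofillColumn:
    """Toy model of an integer column declared with a display width + ZEROFILL."""

    def __init__(self, display_width: int) -> None:
        self.display_width = display_width
        self.values: List[int] = []      # what is actually stored

    def insert(self, value: int) -> None:
        # ZEROFILL implies UNSIGNED; strict SQL mode rejects negatives.
        if value < 0:
            raise ValueError("out of range for an UNSIGNED column")
        self.values.append(value)

    def select(self) -> List[str]:
        # Zeros are added on output only; wider values are shown in full.
        return [str(v).zfill(self.display_width) for v in self.values]


zips = ZerofillColumn(5)
zips.insert(544)
zips.insert(2026)
print(zips.values)    # [544, 2026]
print(zips.select())  # ['00544', '02026']
```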

zardilior
Anax
  • Unsigned Zerofill is the way to go, although smallint maxes out at 65535. I'd suggest mediumint. Cali has zips of 9xxxx. – brandon-estrella-dev Nov 20 '14 at 19:42
  • If you ever want to support postal codes for other countries, you do not want an integer. Some countries use letters in their postal codes. – Wodin Apr 02 '15 at 12:28

Ok, so you've switched the column from Number to VARCHAR(5). Now you need to update the zipcode field to be left-padded. The SQL to do that would be:

UPDATE MyTable
SET ZipCode = LPAD( ZipCode, 5, '0' );

This will pad all values in the ZipCode column to 5 characters, adding '0's on the left.

Of course, now that you've got all of your old data fixed, you need to make sure that any new data is also zero-padded. There are several schools of thought on the correct way to do that:

  • Handle it in the application's business logic. Advantages: database-independent solution, doesn't involve learning more about the database. Disadvantages: needs to be handled everywhere that writes to the database, in all applications.

  • Handle it with a stored procedure. Advantages: Stored procedures enforce business rules for all clients. Disadvantages: Stored procedures are more complicated than simple INSERT/UPDATE statements, and not as portable across databases. A bare INSERT/UPDATE can still insert non-zero-padded data.

  • Handle it with a trigger. Advantages: Will work for Stored Procedures and bare INSERT/UPDATE statements. Disadvantages: Least portable solution. Slowest solution. Triggers can be hard to get right.

In this case, I would handle it at the application level (if at all), not at the database level. After all, not all countries use a 5-digit zip code (not even the US: our zip codes are actually Zip+4+2, nnnnn-nnnn-nn), and some allow letters as well as digits. It is better NOT to try to force a data format and to accept the occasional data error than to prevent someone from entering the correct value just because its format isn't quite what you expected.
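Here is a minimal sketch of the application-level option. It assumes the only repair wanted is restoring zeros that a numeric column dropped. `normalize_zip` is a hypothetical name. It pads bare 3- or 4-digit values and passes everything else through untouched rather than rejecting it, in keeping with the advice not to force a format.

```python
import re

# Only a bare 3- or 4-digit string looks like a US zip code that lost its
# leading zeros. ZIP+4, codes with letters, etc. are left exactly as entered.
_SHORT_DIGITS = re.compile(r"[0-9]{3,4}")


def normalize_zip(raw: str) -> str:
    value = raw.strip()
    if _SHORT_DIGITS.fullmatch(value):
        return value.zfill(5)
    return value


print(normalize_zip("544"))         # 00544
print(normalize_zip("02026-1234"))  # 02026-1234
print(normalize_zip("SW1A 1AA"))    # SW1A 1AA
```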

Craig Trader

I know this is well after the original post. One approach that keeps the table storing the zip code as an unsigned INT, but displays it with leading zeros, is:

select LPAD(cast(zipcode_int as char), 5, '0') as zipcode from `table`;

This preserves the original data as an INT and can save some storage space, at the cost of having the server perform the INT-to-CHAR conversion on every read. You can put this query in a view and point whoever needs the padded data at the view instead of the table itself.

lemming622

It would still make sense to create your zip code field as a zerofilled unsigned integer field.

CREATE TABLE xxx ( zipcode INT(5) UNSIGNED ZEROFILL, ... )

That way MySQL takes care of the padding for you.

Peter
CHAR(5)

or

MEDIUMINT (5) UNSIGNED ZEROFILL

The first takes 5 bytes per zip code (with a single-byte character set such as latin1 or ascii).

The second takes only 3 bytes per zip code. The ZEROFILL option is necessary for zip codes with leading zeros.
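Here is a quick arithmetic check of the 3-byte figure. It is a sketch that uses the standard unsigned limits of 2-, 3- and 4-byte integers. The number 99999 needs 17 bits, which rounds up to 3 bytes. That makes MEDIUMINT UNSIGNED the smallest MySQL integer type that holds every 5-digit zip code, while SMALLINT UNSIGNED tops out at 65535.

```python
# Largest possible 5-digit zip code.
MAX_ZIP = 99999

# Unsigned maximum of each MySQL integer type (2, 3 and 4 bytes).
UNSIGNED_MAX = {
    "SMALLINT": 2**16 - 1,   # 65535
    "MEDIUMINT": 2**24 - 1,  # 16777215
    "INT": 2**32 - 1,        # 4294967295
}

# Whole bytes needed to represent MAX_ZIP as an unsigned integer.
bytes_needed = (MAX_ZIP.bit_length() + 7) // 8
print(bytes_needed)  # 3

fits = {name: MAX_ZIP <= top for name, top in UNSIGNED_MAX.items()}
print(fits)  # {'SMALLINT': False, 'MEDIUMINT': True, 'INT': True}

# As text in a single-byte character set, every zip code costs 5 bytes.
print(len("00544".encode("ascii")))  # 5
```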

Martin Sansone - MiOEE

You should use UNSIGNED ZEROFILL in your table structure.

Saurabh Chandra Patel

LPAD works with VARCHAR2 because the column does not fill leftover bytes with spaces; LPAD fills the leftover positions on the left with zeros, so the data type should be VARCHAR2. (VARCHAR2 is Oracle's type name; the MySQL equivalent is VARCHAR.)

Agent Mahone