4

I am trying to build a database for a limousine company and I got stuck on how much Normalization should I do for Addresses related to Customers, Drivers, Affiliates and Orders.

Basically the Affiliate and Driver addresses look like this: address_line_1, address_line_2, city, state, zipcode, country

My problem comes from orders and customers addresses. They should look like this: address_line_1, address_line_2, city, state, zipcode, country, address_type_1 (home, business), address_type_2 (pick-up, drop-off - this only needs to be included for orders).

So between all the four tables I have similarities in address fields except for two fields which differ in customer and orders table.

I need to mention that every record will be identified with unique ids. Example:

Customer ID - 10,000 - 99,999

Order ID - 100,000 - no limit

Driver ID - a1 - a999 (maybe)

Affiliate ID - 1,000 - 9,999

These are just examples so don't spend to much time trying to understand them.

How many Addresses table should I use to create an good normalized database?

In this moment I have three ideas in my mind:

  1. One Addresses table with all the fields included plus an extra one describing the type of address (customer, order, affiliate, driver). Not really like this one.

  2. Two Addresses tables. One with drivers and affiliates, and one with customers and orders. For the second table I would have and field that will always be NULL for customers. Don't like this one too.

  3. Three Addresses tables. One for drivers and affiliates, one for customers, and one for orders. No unused fields leads me to think that this could be an better option than the other two.

Does anyone has an advice regarding these three options or maybe even a better option?

Thanks a lot.

UPDATE:

Don't bother yet about the numbering system for the tables ID. That was just an example. I still didn't had time to figure out the best numbering system. Will get to that once I got my addresses problem sorted out.

From Matt's answer I am tempted to leave the driver and affiliate tables with the addresses included and just sort out somehow the customer and order tables.

For customers, I would definitely need an Addresses table because a customer can have multiple addresses (home, business1, business2, favorite places, etc) that I want to have stored in their profile for easier access.

I forgot to mention something about the orders table which may change a little bit the equation of the problem. For any order I would need to have a PICK-UP and DROP-OFF location. But this can be either an address (street address) or an airport. This means that the fields related to a street address cannot match the airport specific fields. So I am pretty sure that to have four entities (pu_address, pu_airpot, do_address, do_airport) inside a table (all with their specific field) would leave me to unused space and with a programming mess. Ex: for pick-up fields: Address_type, Address_line_1, ..., state, country, Airport, Airline, Flt no, ... and for drop off same thing as pick-up.

So I still have a problem with the Order table for which I am not sure on how to move forward. I would need both addresses and airport pick-up and drop-off locations to be included with or without the use of extra tables.

UPDATE Thanks again Matt. First, yes I will store addresses in separate fields. The problem still remains for orders. I will give an example on what type of pu and do a limo service use. Address: 123 Main St, Chicago, Il, 60640; Airport: ORD, AA, 123. I need to have all those fields somehow integrated into the table.

Options: Order table

order_id, ..., pick-up fields which need to have both airports and addresses fields, drop-off fields with both airport and address fields.

This option still doesn't sound right.

Next would be to have two extra tables. One would be for addresses (including a field for recognizing pick-up or drop-off). The other one would be for airport (with a field for pu or do as well).

I don't like this option as well because I will need to do two queries in order to retrieve the information for only an order record. First I will retrieve the order information, and after I know the type of pick-up and drop-off (airport or address), I would do another query to retrieve the specific pick-up and drop-off information.

So, again... what am I doing wrong? Do I miss something?

And yes, I will definitely use some verification system to make sure that the addresses would be correct.

Matt
  • 22,721
  • 17
  • 71
  • 112
Cristian
  • 2,390
  • 6
  • 27
  • 40

2 Answers2

5

It's probably too late now, but I would suggest 1 Addresses table (address_id, address_line_1, address_line_2, city, state, zipcode, country, address_type (FK to AddressTypes table)) as this would follow the standard normalization rules. Your Orders table will have two Foreign Key relationships with the Addresses table - pickup_address_id and delivery_address_id. I have questions around the design of the Customers, Drivers and Affiliates tables, but without a better understanding of exactly how they relate it is difficult to prescribe a solution.

One option (but I don't know if it is the right one for you) would be to have a Parties table (party_id, party_type) which creates a supertype / subtype relationship (one to one-or-zero in each case) with Customers, Drivers and Affiliates, all of which are types of Party. I suggest reading one or two of David C. Hay's articles on data modelling for a better understanding.

keaukraine
  • 5,315
  • 29
  • 54
Andrew
  • 51
  • 1
  • 2
4

I actually work in the address verification industry with SmartyStreets, where processing and storing addresses is our area of expertise. In my experience I've seen a number of situations quite like yours.

I'm initially concerned with your segmenting ID numbers based on the type of record it is. If the four types of records (Customers, Drivers, Affiliates, Orders) are stored in different tables, why are the ID range limits needed? (Update: this isn't the main issue at hand really...)

Now, a bit about database design. Ideally, your design should reflect the operation of your core domain (that is, coordinating customers, orders, drivers, etc), without being coupled to merely the address data. While the addresses may be important, they're not the core operation of your business. On this ground and from what I've gathered from your original post, I would immediately hesitate to store the addresses separately from the actual record.

While you will have similar fields in each table, they represent different business purposes, and you won't risk unused, unnecessary fields. So the question isn't so much "how many address tables do I make," it's more a question of even making any tables for addresses only.

While addresses come in many shapes and forms, it's important for the limo company to have correct address information, and for your database to be normalized. The USPS (I assume you're US-based) certifies certain vendors to provide address normalization services. This is called CASS™ Certification. Run each address through a CASS™ service and you're done. The addresses will look the same, have complete information, and be deliverable, too. I suggest you start your search with something like LiveAddress, which will verify addresses at point-of-entry, or a CASS list scrubbing service, which will verify a batch of addresses at once (and warn you of duplicates).

UPDATE: In the case of several addresses a customer may have, then yes, I would advocate using a separate table for that. However, you'll still want to standardize/verify them with CASS so if needed, you can pull out duplicates later (plus you'll know the addresses actually exist).

So, except for that, consider storing each address inline with the actual record it associates with (not in separate tables).

For further questions or direction, I can personally assist.

UPDATE

About separating addresses from airports: that's potentially a valid distinction depending on your business needs, but remember that airports have addresses, too. You could add a field to your table to store the name of the firm or location the address points to, such as "O'Hare International Airport." This could consolidate a few of the fields. Also, I suggest you store the address in separate fields by component (Street, City, State, ZIP, etc).

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
Matt
  • 22,721
  • 17
  • 71
  • 112
  • 1
    Matt- I would also add that based on how the addresses are used and the volume of data being used - it maybe best to split out the uncommon addresses to a different table. This may increase performance based on the page size of the records in the database - but this would for performance reasons only which may not apply here. – tsells Feb 11 '12 at 23:42