A table cain join either of two tables

Question

(I recognize this is likely a duplicate, but I cannot find any answers based on searching)

I have a table with some information which can either join on a user, or on a company. There are three ways I see to solve this problem:

Give the table a join_user_id column and a join_company_id column, and only fill in one of the two.
Make a join_id column, and a joins_with_table column
Duplicate the main table so that there are two versions, one with only a join_user_id, and one with only a join_company_id.

I'm ruling out 3 immediately because it's not dry. My question is, which of these is truly normalized design (or none of them)?

How do I handle the drawbacks of: 1. Leaving a bunch of empties, doesn't grow gracefully if a 3rd table comes, and how do I join succinctly? 2. How do I enforce foreign-key constraints? How do I join succinctly?

score 0 · Answer 1 · answered Jan 07 '13 at 19:16

Not claiming this is the best practice or even if it would work in your situation but a fourth option could be to keep the columns common to user and company tables in a separate table (maybe user_comp_base). The user and company details can then go into a separate user_details and company_details table with the same key as in the user_company_base table (which might optionally also contain a column telling which table contains details for this row).

Then you just need a join_id_user_comp_base in your table solving the problem of foreign key constraints and succinct joins. Ofcourse this comes at the cost of joining with another table (user_details OR comp_details) if the details are required.

Edit: I think this concept is called Table Inheritance.

also see http://stackoverflow.com/questions/13749525/relational-database-design-multiple-user-types/13752304#13752304 — Walter Mitty, Jan 08 '13 at 12:08

score 0 · Accepted Answer · answered Jan 07 '13 at 20:26

Neither 1 or 2 are third normal form, because setting a value in one column tells you something about the value in another column (which isn't the primary key). This breaks the property of being wholly dependent on the primary key necessary for 3NF.

I'd definitely rule out option 2, as creating foreign keys will be tricky/impossible and could cause confusion as to what the meaning of the value in the column is. If someone changes the joins_with_table column without changing the join_id column, you may not know!

To model this in 3NF, you'll need to create two new tables, one for users and one for companies. These will have original_table_id, user_id (or company), with original_table_id as the primary key. You'll then need to validate that there's only an entry in one table or the other.

Personally I'd still consider looking having these as extra columns in your original table however, unless you think it's likely that you'll be adding more tables you'll need to conditionally reference. You shouldn't design to avoid nulls so I wouldn't worry about this. Both this solution and the extra table approach require outer joins to the user and company tables if you need to return information about both companies and users in this table, so there's no real difference in query complexity.

score 0 · Answer 3 · edited May 23 '17 at 11:56

which of these is truly normalized design (or none of them)?

All of them are "normalized", but not all of them are equally useful.

1) Leaving a bunch of empties, doesn't grow gracefully.

NULLs are stored fairly efficiently by most DBMSes. Typically one byte, or even as little as just one bit on some DBMSes under ideal circumstances.

You can use a CHECK to ensure exactly one of them is non-NULL.

2) How do I enforce foreign-key constraints?

Theoretically, you could migrate joins_with_table down the FKs, and then have a CHECKs in referencing tables to ensure proper types. Obviously, this is awkward and wastes space (unless the DBMS supports calculated/virtual columns).

You could rely on triggers or (God forbid) application logic, but declarative constraints should be preferred whenever possible for their simplicity, robustness and speed.

I'm ruling out 3 immediately because it's not dry.

Unfortunately, relational databases often don't support concepts that we all are familiar with from the OO programming, and can suffer from some unnecessary repetition as a consequence. For example, table inheritance (where base table defines a "structure" and children define specific FKs) would be ideal to alleviate repetition in this case.

As it is, some repetition is not the worst thing in the world, as long as you don't let blow out of proportion.

How do I join succinctly?

I'm not sure I understand your question - please clarify. Depending on your needs, JOINs can be greatly influenced not just by the table design but by the key design as well.

All in all, I'd probably just go with (1), unless you anticipate other differences that could warrant (3).

A table cain join either of two tables

3 Answers3