It seems like the "_id" suffix is quite common in names of foreign keys. I wonder about the reasons behind this.
What are the benefits of posts(id, user_id, title, text)
over the simpler posts(id, user, title, text)
?
It seems like the "_id" suffix is quite common in names of foreign keys. I wonder about the reasons behind this.
What are the benefits of posts(id, user_id, title, text)
over the simpler posts(id, user, title, text)
?
The big benefit of following a naming convention like this: it helps makes SQL statements that are wrong "look wrong".
(This blog post from Joel Spolsky "Making Wrong Code Look Wrong" doesn't mention SQL, but I think the SQL naming convention that you asked about follows along the same principles that Joel espouses.)
http://www.joelonsoftware.com/articles/Wrong.html
The normative join is a foreign key to primary key.
When we follow the naming convention that uses id
as the primary key, and use tablename_id
(with _id
on the end of foreign key column name), that leads to SQL that looks like this:
FROM user
JOIN post
ON post.user_id = user.id
That's a very familiar pattern: surrogate primary key has a name of id
, and the foreign key columns get a name of <referencedtable>_id
, or sometimes <referencedtable>_<role>_id
.
Looking at this SQL, we expect that user_id
column in post
table is a foreign key reference to the id
column in the user
table. When we religiously follow this style naming convention, then SQL that doesn't follow that pattern just looks "wrong".
As an example, compare:
FROM somedoohickey
JOIN gibberish
ON troubador = minstrel
to:
FROM somedoohickey s
JOIN gibberish g
ON g.id = s.gibberish_id
Both may be equally valid SQL. But the second pattern conveys much more information to the reader. When we're used to the naming convention, the former just looks "wrong" to us, that is, we suspect that something might be amiss.
Compare that to:
FROM somedoohickey s
JOIN gibberish g
ON g.id = s.undecipherable_id
Something doesn't look right. Or even
FROM somedoohickey s
JOIN gibberish g
ON g.id = s.gibberish
Again, something just doesn't look right with that.
The biggest benefit of any naming convention is ease of learning. A maintainer who comes along later is going to have to learn what the data means, and some of that learning is going to be done subliminally, without even realizing it at the time. Having a foreign key tagged with an "Id" suffix helps the future investigator know at a glance which columns are foreign keys, and which are not.
Some shops use CamelCasing instead of underscore to make the suffix stand out from the stem.
Some shops include the Id suffix in the primary key column name. This has the advantage that a query on the metadata can locate all the primary keys that the foreign keys refer to. The same thing can be done by judicious use of named domains.
Some shops make it a point that all, or nearly all, foreign keys are declared as such in the data definitions, by including a reference constraint to prevent orphaned foreign keys. This has substantive consequences as well as aiding the investigator.
But the most important thing in making the system more learnable and therefore more accessible is consistency. If a convention is inconsistently adhered to, it can be more confusing that no convention at all.
This is very much opinion based, but I'd say it more accurately describes the contents of the column. 1902010
is not a user, it's a user ID. Calling it user
would be as wrong as calling your title
column title_id
.