Ask your client if they care about storing accurate information1 rather than random strings.
Then set them a series of challenges. First, ensure that the sport information is in the correct "domain". For the member_sports
table, that is:
sport_id int not null
^
|--correct type
For their "store everything in a varchar column" solution, I guess you're writing a CHECK
constraint. A regex would probably help here but there's no native support for regex in SQL Server - so you're either bodging it or calling out to a CLR function to make sure that only actual int
values are stored.
Next, we not only want to make sure that the domain is correct but that the sports are actually defined in your system. For member_sports
, that's:
CONSTRAINT FK_Member_Sports_Sports FOREIGN KEY (Sport_ID) references Sports (Sport_ID)
For their "store everything in a varchar column" I guess this is going to be a far more complex CHECK
constraint using UDFs to query other tables. It's going to be messy and procedural. Plus if you want to prevent a row from being removed from sports
while it's still referenced by any member
, you're talking about a trigger on the sports
table that has to query every row in members
2`.
Finally, let's say that it's meaningless for the same sport to be recorded for a single member multiple times. For member_sports
, that is (if it's not the PK):
CONSTRAINT UQ_Member_Sports UNIQUE (Member_ID,Sport_ID)
For their "store everything in a varchar column" it's another horrifically procedural UDF called from a CHECK
constraint.
Even if the varchar
variant performed better (unlikely since you need to be ripping strings apart and T-SQL's string manipulation functions are notoriously weak (see above re: regex)) for certain values of "performs better", how do they propose that the data is meaningful and not nonsense?
Writing the procedural variants that can also cope with nonsense is an even more challenging endeavour.
In case it's not clear from the above - I am a big fan of Declarative Referential Integrity (DRI). Stating what you want versus focussing on mechanisms is a huge part of why SQL appeals to me. You construct the right DRI and know that your data is always correct (or, at least, as you expect it to be)
1"The application will always do this correctly" isn't a good answer. If you manage to build an application and related database in which nobody ever writes some direct SQL to fix something, I guess you'll be the first.
But in most circumstances, there's always more than one application, and even if the other application is a direct SQL client only employed by developers, you're already beyond being able to trust that the application will always act correctly. And bugs in applications are far more likely than bugs in SQL database engine's implementations of constraints, which have been tested far more times than any individual application's attempt to enforce constraints.
2Let alone the far more likely query - find all members who are associated with a particular sport. A second index on member_sports
makes this a trivial query3. No indexes help the "it's somewhere in this string" solution and you're looking at a table scan with no indexing opportunities.
3Any index that has sport_id
first should be able to satisfy such a query.