I have several code/value/synonym lists (among them ICD codes) whose entries have validity periods, aggregated into versions (one per year).
So I could choose the fully normalized approach, with the following structure:
VERSIONS(id INT PRIMARY KEY, name VARCHAR)
CODES(id INT PRIMARY KEY, code VARCHAR)
VALUES(id INT PRIMARY KEY, text VARCHAR)
CODEVALUES(code_id INT FOREIGN KEY CODES.id,
value_id INT FOREIGN KEY VALUES.id,
version_id INT FOREIGN KEY VERSIONS.id,
synonym_nr INT)
with PK(code_id, value_id, version_id)
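For reference, this is roughly what the normalized variant would look like as real DDL (a sketch only: column lengths are placeholders, and VALUES is a reserved word, so it is quoted here and would probably be renamed in practice):

CREATE TABLE VERSIONS (
    id   INT PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE CODES (
    id   INT PRIMARY KEY,
    code VARCHAR(20)
);

CREATE TABLE "VALUES" (          -- reserved word, quoted here only for illustration
    id   INT PRIMARY KEY,
    text VARCHAR(255)
);

CREATE TABLE CODEVALUES (
    code_id    INT NOT NULL REFERENCES CODES(id),
    value_id   INT NOT NULL REFERENCES "VALUES"(id),
    version_id INT NOT NULL REFERENCES VERSIONS(id),
    synonym_nr INT NOT NULL,
    PRIMARY KEY (code_id, value_id, version_id)
);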
This way I may end up with up to 14 records for a code/value pair that hasn't changed in the last 14 years. With more than 14,000 codes and up to 20 synonyms each, that adds up to more than 2,000,000 records in CODEVALUES.
An alternative would be to use an aggregated table like
CODES(code VARCHAR, value VARCHAR, synonym_nr INT, min_version INT, max_version INT)
No FKs. For every combination of code/value/synonym_nr there's only one record.
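As a rough sketch of the aggregated variant (I renamed the table to CODES_FLAT here only to avoid the clash with the normalized CODES table; the key choice is just one option, and the sample row is made up):

CREATE TABLE CODES_FLAT (
    code        VARCHAR(20)  NOT NULL,
    value       VARCHAR(255) NOT NULL,
    synonym_nr  INT          NOT NULL,
    min_version INT          NOT NULL,
    max_version INT          NOT NULL,
    PRIMARY KEY (code, synonym_nr, min_version)
);

-- A code whose default value hasn't changed in 14 years becomes a single row
-- instead of 14 mapping records, e.g.:
-- ('X99.9', 'Some default description', 0, 1, 14)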
I know about normalization, but I'm trying to reduce development and administration complexity: I need one OR/M entity for every SQL table, including its relations, I have dozens of such code lists, and a factor of 4 in the number of classes is significant.
So I wonder whether there are performance differences between these two alternatives.
UPDATE:
The queries on these lists look up a certain code in a specific version and want the default value for that code (synonym_nr = 0). Since these lookups are often part of larger queries, there may be several 10k up to 100k of them per query/transaction. With approach #1 I have at least 2 joins, and the DB has to hold a mapping record for every version (redundancy for code/value). Approach #2 instead defines a valid version range, which I have to query via
WHERE version >= min_version AND version <= max_version
So it's joins and more records (index efficiency?) versus range comparisons in the query constraints. Would there be a significant performance difference?
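To make the two query shapes concrete, here is roughly what a single lookup would look like in each approach (parameter markers and the CODES_FLAT name are placeholders; in reality these lookups are embedded in larger statements):

-- Approach #1: two joins to resolve the code and its default value for one version
SELECT v.text
FROM CODEVALUES cv
JOIN CODES    c ON c.id = cv.code_id
JOIN "VALUES" v ON v.id = cv.value_id
WHERE c.code        = :code
  AND cv.version_id = :version_id
  AND cv.synonym_nr = 0;

-- Approach #2: single table, range predicate on the version
SELECT value
FROM CODES_FLAT
WHERE code       = :code
  AND synonym_nr = 0
  AND :version BETWEEN min_version AND max_version;

I assume approach #2 would get an index on (code, synonym_nr, min_version, max_version), so the question is essentially whether the range predicate or the extra joins and rows hurt more.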