Relational Databases: Current Data vs. Historical Data, best Practice

Question

Let's take a relational database, e.g. MySQL. To keep it simple, i will concentrate on the important things: Having a table that contains orders, with fields like order_id (primary key) order_date and a foreign key fk_supplier, which references a primary key in a table supplier. This table also has a field called supplier_name. Now, lets imagine, there is a php-website that is showing all orders that were made in a table. Each row of the table consists of the order_id, order_date and the supplier_name (the sql-statement made a join over the two tables). Everything is okay so far. Now, someone changes the name of one supplier that is referenced in one of the orders: the historical data becomes untrue or false. My question is: What is the best practice, to prevent this? Three solutions come to my mind:

Don't let the user change a supplier datarow, that is referenced in orders. Make him add a new supplier if the name changes.
Always save the current supplier data (e.g. the supplier name) with the order record and don't use primary key / foreign key references.
Introduce time-slices: every time, an important attribut of a supplier (like the name) is changed, create a new time slice. Reference not only the supplier_id in orders, but in addition the corresponding time-slice.

All this approaches have advantages and disadvantages. Point 2 e.g., seems pretty dirty and against all rules of a relational database. Point 3 would usually be the way to go, in my opinion. But needs a lot of effort, programming wise. The user experience / usability gets pretty bad, too.

I would like to hear, how experienced developers and database designers deal with that problem.

I would implement as option 2 but I fail to see (from your description) why you need to store obsolete (previous) supplier name in old order? IMO Supplier table should have `SupplierID INT` which is the primary key and name is irrelevant. — Alex, Nov 08 '17 at 12:36
I would consider a variation of option 3 where instead of time slices, you would simply create a new supplier and add an extra column to store `OriginalSupplierID` to be able to trace relationships. There are could be better solutions but need more background info. — Alex, Nov 08 '17 at 12:40
imagine that the supplier is named "ABC" when an order is made. Two weeks later, you change the suppliers name to "XYZ". Now, if you open the orders overview, the two weeks old order shows the NEW supplier name "XYZ", which is wrong. Considering you have printed out your order on paper, for archive purposes. On paper, die supplier name is "ABC", on the website, its "XYZ". — whocares81, Nov 08 '17 at 12:47
In this case my choice would be 2 BUT you still keep PK and FK to supplier record, while saving a snapshot of supplier details in another table e.g. OrderInvoiceDetails. — Alex, Nov 08 '17 at 12:55
As to `"Point 2 e.g., seems pretty dirty and against all rules of a relational database"`: No it isn't. You store the complete order so it can be reprinted. It is independent from current supplier name and address, so you store the data with the order. This is a valid option and doesn't violate database normalization. — Thorsten Kettner, Nov 08 '17 at 13:21
@ThorstenKettner but it leads to lots of redundancies, which isn´t really wanted. — whocares81, Nov 08 '17 at 13:26
Temporal data modeling is more involved than the 3 options above. I suggest reading a book like `Temporal data and the relational model`. — reaanb, Nov 08 '17 at 13:36
@whocares81: What I am saying is: this is not redundant. You store an order with the name and address. Another order for the same supplier can have another name and address. And the supplier can meanwhile have yet another name and address. What is redundant about this? It's a proper way to store the data. Same for the item prices in an order by the way. But there are alternatives, as you have already shown. All I am saying is: this doesn't violate normalization and is a valid option. — Thorsten Kettner, Nov 08 '17 at 13:50
@Thorsten Kettner: considering there are more fields in a table for suppliers than a PK and a name and considering changes to those fields are very unusual and there is a large amount of orders every day, there would of course be a high grade of duplicate data (maybe "redundant" is wrong in this context). But i got your point i think. — whocares81, Nov 08 '17 at 14:02
How is it that googling even your chosen title does not give you answers to this? — philipxy, Nov 10 '17 at 11:02

score 1 · Answer 1 · answered Nov 08 '17 at 13:28

Will expand on my comment:

I would choose Option 2 but with a few modifications:

Supplier table should stay as is.
Order table should keep referencing Supplier table
Create new table e.g. OrderInvoiceDetails which should have a 1 to 1 relationship to Order table and an FK to supplier. This table will contain a snapshot of supplier details.

Pros:

Easy and efficient querying
Invoice details can be modified separately from the Supplier table if need be.
I would pose that logically this is the best solution as you want to store supplier details in relation to a specific order as opposed to storing Supplier History data.
Old Data can be easily archived together with Orders data

Cons:

Stores redundant data, especially for suppliers whose details do not change often

DrHouseofSQL · Answer 2 · 2017-11-08T13:09:47.517

A form of Option 3 where you have a StartDate and EndDate in regards to the Supplier information. This way the data is accurate throughout all time (supplier name is correct as it was at the given times). One thing you could also do is create a spreadsheet the contents of which get loaded into the database every night that has the Supplier information (into a fact_Supplier table or lookup table). All edits to Suppliers go through this spreadsheet with access given only to those select people who would be in charge of that kind of thing. If there is a change in the spreadsheet the prior information in the Supplier table is end dated and a new record inserted with the new information. Any change will get this to happen, Supplier Name, Supplier address, etc.

Relational Databases: Current Data vs. Historical Data, best Practice

2 Answers2