Say you have a simple CRM system with customers and orders. If a customer changes his name, would you prefer that the old historical orders get the new name as well, or would you copy the name to a string value on the order to preserve accuracy?

I doubt there is an answer that covers all situations, but I'm struggling to decide where I can accept historical data being changed and where I cannot. Generally I'm using the thought that if it can be printed, it must be printable again later and therefore it should not change.

Happy to hear what you think...

(If this question belongs elsewhere, please point the way... )

Brian Tompsett - 汤莱恩
rozon

1 Answer

You can't allow historical data like that to be changed, especially on things that you might someday be required to print and turn over to a court of law.

Generally I'm using the thought that if it can be printed, it must be printable again later and therefore it should not change.

That's not a bad start. But some printed things are expected to change over time. Telephone books, criss-cross directories, billing addresses. You want to allow the columns that contain "current billing address" to change, but you don't want columns that contain "billing address at the time of an order" to change. You want "current price" to change over time, but "price at the time of an order" should not change over time.

In short, you need to know what each column means before you can decide whether to cascade updates to it.

If you allow "price at the time of an order" to change over time, you'll change the order totals over time--something I'm sure your accountants would throw a hissy fit about.
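To make the distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (products, order_lines, current_price_cents, price_at_order_cents) are invented for illustration and are not from the question. The order line stores its own copy of the price, so a later price change leaves the order total alone.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        product_id          INTEGER PRIMARY KEY,
        current_price_cents INTEGER NOT NULL   -- allowed to change over time
    );
    CREATE TABLE order_lines (
        order_id             INTEGER NOT NULL,
        product_id           INTEGER NOT NULL REFERENCES products (product_id),
        quantity             INTEGER NOT NULL,
        price_at_order_cents INTEGER NOT NULL  -- fixed once the order is placed
    );
    INSERT INTO products VALUES (1, 1000);
""")

# Placing the order copies the current price into the order line.
conn.execute("""
    INSERT INTO order_lines (order_id, product_id, quantity, price_at_order_cents)
    SELECT 100, product_id, 3, current_price_cents
    FROM products
    WHERE product_id = 1
""")

# A later price change touches only the products table.
conn.execute("UPDATE products SET current_price_cents = 1250 WHERE product_id = 1")

order_total = conn.execute(
    "SELECT SUM(quantity * price_at_order_cents) FROM order_lines WHERE order_id = 100"
).fetchone()[0]
current_price = conn.execute(
    "SELECT current_price_cents FROM products WHERE product_id = 1"
).fetchone()[0]

print(order_total, current_price)
```

The foreign key deliberately has no ON UPDATE CASCADE clause, and the price column is not part of any key, so nothing propagates into the order line.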

There are several ways to prevent changes to historical data.

  • Don't use foreign key references that cascade updates or deletes. This is virtually always a requirement. Whether you can use foreign key references at all is application-dependent.
  • Let the user interface populate things like "billing address at the time of an order", maybe by selecting and copying from a table of customer billing addresses. This doesn't duplicate data, not in the relational sense anyway. Duplicate data means "the same values with the same meaning". When you copy the current billing address to the column "billing address at the time of an order", you're changing its meaning.
  • Timestamp your source tables. A table of "customer billing addresses" might, for example, also contain the columns that mean "use this billing address for orders placed on or after this date" (a start date) and "use this billing address for orders placed before this date" (an end date). You'd join orders to this table; the date of the order would determine which row was used at the time of the order (like, ...JOIN customer_billing_addresses c ON c.start_date <= orders.order_date AND orders.order_date < c.end_date...). You'll need careful design to guarantee there are no overlapping periods, and that deleting rows from customer billing addresses is tightly controlled. (That is, almost entirely prohibited.)
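
The date-range join from the last bullet can be sketched like this, again with Python's sqlite3 module. The customer_billing_addresses table, its start_date and end_date columns, and orders.order_date come from the description above; the remaining columns, the sample rows, and the far-future end date marking the current address are assumptions. Dates are stored as ISO-8601 text so that string comparison matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_billing_addresses (
        customer_id INTEGER NOT NULL,
        address     TEXT NOT NULL,
        start_date  TEXT NOT NULL,  -- valid for orders on or after this date
        end_date    TEXT NOT NULL,  -- valid for orders before this date
        PRIMARY KEY (customer_id, start_date),
        CHECK (start_date < end_date)
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        order_date  TEXT NOT NULL
    );
    -- Sample data (invented): the customer moved on 2011-06-01.
    INSERT INTO customer_billing_addresses VALUES
        (1, '1 Old Street', '2010-01-01', '2011-06-01'),
        (1, '2 New Avenue', '2011-06-01', '9999-12-31');
    INSERT INTO orders VALUES
        (100, 1, '2011-03-15'),
        (101, 1, '2011-09-20');
""")

# The order date picks the address row that was in force at the time.
rows = conn.execute("""
    SELECT o.order_id, c.address
    FROM orders o
    JOIN customer_billing_addresses c
      ON c.customer_id = o.customer_id
     AND c.start_date <= o.order_date
     AND o.order_date < c.end_date
    ORDER BY o.order_id
""").fetchall()

print(rows)
```

This sketch does not enforce the "no overlapping periods" rule; the primary key and CHECK constraint only catch the simplest mistakes, so that guarantee still needs the careful design mentioned above.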

Whatever the approach, think hard about who should have permissions to change data in the historical tables. Once an order ships, that means just about nobody.
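
Permissions are normally handled with the database's own privilege system, which SQLite doesn't have. As a stand-in, this sketch swaps in a different technique: a trigger that rejects any update to an order once it's marked shipped. The orders table, the shipped flag, and the trigger name are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id        INTEGER PRIMARY KEY,
        billing_address TEXT NOT NULL,
        shipped         INTEGER NOT NULL DEFAULT 0  -- hypothetical flag
    );
    -- Reject every UPDATE on a row that was already shipped.
    CREATE TRIGGER orders_no_update_after_ship
    BEFORE UPDATE ON orders
    WHEN OLD.shipped = 1
    BEGIN
        SELECT RAISE(ABORT, 'shipped orders are read-only');
    END;
    INSERT INTO orders VALUES (100, '1 Old Street', 0);
""")

# Before shipping, corrections are still allowed.
conn.execute("UPDATE orders SET billing_address = '2 New Avenue' WHERE order_id = 100")
conn.execute("UPDATE orders SET shipped = 1 WHERE order_id = 100")

# After shipping, the trigger aborts the statement.
try:
    conn.execute("UPDATE orders SET billing_address = '3 Other Road' WHERE order_id = 100")
    blocked = False
except sqlite3.DatabaseError:
    blocked = True

address = conn.execute(
    "SELECT billing_address FROM orders WHERE order_id = 100"
).fetchone()[0]

print(blocked, address)
```

A trigger like this guards against accidents, not against someone with rights to drop the trigger, so it complements restricted permissions and doesn't replace them.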

Mike Sherrill 'Cat Recall'
  • I very much agree with you. Do you know good ways to ensure the historical references, other than copying or making properties read-only? – rozon Nov 13 '11 at 15:26