How to maintain two versions of same entity available for edit simultaneously with relationship?

Question

The problem

Consider these database tables:

Product
Order
Order Details
User

Product has columns:

Product_Name, Product_Description, Product_Size, Product_Cost, Product_Unit

Order has columns:

Order_number, Order_Total, Order_Status, Order_Payment_Status, Order_UserId (Fk of user table), Order_date

Order Details has columns:

OrderDetails_OrderId(Fk of Order table), OrderDetails_ProductId (Fk of Product table), OrderDetails_Quantity

User has columns:

User_Name, User_Phone (unique), User_Email (unique), User_Address

Consider, order statuses to be placed, packed, delivered, canceled, closed.

Now, There are three orders for the user u1:

order O1 -> Placed status (Editable by user)
order O2 -> Placed status (Editable by user)
order O3 -> Closed status (Non-editable by the user, but editable from admin)

Now the scenario is that the user u1 updates his information. This updated information should start reflecting in O1 and O2 only because they are still in the placed status; while O3 was already closed for the user and is now open only for admin's edit - so O3 should still reflect the old user information that was there previously. With the current database structure - this is not possible.

Similarly, if the admin edits product that was there in closed order, then the edits should not be displayed in the closed order.

As you may have figured out, the current structure depicted above is a simple foreign key related structure, wherein edit in one will obviously reflect directly to all related entities.

What solutions I figured out?

Solution 1: Versioning Never update any row/entry. Always add a new row for any change (Soft updates). Keep adding rows with some tag/id/timestamp/audit trail (who edited) and map the version with the order table using mapping table.

i.e.

User_Name | User_Phone | User_Email     | User_Address | Version/Timestamp
abc       | 123        | abc@email.com  |someaddres    | v1
abc       | 234        | abc@email.com  |someaddress   | v2

new mapping table
version | order_id 
v1      | o3

Drawbacks of this solution are:

Same table multiple entries, for the same entity - then we won't be able to use unique keys. Here, phone and email are unique, but if we pick this approach then unique indices have to be removed.
All those tables (unrelated to order) that would have the foreign keys of the User table will have an impact. For example, the user_feedback table has the user's foreign key only, but now since there are multiple entries with a different version of the same user, this table will be impacted needlessly.
When the number of users would increase, performance will be impacted in select queries.
User's email is the identity, which is used for logging in. Its duplication is anyhow not possible in the same table.

No this is not audit trailing! As per our requirement, the old information that we want to preserve for o3, should still be kept editable. Therefore, those edits shall also have to be audited. So audit trailing will be a separate wrapper altogether.

Solution 2: When order closes, create a new table with columns saving json/dump of all respective tables i.e.

new table
order_id | JsonOfUser         | JsonOfProductDetails | ...
o3       | {"name":"abc",...} | ...                  |

Drawbacks of this solution:

The things dumped are to be editable, but here the dumped data is difficult to edit because now the table has changed, and this table has a string/jsonb column that is effectively going to get edited, and other navigations are removed (denormalized) so all calculation changes that potentially would happen due to edits, would also have to be done manually.
Audit trailing of the edits in this table will be cumbersome because we'll be auditing the json edits here.
Deep level child jsons - increases code complexity.

Solution 3: Create copies of all tables with structure intact, that are related to order according to status events i.e.

User_Common

User_Closed

For order O3, upon closure, all details of user_common will be copied to User_closed, and order O3 that had the foreign key of User_common table will be changed with the foreign key of User_Closed table. Now any changes in the o3 will effectively be over old data and all other open/placed orders can still get the updated information from User_common.

Drawbacks of this solution:

Suppose there happen to be 10 such tables related to order with this requirement, then each table's copies will have to be made
Each entity is now effectively represented by two tables based on event/status of the order - syncing issues and data-keeping issues may happen - i.e. maintainability.
Foreign keys are changing here for the order table. So effectively in the order table, there would be two foreign key columns: one for user_common, and other for user_closed. So when the order is open, the user_closed foreign key will remain null, and when order closes, it will get filled. Before that, 1 data operation will still occur, one to copy the information on order closure from the user_common table to user_closed.
In code, we'll always have to make a DB check for whether the lookup should be made in a common table or closed table based on order status (another DB call) - leading to code-level cognitive complexity

This was a minimal dummy replication of our requirement and proposed solutions that are thought of in research. What is the practical possible design that can adhere to this requirement without adding needless complexities?

What needs to be remembered needs to be remembered. If you need data as of a certain time you need to record it. Also everyday & admim user actions should be audited, so even in the admin case there is data you should record. Versioning & auditing are big topics, we can't pick a policy for you & applying one is a bigger task than a SO question. (There are many SO Q&As .) Also there is no "best" in engineering unless you define it so that all would be able to evaluate a case the same & then you would have to say where you are stuck evaluating a case yourself & ask about that. — philipxy, Sep 25 '20 at 22:51
Just come up with a design that records what you want to know, and don't worry that there might be some sort of rules about how often or where you happen to record some sort of data that has something to do with some sort of thing or things that are or aren't somehow the same thing over time. That's done by following a published information modelling & DB design method. (There are certainly idioms for versioning & auditing--also published.) PS Vagueness like "won't be able to use unique keys", "unique indices have to be removed" & "is the identity" are unclear & seem misconceived. — philipxy, Sep 25 '20 at 23:04
I think u misunderstood the question. I already have tried those solutions & then I enlisted the drawbacks that I faced in each. I am not asking the "best" way out of those listed solutions, if you read carefully but I am asking exactly a valid question to provide _another_ solution/design/direction (because what I tried didn't succeed) that can be used. Also, I am not asking a broad topic here (again a misunderstanding), as I already have narrowed it down by mentioning the specifics of what I am up to. This question is framed after tons of research, practically trying out all these solutions — Karan Desai, Sep 26 '20 at 05:35
I didn't misunderstand the question. You seek another solution & my 2 comments are about choosing policies & generating designs in general. (I don't critique your designs, just aspects of your development & presentation of them.) Again: There is in general no "best" by which to decide, and in any case you don't define it, and you don't show yourself evaluating it. You give some desiderata but that doesn't make your question specific enough to determine one answer. Not all questions are on-topic. PS Maybe ask for feedback on [meta]. But research asking here & there. [ask] [help] [meta.se] — philipxy, Sep 26 '20 at 06:04
All these "solutions" are merely throwing complexity at a rather simple problem ...I mean, why should a user even be closed, to begin with? A truly logical business logic is required to produce a logical database structure. — Martin Zeitler, Nov 22 '20 at 06:32
Sorry to be unclear @MartinZeitler. "Order" needs to be closed with the details of the User that was there when "Order" closed. Suppose Order O of User U got closed, at that time User U's email was E, now when I change User U's email, it should not change the email for order O. In short, the updated or latest information of User should not be reflected in closed orders (because those orders were placed with different information set) — Karan Desai, Nov 22 '20 at 08:49
This question is being discussed in the [meta](https://meta.stackoverflow.com/q/403049) — TheMaster, Nov 23 '20 at 17:21
If it is important to keep track of any relational data at a point-in-time, the easiest way I have used is to make a flat table that captures all the relevant data at the point we want to capture. This is similar to your JSON approach, but uses columns so we can still avail ourselves of database indexing, and even other secondary joins if we aren't as worried about capturing something like a lookup code as point-in-time. I would not typically open this data to modification from the user end. — Chris Baker, Nov 24 '20 at 01:09
You'll find a flat history table cheaper in development/project time and cost -- storage is cheaper than processing a complex view or using relational history tables, and it's easier to maintain because it is more straightforward than a lot of searching around for what someone's address was as of a given date that an order shipped. That's conditional code, cases, maybe a procedure, vs a simple table that has the data you want, created at the moment you want to capture it. — Chris Baker, Nov 24 '20 at 01:11
Thanks for this insight @ChrisBaker. Yes it all boils down to decision-making I guess where the decision has to be made to whether or not block edit of JSON or saved point-in-time entries (even by admin) - this way we'll have the records available. But when there happens a strong requirement to make edits in those data, it adds additional complexities that comes both with time and cost. — Karan Desai, Nov 24 '20 at 04:11
-1 from me. IMO standard relational databases are just not well-suited for historical data. I think the solution lies at the application layer, not the database layer. Consider how Amazon or Aliexpress implement past orders (order is linked to snapshot of the product, separate logic allows user to transition from snapshot to current product or alternative). If Admin is god-like, let them edit absolutely anything, i.e. let admin even break the system. — Dima Tisnek, Nov 24 '20 at 09:07
Noted @DimaTisnek. Can you please point me to some non-relational databases that can be used. — Karan Desai, Nov 25 '20 at 13:01
@DimaTisnek Every practical business use of a relational/SQL DBMS maintains historical data. The use cases for noSQL vs relational have nothing to do with historical data. Neither does whether something should be dealt with in an application layer. — philipxy, Nov 26 '20 at 02:18

Patrick Artner · Answer 1 · 2020-11-22T10:25:38.680

Use the approach that has the User table add another column for a valid_since column. Apply the "never delete" strategy to Users.

If you measure performance issues add a persited/materialized (in memory) view for User that only shows the most current address - use that to get the user_id for newly placed orders and joins to show orders that are open. Joins for existing orders that use foreign keys into User mostly don't care about how many actual user_id's there are (simplified).

Use an after_insert trigger on User to propagate the new user_id to all entries of the Order to table that should reflect those changes, ignore closed orders for that update. This will be, in respect to the orders, a rather small update - how many open orders is one user allowed to have? 10? 20? 50?

Cleanup user data regularly in case they change but never order anything - those User entries can get erased.

You ensure integrity on database level that way - if you want add a report for users that change their details more then thrice a day (or limit those changes frontend wise).

Most of your user fields should also be 1:N relations - I have at least 3 telephone numbers and might use 2 addresses (living + shipping) and more then 1 email is a given.

Changing out those to own tables with a "active" might elivate the need to create full User copys. Based on buisiness needs the "address" used to ship stuff to might be worth remembering, not so much for the mobilenumber used to order or the email used to send confirmation towards but that is a business decision.

Unless you have a very active shop with millions of different users that order hundreds of orders each and change their details twenty times a month you wont get into problems using any of this with currently state of the art databse system. It seems to me this is rather a thought experiment then based in actual needs?

Thanks. No it isn't experiment, it's a practical use-case where a point-in-time data has to be saved (and locked for one set of (end) users) while that same data can be modified by other set of users (Admin, for post processing). My solution 1 is similar to your proposed one (Add a version or timestamp) and process data accordingly - but it has drawbacks that I have already mentioned. — Karan Desai, Nov 24 '20 at 04:13

How to maintain two versions of same entity available for edit simultaneously with relationship?

1 Answers1