5

We are developing large business applications, similar to ERP and CRM systems.

We now have a requirement that all data entered by the user must be versioned.

For example, at some point the user will need to see the change history of a particular Purchase Order.

I am looking for a very generic (not rigid) versioning handler that keeps working even when the attributes of some business data change. This single versioning handler should be able to work with almost any type of business object or data.

What would be the best programming/database design to handle this?

Any ideas or opinions?

PS: I have added some programming tags as I want programmers to entertain this thread and give their ideas.

EDIT: I am looking for a well-optimized approach, something closer to storing diffs between versions than to storing serialized dumps of whole objects.

linuxeasy
  • I removed the generics tag. Please read generics tag description. – Saintali Oct 24 '11 at 15:14
  • The solution depends very much on how your objects are stored in the first place. Are you using a relational database? The other fundamental point is how are the versioned objects going to be used? Is this just a log or do you need to be able to present old states of the application in the same way as you do for the current one? – Nicola Musatti Oct 25 '11 at 09:17
  • @Nicola Musatti -> Yes, this would be a relational database, and yes, we need to be able to present the old states. The objects have attributes, some of which can themselves be child objects. – linuxeasy Oct 25 '11 at 09:40

5 Answers

4

It may be just the right time to adopt purely functional lazy data structures.

In a nutshell, this requires banning any mutating operations on your Objects, i.e. making all your Object instances immutable. You then redesign every operation that changes an existing Object so that it creates a new Object instance based on the old one.

For example, suppose you have an Order instance that contains a list of OrderItems, and you need to add a specific OrderItem to that list. In this case you create a new instance of Order, replacing its items list with a new list, which in turn is created by cons'ing the new OrderItem onto the old list.

Let me illustrate that example further in pictures. Imagine a storage of objects (let it be RAM or relational database, anything):

Address | Object             | Created by
--------+--------------------+------------------------------------------
   1000 | list of OrderItems | empty list constructor
   1001 | Order              | Order constructor, uses address 1000
                  ...         
   1300 | OrderItem          | ...
   1501 | list of OrderItems | cons of addr 1300 to addr 1000
   1502 | Order              | replace order_items in addr 1001 by addr 1501

The very structure of storing data in this way is persistent (Chris Okasaki elaborates on this in his thesis, for example). You can restore any version of an Object by just following its creation history; versioning becomes trivial. Just remember the main point: don't mutate data, create new instances instead.
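As a rough illustration of the idea above, here is a minimal Python sketch. The class names `Order` and `OrderItem` come from the example; the fields `number`, `sku` and `quantity`, the `previous` link, and the `add_item` and `history` helpers are assumptions made up for this illustration. Note that a Python tuple copies its elements on concatenation, so unlike a true cons list it does not share structure between versions.

```python
from dataclasses import dataclass, replace
from typing import Iterator, Optional, Tuple


@dataclass(frozen=True)
class OrderItem:
    sku: str        # hypothetical field
    quantity: int   # hypothetical field


@dataclass(frozen=True)
class Order:
    number: str
    items: Tuple[OrderItem, ...] = ()
    # Link to the version this one was derived from (None for the first version).
    previous: Optional["Order"] = None

    def add_item(self, item: OrderItem) -> "Order":
        # Never mutate: build a new Order that points back to the old one.
        return replace(self, items=(item,) + self.items, previous=self)

    def history(self) -> Iterator["Order"]:
        # Walk the creation chain from the newest version to the oldest.
        version: Optional[Order] = self
        while version is not None:
            yield version
            version = version.previous


v1 = Order("PO-1")
v2 = v1.add_item(OrderItem("widget", 3))
v3 = v2.add_item(OrderItem("gadget", 1))
```

Every earlier version stays reachable and unchanged: `v1` still has no items after `v3` has been created, and `v3.history()` yields `v3`, `v2`, `v1` in that order.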

ulidtko
1

Something along the lines of Hibernate Envers, an entity auditing solution, appears to be a good fit for your requirements.

Biju Kunjummen
0

If you don't want to be invasive (I see generic as competing with invasive), then I think your option is full serialization. At each version you just take a snapshot of the object and serialize it to whatever format is appropriate.

If you were confined to a database for instance, just save it with a timestamp and take the most recent timestamp as current. I would avoid thinking about it as solely a database problem though. You should be allowed to serialize outside of the system for things like partial results, testing, sanity, etc.
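A minimal sketch of this snapshot approach, assuming the object state can be represented as a JSON-compatible dict. The `snapshot` table, its columns, and the helper names `save_snapshot`, `load_versions` and `load_current` are all hypothetical; SQLite is used only so the example is self-contained.

```python
import json
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snapshot ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " object_type TEXT NOT NULL,"
    " object_id TEXT NOT NULL,"
    " saved_at TEXT NOT NULL,"
    " payload TEXT NOT NULL)"
)


def save_snapshot(object_type, object_id, state):
    # Store the whole object as one serialized snapshot with a UTC timestamp.
    saved_at = datetime.now(timezone.utc).isoformat(timespec="microseconds")
    conn.execute(
        "INSERT INTO snapshot (object_type, object_id, saved_at, payload)"
        " VALUES (?, ?, ?, ?)",
        (object_type, object_id, saved_at, json.dumps(state, sort_keys=True)),
    )


def load_versions(object_type, object_id):
    # Oldest first; id breaks ties between identical timestamps.
    rows = conn.execute(
        "SELECT payload FROM snapshot"
        " WHERE object_type = ? AND object_id = ?"
        " ORDER BY saved_at, id",
        (object_type, object_id),
    )
    return [json.loads(row[0]) for row in rows]


def load_current(object_type, object_id):
    # The most recent snapshot is treated as the current state.
    versions = load_versions(object_type, object_id)
    return versions[-1] if versions else None


save_snapshot("PurchaseOrder", "PO-1", {"status": "draft", "items": []})
save_snapshot("PurchaseOrder", "PO-1", {"status": "approved", "items": ["widget"]})
```

Because the payload is opaque to the database, the same table can hold any type of business object, at the cost of storing every version in full.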

Tom Kerr
  • Serializing is OK, but I feel it is not a very optimized solution. Something closer to storing the diffs between versions would be more interesting. – linuxeasy Oct 17 '11 at 14:44
  • @linuxeasy You'll spend more cycles puttering around trying to figure out what to store than you would just storing it. What would Knuth do? – Tom Kerr Oct 17 '11 at 16:00
  • We'll store the differences between the earlier business objects and the current ones. We won't try to decide case by case what to store; instead we are looking to develop a generic framework that does this automatically. Hence the question: what would be the most effective generic algorithm? Storing diffs also has several advantages, such as making it easy to highlight the differences. So although it is a bit more complicated than simply dumping the object, it pays off in memory saved and in highlighting differences. – linuxeasy Oct 18 '11 at 05:13
0

Basically you should be able to compare the fields of an existing object and the new object and persist the differences.

Typically this is done only for highly sensitive entities, to keep track of the changes made to a particular row or account. Doing it for every table would be overkill. The differences can be persisted asynchronously in a separate history table that mirrors the design of the online tables.
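One way to sketch the field comparison, assuming each business object has first been converted to nested dicts with string keys, and that child records (such as order items) are keyed by a stable id. The `diff` function, the `MISSING` sentinel and the sample purchase order data are illustrative assumptions, not part of any library.

```python
# Sentinel marking a field or child record that is absent on one side.
MISSING = object()


def diff(old, new, path=""):
    """Return a list of (path, old_value, new_value) for every change."""
    changes = []
    if isinstance(old, dict) and isinstance(new, dict):
        # Recurse into nested objects, visiting keys from both versions.
        for key in sorted(set(old) | set(new)):
            child = f"{path}.{key}" if path else str(key)
            if key not in old:
                changes.append((child, MISSING, new[key]))    # added
            elif key not in new:
                changes.append((child, old[key], MISSING))    # removed
            else:
                changes.extend(diff(old[key], new[key], child))
    elif old != new:
        changes.append((path, old, new))                      # modified
    return changes


old_po = {
    "supplier": "Acme",
    "status": "draft",
    "items": {"1": {"sku": "widget", "qty": 3}, "2": {"sku": "gadget", "qty": 1}},
}
new_po = {
    "supplier": "Acme",
    "status": "approved",
    "items": {"1": {"sku": "widget", "qty": 5}, "3": {"sku": "bolt", "qty": 10}},
}
changes = diff(old_po, new_po)
```

Here `changes` records the modified quantity at `items.1.qty`, the removal of `items.2`, the addition of `items.3`, and the new `status`. Each tuple could be written as one row of a history table; unchanged fields such as `supplier` produce no rows.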

r0ast3d
  • The equals method can take an argument and compare the fields for this purpose. http://en.wikibooks.org/wiki/Java_Programming/Comparing_Objects Before persisting if we can compare with the existing entity we can save the differences as the history separately. – r0ast3d Oct 20 '11 at 16:26
  • OK. First, could you specify what the best way to do it would be? Second, please consider complex objects, such as an object made up of child objects. For example, Purchase-Order is an object and Purchase-Order-Items is its child object. A diff of the direct attributes of the purchase order can be made, but what about differences in the purchase-order-items? These have their own attributes, and there won't be just one but many of them. What DB structure would you consider best for this? Also, having gone through the link, I do not feel it is a very generic solution. – linuxeasy Oct 20 '11 at 16:45
  • Can't the purchase order items also be compared as part of equals to determine the change? I would think that the purchase order items are part of the purchase order entity. The child entity can also implement equals. Or, for a more generic solution, a simple reflection utility can compare the fields and apply the same comparison recursively. – r0ast3d Oct 20 '11 at 17:21
  • Regarding DB design it could be a flat/horizontal table structure where only the meta information about what was changed is captured of every transaction. Or it can be a full fledged duplicate of the main tables, but please establish housekeeping on the history for table data size. – r0ast3d Oct 20 '11 at 17:21
  • To explain it another way: I have come across an implementation where banking account transactions had to be versioned. In that case the main transaction table had a duplicate table with the same schema for history; call it the history table. It contains all the prior versions, saved before each update is issued against the main table. This can be done at the entity level without much code whenever there is an update. The same applies to the child records: the purchase order history entity would point to the list of order item history records. – r0ast3d Oct 25 '11 at 20:05
0

I've had to do something similar to this in the past.

I found that the easiest way, if starting from scratch, was to add an extra column to the object's data table that held the index of the replacement object. If the value held was zero, then I knew that it was the latest version. This also meant that objects were never deleted, although I later added functionality that would allow the deletion of past history.

Effectively the index becomes the version number of the object in question and any query to the DB for the latest version would use the qualifier WHERE NextVersionIndex=0 .

If not starting from scratch this can be implemented into a live system by adding an extra table to store these values - Object Index, Previous version index, next version index.
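A minimal sketch of the extra-column variant, using SQLite so that it is self-contained. Only the `NextVersionIndex` column and the `WHERE NextVersionIndex=0` qualifier come from the description above; the `purchase_order` table, its other columns, and the helper functions are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchase_order ("
    " id INTEGER PRIMARY KEY,"
    " po_number TEXT NOT NULL,"
    " status TEXT NOT NULL,"
    " NextVersionIndex INTEGER NOT NULL DEFAULT 0)"  # 0 means latest version
)


def save_new_version(po_number, status):
    # Insert the new row, then point the previous latest row at it.
    with conn:  # both statements commit or roll back together
        cur = conn.execute(
            "INSERT INTO purchase_order (po_number, status) VALUES (?, ?)",
            (po_number, status),
        )
        new_id = cur.lastrowid
        conn.execute(
            "UPDATE purchase_order SET NextVersionIndex = ?"
            " WHERE po_number = ? AND NextVersionIndex = 0 AND id <> ?",
            (new_id, po_number, new_id),
        )
    return new_id


def latest(po_number):
    return conn.execute(
        "SELECT id, status FROM purchase_order"
        " WHERE po_number = ? AND NextVersionIndex = 0",
        (po_number,),
    ).fetchone()


def history(po_number):
    return conn.execute(
        "SELECT id, status, NextVersionIndex FROM purchase_order"
        " WHERE po_number = ? ORDER BY id",
        (po_number,),
    ).fetchall()


first = save_new_version("PO-1", "draft")
second = save_new_version("PO-1", "approved")
```

Rows are never updated in place apart from the pointer column, so every earlier state stays queryable, and following `NextVersionIndex` from any row leads to its successor.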

ChrisBD