
If objects are immutable, it is very easy and efficient to make a deep copy of an object: just copy the pointer to that object.

It is also very easy and efficient to do a deep equality check: just compare the pointers.

But what happens if the data comes from the outside world and we need to check its identity?

Consider the following example:

  • The application queries the data for a Post from the database, deserializes it into an immutable Post object (model), and caches it in memory.
  • After some time, the application queries the same data again and deserializes it into another immutable Post object.
  • Now, how can we check whether the Post has changed? We can't just compare the references of the immutable objects to check identity. The references will differ (because we deserialized the data twice), but the data itself may still be the same.
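To make the problem concrete, here is a minimal sketch in Python. The `Post` fields, the JSON payload, and the `deserialize` helper are assumptions made up for illustration; they stand in for whatever the real model and database layer look like.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen=True makes instances immutable
class Post:
    # Hypothetical fields, chosen only for this example.
    id: int
    title: str
    body: str


def deserialize(raw: str) -> Post:
    """Build an immutable Post from a JSON string (stand-in for a DB row)."""
    return Post(**json.loads(raw))


raw = '{"id": 1, "title": "Hello", "body": "First post"}'

cached = deserialize(raw)  # first query, kept in the cache
fresh = deserialize(raw)   # same data queried again later

same_reference = cached is fresh  # identity (pointer) comparison
same_value = cached == fresh      # field-by-field comparison
```

The pointer comparison fails even though nothing changed, so only the slower field-by-field comparison can tell that the two objects hold the same data.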

How should such situations be handled?

Alex Craft

1 Answer

Two approaches that may be workable, or may give you more ideas for future approaches:

  1. Keep one or more dictionaries of known immutable instances, and look up each instance you construct to see whether it is already in the dictionary; if so, substitute the dictionary instance for the newly read one. Note that unless the dictionary is used every time an object is constructed, reference *equality* can be used to expedite comparisons, but reference *inequality* cannot. It may be useful, however, to have objects cache their hash code, since unequal items will *usually* have different hashes. Note as well that unless you use a "WeakDictionary" of some sort, you'll have to make sure to clean out the dictionary periodically (in practice, this means that without a WeakDictionary it will be hard to clean out unused items without at least occasionally cleaning out used ones).
  2. It may be possible to give each object a creation-time indicator (perhaps a static `Interlocked.Increment` counter) as well as a link to the oldest (first created) object which is known to be equal. When comparing objects, follow the chains of "older known-equal" objects. If the chains reach the same object, the original objects are equal. Otherwise, if the objects' hash codes match, test them for value equality. If they're equal, update the link on the newer to point to the older. If either starting object had a chain of more than one link, `Interlocked.CompareExchange` its link to point directly to the oldest known-equal object. Using this approach, comparisons between value-equal objects will cause them to form cliques; comparison between value-equal objects within a clique will be fast, and comparison between value-equal objects in different cliques will join the two cliques into one. Note also that it may be desirable to have consumers of the nodes hold wrapper objects rather than direct links to the nodes themselves; if that is done, the wrappers for all the nodes in a clique could share a reference to the same internal data item.
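Approach #1 could look roughly like the following Python sketch. It uses `weakref.WeakValueDictionary` as the "WeakDictionary"; the `Post` fields and the `intern_post` helper are hypothetical names, and the sketch assumes flat, hashable field values and single-threaded use.

```python
import weakref
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class Post:
    # Hypothetical fields, chosen only for this example.
    id: int
    title: str
    body: str


# Maps field tuples to the canonical instance. Values are held weakly, so an
# entry disappears once nothing else references its Post.
_interned = weakref.WeakValueDictionary()


def intern_post(post: Post) -> Post:
    """Return the canonical instance that is value-equal to `post`."""
    key = astuple(post)  # key by value, not by the object itself
    # setdefault returns the existing live instance, or stores `post`.
    return _interned.setdefault(key, post)


first = intern_post(Post(1, "Hello", "First post"))
again = intern_post(Post(1, "Hello", "First post"))
edited = intern_post(Post(1, "Hello", "Edited"))
```

As long as every `Post` goes through `intern_post`, `first is again` is enough to establish equality and `first is not edited` is enough to establish that the data changed. If some objects bypass the dictionary, only the "is the same object" result can be trusted, as noted in item 1.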

Approach #1 is probably a good one if a suitable dictionary type is available, but approach #2 could have some considerable advantages as well. The biggest annoyance with #2 is that it adds an extra layer of indirection to object accesses. Still, being able to have objects consolidate themselves quickly into cliques could be a major plus.

supercat
  • Thanks. So approach #1 means that, instead of creating the object directly by deserializing it, an object factory should be used (one that takes care of detecting identical objects). – Alex Craft Nov 09 '12 at 15:25
  • You could use a factory, or you could deserialize and then look up each object in the interning dictionary. The former approach may be helpful if the serialized form isn't too bulky, you can have each object hold a reference to a copy in serialized form, and if the form of serialization makes it possible to read each object as a blob without having to parse it. In that circumstance, you may be able to determine that a previously-serialized object matches the serialized form of an existing one without having to deserialize it. Otherwise, deserializing things and then comparing them... – supercat Nov 12 '12 at 15:48
  • ...would allow one to separate the interning logic from the data-reading logic. – supercat Nov 12 '12 at 15:49