Hash-consing consists of keeping only one copy of a given object in memory; that is, if two objects are semantically equal (same content), then they should be physically equal (same location in memory). The technique is usually implemented by keeping a global hash set and creating a new object only if no equal object is already in the hash set.

An additional requirement is that objects in the hash table should be collectable if nothing references them except the hash table; in other words, the hash table should contain weak references.

The issue is further complicated by the need for constant-time, and therefore shallow, hashing and equality tests; for this purpose, each object carries a unique identifier taken from a counter that is incremented whenever a new object is added to the table.
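
To make the idea of a shallow key concrete, here is a minimal sketch. The `Node`, `Shape`, `Key`, and `keyOf` names are hypothetical and only serve as an illustration; they are not my actual code. Children are summarized by their identifiers, so the default structural hash and equality on `Key` never descend into subterms.

```fsharp
// Hypothetical expression type; all names here are illustrative assumptions.
// Nodes are compared by reference, never by deep structural comparison.
[<ReferenceEquality>]
type Node =
    { Id : int        // unique identifier assigned when the node is created
      Shape : Shape }

and Shape =
    | Leaf of string
    | Add of Node * Node

// Shallow summary of a node: children are replaced by their identifiers,
// so default hashing and equality on Key run in constant time.
type Key =
    | LeafKey of string
    | AddKey of int * int

// Computes the shallow key of a shape without traversing its children.
let keyOf (shape : Shape) : Key =
    match shape with
    | Leaf s -> LeafKey s
    | Add (l, r) -> AddKey (l.Id, r.Id)
```

Because children are already hash-consed, two `Add` nodes have equal keys exactly when their children are physically the same objects.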

I have a working implementation that uses System.Collections.Generic.Dictionary<key, node>, where key is a tuple giving a shallow summary of the node (suitable for default hashing and equality tests) and node is the object. The only problem is that the Dictionary keeps strong references to the nodes!

I could use a Dictionary whose values are WeakReferences, but this would not free the keys that map to dangling references.

Some advocate using System.Runtime.CompilerServices.ConditionalWeakTable, but this class seems to do the opposite: it frees the value when the key is collected, whereas I need to free the key when the value is collected.

One could try using System.Runtime.CompilerServices.ConditionalWeakTable<node, node> but I would need custom hashing and equality tests... and ConditionalWeakTable is documented not to use the GetHashCode() virtual method, instead using the default hashing function.

Thus my question: is there some equivalent of Dictionary that keeps weak references to its values and frees the keys when the references become dangling?

David Monniaux
  • Do you need to free the key immediately when the value is collected? Or could you relax the requirement and just free the key at some later point in time? – Jack P. Mar 25 '13 at 14:27
  • I do not need them to be freed immediately — it's just that I don't want them to accumulate and uselessly consume lots of memory. I've thought about running another thread to periodically kill keys with dangling references, but this seems complicated and prone to concurrency errors. – David Monniaux Mar 25 '13 at 14:34
  • For what it's worth, I also have an OCaml implementation using the hash table from the `Weak` module, and a Java implementation using `WeakHashMap`. – David Monniaux Mar 25 '13 at 15:32
  • Could you implement weak hashtables in F# using the OCaml code as a reference implementation? IIRC the weak hashset uses weak arrays, which could be implemented w/ Array. – fmr Mar 25 '13 at 18:55
  • @monniaux: Nice to see you here. After our brief talk last year, it's great to see you pick up F#. Let us know how it works for you :). – pad Mar 25 '13 at 20:36
  • It seems to be related: [Compacting a WeakReference Dictionary](http://stackoverflow.com/q/2047591/55209) – Artem Koshelev Mar 26 '13 at 05:25
  • Also, `DependentHandle` may help: [Ephemerons in .NET and C#](http://blog.gx.weltkante.de/2012/08/ephemerons-in-net-and-c.html) – Artem Koshelev Mar 26 '13 at 05:52
  • @Artem: I thought about running some kind of cleanup thread to kill keys mapped to dangling references. This seems complicated, and introduces threading (with all associated problems: proper locking, etc.). Maybe there is a way to synchronize it to the GC, but I don't know about .net GC internals (and perhaps they depend on the platform). – David Monniaux Mar 27 '13 at 13:22
  • There is probably some problem with this which I haven't considered, but have you tried using some form of the hash value of your objects as the key for the CWT? Then you would simply have to write a class to wrap the CWT which uses the hash to check if the object already exists. – N_A Mar 31 '13 at 03:35

1 Answer


You are right that CWT does not solve the hash-consing problem: its keys are compared by reference equality, so finding the canonical copy of a node would require already holding that very object. However, it might be worth pointing out that CWT holds strong references to neither keys nor values. Here is a little test:

open System.Collections.Generic
open System.Runtime.CompilerServices

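// Allocates a fresh 1 MiB byte array wrapped in a reference cell.
let big () =
    ref (Array.zeroCreate (1024 * 1024) : byte [])

// Ordinary dictionary keyed by reference equality: strong references.
let test1 () =
    let d = Dictionary(HashIdentity.Reference)
    for i in 1 .. 10000 do
        stdout.WriteLine(i)
        let big = big ()
        d.Add(big, big)
    d

// Same loop with a ConditionalWeakTable instead of a Dictionary.
let test2 () =
    let d = ConditionalWeakTable()
    for i in 1 .. 10000 do
        stdout.WriteLine(i)
        let big = big ()
        d.Add(big, big)
    d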

On my machine, test1 runs out of memory and test2 succeeds. This could only happen if CWT keeps strong references to neither its keys nor its values.

For hash-consing, your best bet might be what Artem is suggesting in the comments. If this sounds too complicated, it also makes a lot of sense to just give the user control, say:

let f = MyFactory() // a dictionary with weak reference values hidden inside
f.Create(..) : MyObject // MyObject has no constructors of its own
f.Cleanup() // explicitly removes entries whose values have been collected

Then you do not need to introduce threading, study how GC internals work, or do any magic. The user of the library may decide where it is appropriate to clean up, or may simply "forget" the factory object, which makes the whole table collectable.
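
A rough sketch of what such a factory could look like, assuming the key is a shallow summary with cheap default hashing and equality. The `WeakFactory` name, the `build` callback, and the `Count` member are my own illustrative assumptions, not an existing API; only `Dictionary` and `WeakReference<'T>` come from the framework. It is not thread-safe.

```fsharp
open System
open System.Collections.Generic

/// Hypothetical hash-consing factory (illustrative sketch, not thread-safe).
/// 'Key is a shallow summary of a node; nodes are held only weakly.
type WeakFactory<'Key, 'Node when 'Key : equality and 'Node : not struct>() =
    let table = Dictionary<'Key, WeakReference<'Node>>()

    /// Returns the live node registered under key, or builds and registers one.
    member this.Create(key : 'Key, build : unit -> 'Node) : 'Node =
        match table.TryGetValue key with
        | true, weak ->
            match weak.TryGetTarget() with
            | true, node -> node
            | _ ->
                // The old node was collected: reuse the entry for a new one.
                let node = build ()
                weak.SetTarget node
                node
        | _ ->
            let node = build ()
            table.[key] <- WeakReference<'Node>(node)
            node

    /// Removes entries whose nodes have been collected; returns how many.
    member this.Cleanup() : int =
        let dead =
            table
            |> Seq.filter (fun kv -> not (fst (kv.Value.TryGetTarget())))
            |> Seq.map (fun kv -> kv.Key)
            |> Seq.toList   // materialize before mutating the dictionary
        for k in dead do
            table.Remove k |> ignore
        dead.Length

    /// Number of entries currently in the table, dangling ones included.
    member this.Count = table.Count
```

Calling `Create` twice with the same key while the first node is still reachable returns the same object; dangling entries stay in the dictionary until `Cleanup` is called, which is exactly the part left under the user's control.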

t0yv0
  • I tried using CWT but it appeared that data put inside the table was collected immediately (because the value is collected as soon as the key becomes unreachable). Have you tried recovering data from a CWT? It is impossible to use CWT from A to A because CWT does *not* use the hashcode function from the data type, but instead calls the default hash function, which is unsuitable for hash-consing (one needs shallow hashing with unique identifiers). One solution would be to copy the CWT source code and adapt it. – David Monniaux Mar 28 '13 at 09:44
  • @monniaux: yes, I agree that CWT is not suitable for hash consing. OCaml weak table clearly wins here. Recovering data from a CWT is fine though if you hold on to the keys - this is what it's been designed for. Yes, post here if you find a good solution or write your own - for hash-consing. – t0yv0 Mar 28 '13 at 12:06