Backstory: So I had this great idea, right? Sometimes you're collecting a massive amount of data, and you don't need to access all of it all the time, but you also may not need it after the program has finished, and you don't really want to muck around with database tables, etc. What if you had a library that would silently and automatically serialize objects to disk when you're not using them, and silently bring them back when you needed them? So I started writing a library; it has a number of collections like "DiskList" or "DiskMap" where you put your objects. They hold your objects via WeakReferences. While you're still using a given object, your code holds strong references to it, so it stays in memory. When you stop using it, the object is garbage collected, and just before that happens, the collection serializes it to disk (*). When you want the object again, you ask for it by index or key, like usual, and the collection deserializes it (or returns it from its inner cache, if it hasn't been GCd yet).

(*) See now, this is the sticking point. In order for this to work, I need to be able to be notified JUST BEFORE the object is GCd - after no other references to it exist (and therefore the object can no longer be modified), but before the object is wiped from memory. This is proving difficult. I thought briefly that using a ReferenceQueue would save me, but alas, it returns a Reference, whose referent has thus far always been null.
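
For illustration, here is a minimal sketch that reproduces the ReferenceQueue behaviour described above. The names `ReferenceQueueDemo` and `awaitCleared` are made up for this example, and `System.gc()` is only a hint, so the loop may time out without dequeuing anything. When a reference does arrive, its referent has already been cleared, which is why `get()` returns null.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Hypothetical demo class; not part of the library described in the question.
class ReferenceQueueDemo {

    /** Returns the dequeued reference, or null if nothing arrived before the timeout. */
    static Reference<?> awaitCleared(long timeoutMillis) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object payload = new Object();
        WeakReference<Object> ref = new WeakReference<>(payload, queue);
        payload = null; // drop the only strong reference to the referent

        long deadline = System.currentTimeMillis() + timeoutMillis;
        Reference<?> dequeued = null;
        try {
            while (dequeued == null && System.currentTimeMillis() < deadline) {
                System.gc();                 // a request, not a guarantee
                dequeued = queue.remove(50); // wait up to 50 ms for an enqueued reference
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Using ref here also keeps the reference object itself reachable during the loop.
        if (dequeued != null && dequeued != ref) {
            throw new IllegalStateException("unexpected reference in queue");
        }
        return dequeued;
    }

    public static void main(String[] args) {
        Reference<?> r = awaitCleared(2000);
        System.out.println("dequeued: " + (r != null));
        if (r != null) {
            // The collector clears a weak reference before enqueuing it.
            System.out.println("referent: " + r.get());
        }
    }
}
```

The queue hands back the `Reference` object, not the referent, so by the time the notification arrives there is nothing left to serialize.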

Is there a way, having been given an arbitrary object, to receive (via callback or queue, etc.) the object after it is ready to be garbage collected, but before it IS garbage collected?

I know (Object).finalize() can basically do that, but I'll have to deal with classes that don't belong to me, and whose finalize methods I can't legitimately override. I'd prefer not to go as arcane as custom classloaders, bytecode manipulation, or reflection, but I will if I have to.

(Also, if you know of existing libraries that do transparent disk caching, I'd look favorably on that, though my requirements on such a library would be fairly stringent.)

Erhannis
  • I hope you realise there’s no guarantee a given object will *ever* be garbage collected. – Bohemian Oct 26 '18 at 04:33
  • @Bohemian Sure. But that's fine in my use case, as I only really need to serialize things if I'm running low on memory, in which case I believe the garbage collector is more fastidious about garbage collection. If it's never GCd, I can just return the never-serialized object upon request. – Erhannis Oct 26 '18 at 04:35
  • I think you'd probably need to at least override the custom class loader, and very possibly need to modify your Java runtime, to support this. – xuq01 Oct 26 '18 at 04:36
  • Search for PhantomReference and the finalize method; you will find your answer there. – akshaya pandey Oct 26 '18 at 04:37
  • Consider using a [WeakHashMap](https://docs.oracle.com/javase/10/docs/api/java/util/WeakHashMap.html). – Bohemian Oct 26 '18 at 04:50
  • @akshayapandey PhantomReference doesn't let you get the referent _at all_, and finalize is a method defined on the class getting garbage collected. As the question states, I don't have control of the class definitions. – Erhannis Oct 26 '18 at 06:46
  • @Bohemian WeakHashMap solves a different problem, I think - from my reading of the doc, it's kind of a memory-saving device (first two sentences of "This class is intended primarily...."). My use case isn't to drop objects I'm not using anymore, it's to serialize objects once they're not being used anywhere else. I still need the objects. Unless I've overlooked something important, WeakHashMap just silently discards entries whose keys are no longer referenced; it doesn't tell you what they were. – Erhannis Oct 26 '18 at 06:52

1 Answer

You can look for a cache that supports write-behind caching and tiering. Notable products include Ehcache, Hazelcast, and Infinispan.

Or you can build something yourself with a cache and a time-to-idle expiry. The cache access would then count as "the usage" of the object.
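
A minimal sketch of that second idea, under some loud assumptions: `IdleSpillCache`, `spillIdle` and `inTempDir` are hypothetical names, keys are strings that are safe to use as file names, the caller passes the current time explicitly, and plain Java serialization stands in for whatever format you actually want. It is not thread-safe.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch: entries not accessed for maxIdleMillis are written to disk.
class IdleSpillCache<V extends Serializable> {
    private static final class Slot<T> {
        final T value;
        long lastAccess;
        Slot(T value, long lastAccess) { this.value = value; this.lastAccess = lastAccess; }
    }

    private final Map<String, Slot<V>> inMemory = new HashMap<>();
    private final Path dir;
    private final long maxIdleMillis;

    IdleSpillCache(Path dir, long maxIdleMillis) {
        this.dir = dir;
        this.maxIdleMillis = maxIdleMillis;
    }

    /** Convenience factory that spills into a fresh temporary directory. */
    static <V extends Serializable> IdleSpillCache<V> inTempDir(long maxIdleMillis) {
        try {
            return new IdleSpillCache<>(Files.createTempDirectory("spill"), maxIdleMillis);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void put(String key, V value, long now) {
        inMemory.put(key, new Slot<>(value, now));
    }

    /** Each access counts as "usage": it refreshes the idle timer or reloads from disk. */
    @SuppressWarnings("unchecked")
    V get(String key, long now) {
        Slot<V> slot = inMemory.get(key);
        if (slot != null) {
            slot.lastAccess = now;
            return slot.value;
        }
        Path file = fileFor(key);
        if (!Files.exists(file)) return null;
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            V value = (V) in.readObject();
            inMemory.put(key, new Slot<>(value, now));
            return value;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not reload " + key, e);
        }
    }

    /** Serializes every entry idle for at least maxIdleMillis; returns how many were spilled. */
    int spillIdle(long now) {
        int spilled = 0;
        Iterator<Map.Entry<String, Slot<V>>> it = inMemory.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Slot<V>> e = it.next();
            if (now - e.getValue().lastAccess < maxIdleMillis) continue;
            try (ObjectOutputStream out =
                     new ObjectOutputStream(Files.newOutputStream(fileFor(e.getKey())))) {
                out.writeObject(e.getValue().value);
            } catch (IOException ex) {
                throw new UncheckedIOException(ex);
            }
            it.remove(); // drop the strong reference so the value can be collected
            spilled++;
        }
        return spilled;
    }

    boolean isInMemory(String key) { return inMemory.containsKey(key); }

    private Path fileFor(String key) { return dir.resolve(key + ".ser"); }
}
```

Calling `spillIdle` from a timer would give the time-to-idle behaviour. The trade-off is that the cache cannot know whether the caller still holds a reference to a spilled value, so changes made to that object after the spill are not written back.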

> Is there a way, having been given an arbitrary object, to receive (via callback or queue, etc.) the object after it is ready to be garbage collected, but before it IS garbage collected?

This interferes heavily with garbage collection. Chances are high that it will bring down your application or the whole system. What you want to do is start disk I/O, and potentially allocate additional objects, at the moment the system is low on or out of memory. If you do manage to get it working, you'll end up using more heap than before, since the heap must always be extended when the GC kicks in.

cruftex