Rationale for Soft-/Weak-/PhantomReferences clearing references to objects which have reference to tracked object

Question

The documentation for Soft-, Weak- and PhantomReference all include a line similar to the following (taken from PhantomReference):

At that time it will atomically clear all phantom references to that object and all phantom references to any other phantom-reachable objects from which that object is reachable.

The part which is confusing me is the one about the other phantom-reachable objects.

If I understand it correctly this describes this case:
Objects:

A
B

References:

->: Strong reference
-P->: Phantom reference

-> A
-P-> B -> A

So for some reason the garbage collector has not determined yet that B is only phantom-reachable. Now if A becomes phantom-reachable and the garbage collector detects this, it is required (according to the doc quoted above) to also clear the reference to B.

Is there any reason why the documentation requires this? It appears that if other vendors were to develop a JVM, this would be rather a burden.

Similar, but not the same question: https://stackoverflow.com/q/27463048 — Marcono1234, Jun 16 '19 at 16:18
It sounds to me like the garbage collector is unwilling to collect an object which has any hard references to it, even if those references exist only in an object which has only weak references to it. If the only hard references happen to be from objects that have only soft references to them, those objects will be collected first. In your example, when A is “ready” to be collected there is a hard reference to it from B, but no hard references to B exist so it can be collected first. — Ivan G., Jun 16 '19 at 17:26
@IvanG., I don't think it matters whether a strong reference exists in the chain. The chain has to consist only of strong references for an object to be considered strongly-reachable. The case described in the question would also apply if `B -P-> A`, since the doc only says "reachable", which includes phantom-reachable as well. — Marcono1234, Jun 16 '19 at 18:29
In the example, the phantom reference (p-ref) of B is cleared in any case, regardless of whether A is strong or phantom reachable (p-reach), simply because B is p-reach in both cases. The GC can do this in two ways: It iterates over all p-reach objects and 1. clears all direct and indirect p-refs (this is the procedure described in the doc) or 2. clears only all direct p-refs. Since the indirect p-refs of an object are always the direct p-refs of other objects (in the example, the direct p-ref of B is the indirect p-ref of A) both ways provide the same result. In what do you see the burden? — Topaco, Jun 18 '19 at 07:18
@Topaco, I am not arguing that the phantom reference to B will not be cleared. But what I see as burden is that it is required that indirect references are cleared *at the same moment* as well. So if a GC implementation detects that an object is phantom-reachable it has to perform additional work to find all not yet cleared indirect references (in this case the reference to B). — Marcono1234, Jun 20 '19 at 16:21

Holger · Accepted Answer · 2019-06-20T13:45:32.763

We first have to note that this sentence was copied from the documentation for soft and weak references into the documentation for phantom references in Java 9, to accommodate changes made in that version. It is not a good fit for phantom references, so the rationale behind it is better explained using soft and weak references.

Suppose you have the following situation:

(weak)→ A
(weak)→ B (strong)→ A

Technically, both A and B are weakly reachable, but we can change this by invoking the get() method on either weak reference, to retrieve a strong reference to its referent.

When we do this on the first weak reference, to retrieve a strong reference to A, the object B will stay weakly reachable, but when we do this to get a strong reference to B, the object A will also become strongly reachable, due to the strong reference from B to A.

Therefore, we have the rule that if the weak reference to A gets cleared, the weak reference to B must be cleared too, as otherwise it would be possible to retrieve a strong reference to A via B even though the weak reference to A has been cleared. And to be on the safe side, it must happen atomically, so there’s no possible race condition that would allow retrieving a reference to B between the clearing of the two references.
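To make the problem concrete, here is a minimal sketch of the state the rule is meant to prevent. It cannot be produced by a conforming garbage collector, so the sketch simulates it by calling clear() manually on the reference to A. The class names `WeakChainDemo`, `A` and `B` are made up for illustration, and both objects are kept strongly reachable only so that the outcome does not depend on GC timing.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

public class WeakChainDemo {
    static final class A {}

    static final class B {
        final A a; // strong reference from B to A
        B(A a) { this.a = a; }
    }

    /** Returns true if A can still be obtained via B after the reference to A was cleared. */
    static boolean recoverAThroughB() {
        A a = new A();
        B b = new B(a);
        WeakReference<A> refA = new WeakReference<>(a);
        WeakReference<B> refB = new WeakReference<>(b);

        // Simulated inconsistent state: the reference to A is cleared,
        // the reference to B is not (a real GC must never leave it like this).
        refA.clear();

        B viaRef = refB.get();
        boolean recovered = refA.get() == null && viaRef != null && viaRef.a == a;

        // Keep both objects alive up to here so the demo is deterministic.
        Reference.reachabilityFence(a);
        Reference.reachabilityFence(b);
        return recovered;
    }

    public static void main(String[] args) {
        System.out.println(recoverAThroughB()); // prints "true"
    }
}
```

The cleared reference claims that A is gone, yet A is still obtainable through the uncleared reference to B. Clearing both references atomically rules this out.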

As said, this is of lesser relevance for phantom references, as those do not allow retrieving the referent, but there is no reason to treat them differently.

The point here is that this is not an actual burden, given how garbage collectors actually work. They have to traverse all live references, i.e. strongly reachable objects, and everything not encountered is garbage by elimination. So when encountering a weak reference during a traversal, the collector won’t traverse the referent, but will remember the reference object. Once it has completed the traversal, it will run through all encountered reference objects and check whether the referent has been marked as reachable through a different path. If not, the reference object is cleared and linked for enqueuing.
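The traversal described above can be sketched with a toy model. This is not how any particular JVM is implemented, only an illustration under simplifying assumptions: `ToyCollector`, `Node` and `ToyWeakRef` are hypothetical names, the "heap" is a plain object graph, and there is no reference queue or soft/phantom distinction.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ToyCollector {
    /** Toy heap object with strong outgoing edges. */
    static class Node {
        final String name;
        final List<Node> strong = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /** Toy weak reference: a heap object whose referent the marker does not follow. */
    static final class ToyWeakRef extends Node {
        Node referent;
        ToyWeakRef(String name, Node referent) { super(name); this.referent = referent; }
    }

    /** Marks along strong edges only, then clears every discovered reference whose referent is unmarked. */
    static List<ToyWeakRef> collect(List<Node> roots) {
        Set<Node> marked = new HashSet<>();
        List<ToyWeakRef> discovered = new ArrayList<>();
        Deque<Node> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (!marked.add(n)) continue;          // already visited
            if (n instanceof ToyWeakRef) discovered.add((ToyWeakRef) n); // remember, don't follow referent
            work.addAll(n.strong);
        }
        List<ToyWeakRef> cleared = new ArrayList<>();
        for (ToyWeakRef r : discovered) {
            if (r.referent != null && !marked.contains(r.referent)) {
                r.referent = null;                 // clear; a real GC would also link it for enqueuing
                cleared.add(r);
            }
        }
        return cleared;
    }

    /** Builds refA -> A and refB -> B => A; optionally adds a strong root to A. Returns names of cleared refs. */
    static List<String> demo(boolean strongRootToA) {
        Node a = new Node("A");
        Node b = new Node("B");
        b.strong.add(a);
        List<Node> roots = new ArrayList<>();
        if (strongRootToA) roots.add(a);
        roots.add(new ToyWeakRef("refA", a));
        roots.add(new ToyWeakRef("refB", b));
        List<String> names = new ArrayList<>();
        for (ToyWeakRef r : collect(roots)) names.add(r.name);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(demo(false)); // prints "[refA, refB]"
        System.out.println(demo(true));  // prints "[refB]"
    }
}
```

In this model the decision to clear depends only on the mark set, which is fixed once the traversal has finished. The references to A and to B are therefore handled in the same pass, and the collector never has to walk the graph starting from B to find "indirect" references.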

To address your example:

(strong)→ A
(weak)→ B (strong)→ A

Here, B is weakly reachable regardless of the strong reference to A. When you eliminate the strong reference to A, B is still weakly reachable and its weak reference may get cleared and enqueued. Formally, A is now weakly reachable, but the JVM will never detect that without detecting that B is weakly reachable too. The only way to detect that A is weakly reachable would be by traversing the reference graph starting at the weakly reachable B. But no implementation does this. The garbage collector will simply clear the weak reference to B and that’s it.
