Let's first put some things in correct order. A Card Table
shows were there might be incoming references into this region. It does sort of a "hash". For a single byte in the card table - there are many bytes in the old region. It's like saying that if (in theory) you have a card table that looks like this (a single card is marked as "dirty")
0 1 0 0 0 0
For that 1
(dirty card) there is a certain mapping in the old region that needs to be scanned also, when scanning the young regions.
0 1 0 .....
0 - 512 512-1024 ...........
So that dirty card corresponds to a certain portion (from 512 to 1024 bytes) in the old generation, that will be scanned also, as part of the young generation scanning.
G1 has regions and now you need to understand how CT
and RS
work in tandem. Let's say GC scans Region1
at this point in time, it takes everything that is alive from this region and copies into Region2
. At the same time Region2
has references to Region3
. If we use CT
, Region2
will be marked as "dirty" (by placing the specific card in the card table).
Next cycle wants to scan Region3
. How does Region3
knows if there are other regions that might point into it? And indeed there are such cases: Region2
has references into Region3
. Well, it could look into the CT
and check every single dirty card (and thus every region that these dirty cards corresponds to) and see if there are references from any of those regions in here. Think about it: in order to clear Region3
, G1
has to look at the entire CT
. In the worst case scenario, it should be scanning the entire heap - for a single region.
Thus: Remembered Sets
. These data structures are populated based on what CT
knows about, by an asynchronous thread. When Region2
is marked as dirty, an asynchronous thread will start computing its RS
. When Region3
needs to be scanned, its RS
will only have an entry with Region2
.
Thus, in order to scan a single region, G1
only needs to look into that particular RS
.