3

Why does G1 need both of these data structures?

My understanding is:

  1. The card table (CT) holds information about the actual location of references in the old generation.
  2. The remembered set (RS) is per region: each region has one RS associated with it, which stores information about outside references that point to objects in this region. So while navigating an RS, the actual location of a reference can be found using the CT.

Why can't the RS hold all of the information instead of relying on the CT?

Is it because there would otherwise be too much duplicate data stored in different RSs? For example, the young generation has regions A and B, and objects in both regions are referenced from the same location in the old generation. In this case, the RSs associated with regions A and B would both store this information, which is redundant. Is this explanation correct?

user10916892

2 Answers

6

Let's first put some things in the correct order. A card table shows where there might be references pointing into this region. It acts as a sort of "hash": a single byte in the card table covers many bytes in the old region. Suppose (in theory) you have a card table that looks like this, with a single card marked as "dirty":

0 1 0 0 0 0

That 1 (the dirty card) maps to a certain range of the old region, which must also be scanned when the young regions are scanned.

      0          1      0     .....

    0 - 512   512-1024  ...........

So that dirty card corresponds to a certain portion (bytes 512 to 1024) of the old generation, which will also be scanned as part of young generation scanning.
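The mapping above can be sketched in a few lines of Java. This is only an illustration, not HotSpot's code: the class name `CardTableSketch`, its methods, and the 3072-byte "heap" are made up for the example, and the 512-byte card size is simply taken from the picture above.

```java
import java.util.StringJoiner;

// Hypothetical sketch of a card table: one byte per 512-byte card.
final class CardTableSketch {
    static final int CARD_SHIFT = 9; // 2^9 = 512 bytes per card (assumed, as in the example above)
    static final byte CLEAN = 0;
    static final byte DIRTY = 1;

    private final byte[] cards;

    CardTableSketch(long heapBytes) {
        // Round up so that a partially filled last card still gets an entry.
        long cardBytes = 1L << CARD_SHIFT;
        this.cards = new byte[(int) ((heapBytes + cardBytes - 1) >>> CARD_SHIFT)];
    }

    // Which card covers this heap offset?
    static int cardIndex(long heapOffset) {
        return (int) (heapOffset >>> CARD_SHIFT);
    }

    // First byte covered by the card, and the first byte after it.
    static long cardStart(int index) {
        return ((long) index) << CARD_SHIFT;
    }

    static long cardEnd(int index) {
        return cardStart(index) + (1L << CARD_SHIFT);
    }

    // What a write barrier would do after a reference store at heapOffset.
    void markDirty(long heapOffset) {
        cards[cardIndex(heapOffset)] = DIRTY;
    }

    boolean isDirty(int index) {
        return cards[index] == DIRTY;
    }

    String render() {
        StringJoiner joiner = new StringJoiner(" ");
        for (byte card : cards) {
            joiner.add(card == DIRTY ? "1" : "0");
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        CardTableSketch ct = new CardTableSketch(3072); // six cards
        ct.markDirty(700);                              // a reference stored at offset 700
        System.out.println(ct.render());                // 0 1 0 0 0 0
        System.out.println(cardStart(1) + " to " + cardEnd(1)); // 512 to 1024
    }
}
```

A store at offset 700 dirties card 1, so only bytes 512 to 1024 need to be examined later, rather than the whole old generation.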


G1 has regions, and now you need to understand how CT and RS work in tandem. Let's say the GC scans Region1 at this point in time: it takes everything that is alive in this region and copies it into Region2. At the same time, Region2 has references into Region3. If we use the CT, Region2 will be marked as "dirty" (by dirtying the specific card in the card table).

The next cycle wants to scan Region3. How does Region3 know whether there are other regions that might point into it? And indeed there are such cases: Region2 has references into Region3. Well, it could look into the CT, check every single dirty card (and thus every region that these dirty cards correspond to), and see whether any of those regions hold references into Region3. Think about it: in order to clear Region3, G1 has to look at the entire CT. In the worst case, it would be scanning the entire heap for a single region.

Thus: Remembered Sets. These data structures are populated by an asynchronous thread, based on what the CT knows. When Region2 is marked as dirty, an asynchronous thread will start computing its RS. When Region3 needs to be scanned, its RS will only have an entry for Region2.

Thus, in order to scan a single region, G1 only needs to look into that particular RS.
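As a rough sketch of that idea (not the real G1 data structure, which is considerably more elaborate), here is one remembered set per region that records which cards elsewhere in the heap may point into it. `RememberedSetSketch`, `recordReference`, `cardsToScanFor`, and the four-cards-per-region layout are all hypothetical names and numbers chosen for the illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: one remembered set (RS) per region.
final class RememberedSetSketch {
    static final int CARDS_PER_REGION = 4; // tiny, assumed value to keep the example readable

    // region index -> cards outside that region that may hold a reference into it
    private final Map<Integer, Set<Integer>> rsByRegion = new HashMap<>();

    static int regionOfCard(int cardIndex) {
        return cardIndex / CARDS_PER_REGION;
    }

    // Stand-in for the asynchronous step: a dirty card was examined and found
    // to contain a reference into targetRegion.
    void recordReference(int dirtyCard, int targetRegion) {
        if (regionOfCard(dirtyCard) == targetRegion) {
            return; // references inside the same region do not need remembering
        }
        rsByRegion.computeIfAbsent(targetRegion, r -> new TreeSet<>()).add(dirtyCard);
    }

    // When collecting a region, only these cards have to be scanned,
    // instead of every dirty card in the whole card table.
    Set<Integer> cardsToScanFor(int region) {
        return rsByRegion.getOrDefault(region, Collections.emptySet());
    }

    public static void main(String[] args) {
        RememberedSetSketch rs = new RememberedSetSketch();
        rs.recordReference(9, 3); // card 9 lies in Region2 and points into Region3
        System.out.println(rs.cardsToScanFor(3)); // [9]
        System.out.println(regionOfCard(9));      // 2
    }
}
```

Collecting Region3 then means looking up one small set, which here names a single card inside Region2.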

Eugene
  • Thanks, this is kind of what I was looking for. Quick question: when does the RS of Region3 get updated/populated? Is it when Region2 is marked dirty? – user10916892 Dec 29 '20 at 11:33
  • @user10916892 I do not remember that, and this might have changed in recent versions. What is important, is that it does. – Eugene Dec 29 '20 at 13:03
0

Let's look at the picture below: [image]

During a young GC, objects E and G are not reachable from the GC roots, so they might be marked as "dead" objects. If that happened, it would cause unpredictable problems.

Because scanning the whole heap is too time-consuming, the JVM uses a remembered set (RSet) to store all the locations that hold pointers to objects within a region. The following image shows an example RSet for Region 5: [image]

There are different implementations depending on the granularity of the RSet. For simplicity, suppose the RSet just records the index of each referencing region: RSet of Region 5 = {Region7, Region3}.

To check object liveness, the GC still needs to scan two entire regions. The easiest optimization is for the GC to divide a region into several small areas (pages), each with a flag in the card table (CT). If an area contains objects that point to an outside region, the corresponding flag is marked as dirty.

So the card table indicates whether an area contains a pointer from the old generation to the young generation. It is a particular type of RSet.
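To make the granularity difference concrete, here is a small back-of-the-envelope sketch. The sizes (1 MiB regions, 512-byte areas) and the particular dirty areas are assumptions picked for illustration, and `RSetGranularitySketch` is a hypothetical name, not a JVM class.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical comparison of scan cost for a coarse RSet vs. a card-level one.
final class RSetGranularitySketch {
    static final long REGION_BYTES = 1L << 20; // assumed 1 MiB per region
    static final long CARD_BYTES = 512;        // assumed 512 bytes per area/card

    // Coarse RSet: only region indexes are recorded, so each referencing
    // region has to be scanned in full.
    static long bytesToScanCoarse(Set<Integer> referencingRegions) {
        return referencingRegions.size() * REGION_BYTES;
    }

    // Finer RSet: for each referencing region, only the areas whose flag
    // is dirty have to be scanned.
    static long bytesToScanByCard(Map<Integer, BitSet> dirtyCardsByRegion) {
        long total = 0;
        for (BitSet dirty : dirtyCardsByRegion.values()) {
            total += dirty.cardinality() * CARD_BYTES;
        }
        return total;
    }

    public static void main(String[] args) {
        // RSet of Region 5 = {Region7, Region3}
        Set<Integer> coarse = new HashSet<>(Arrays.asList(7, 3));
        System.out.println(bytesToScanCoarse(coarse)); // 2097152

        // Same two regions, but only three areas are flagged dirty (made-up positions).
        Map<Integer, BitSet> fine = new HashMap<>();
        BitSet region7 = new BitSet();
        region7.set(10);
        region7.set(11);
        BitSet region3 = new BitSet();
        region3.set(5);
        fine.put(7, region7);
        fine.put(3, region3);
        System.out.println(bytesToScanByCard(fine)); // 1536
    }
}
```

Under these assumed numbers, the region-level RSet forces a scan of 2 MiB, while the card-level flags narrow it to 1536 bytes.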

dknight