As you observed, caches and generational collectors have opposing goals. More modern collectors like G1 and Shenandoah are region-based, which lets them handle old-gen collection more incrementally. In Shenandoah tech talks you'll often hear the developers discuss an LRU cache as a stress test. So this might not be a problem if your GC is well tuned.
You may be able to keep the cache's data structures on heap but move the entries off. That can be done by serializing each value into a direct ByteBuffer, at the cost of deserialization overhead on every access. Another approach is offered by Apache Mnemonic, which stores the object fields off-heap and transparently marshals the data. This avoids serialization costs but is invasive to the object model.
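For illustration, here's a minimal sketch of the first approach: the map and keys stay on heap, while each value is serialized into a direct (off-heap) ByteBuffer. The class name and the use of String values are just for the example:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: the map and keys are on heap, but each value's bytes
// live in a direct ByteBuffer allocated outside the Java heap, so the
// GC never has to trace or copy the payload.
public class OffHeapValueCache {
    private final ConcurrentHashMap<String, ByteBuffer> map = new ConcurrentHashMap<>();

    public void put(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocateDirect(bytes.length); // off-heap allocation
        buf.put(bytes);
        buf.flip();
        map.put(key, buf);
    }

    public String get(String key) {
        ByteBuffer buf = map.get(key);
        if (buf == null) {
            return null;
        }
        // Deserialize on every read -- this is the access overhead mentioned above.
        ByteBuffer copy = buf.duplicate(); // independent position so concurrent reads are safe
        byte[] bytes = new byte[copy.remaining()];
        copy.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```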
There are fully off-heap hash tables like Oak and caches like OHC. These move as much as possible outside of the GC, but they carry a lot more overhead than an on-heap cache. At that point the cost is comparable to using a remote cache like memcached or Redis, so a remote cache might be preferred. Memcached, for instance, uses slab allocation to handle the memory churn very efficiently.
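To make the slab idea concrete, here's a toy version in Java. The size classes and names are invented for the example; real memcached does this in C with far more sophistication. The point is that memory is carved into fixed-size chunks per size class and recycled, so churn never fragments the allocator:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Toy slab allocator: pre-slice big off-heap regions into fixed-size
// chunks per size class, and recycle chunks instead of freeing them.
public class SlabAllocator {
    private static final int[] CHUNK_SIZES = {64, 256, 1024}; // size classes (made up)
    private final ArrayDeque<ByteBuffer>[] freeLists;

    @SuppressWarnings("unchecked")
    public SlabAllocator(int chunksPerClass) {
        freeLists = new ArrayDeque[CHUNK_SIZES.length];
        for (int i = 0; i < CHUNK_SIZES.length; i++) {
            freeLists[i] = new ArrayDeque<>(chunksPerClass);
            // One slab per class: a single region sliced into equal chunks.
            ByteBuffer slab = ByteBuffer.allocateDirect(CHUNK_SIZES[i] * chunksPerClass);
            for (int c = 0; c < chunksPerClass; c++) {
                slab.position(c * CHUNK_SIZES[i]);
                slab.limit((c + 1) * CHUNK_SIZES[i]);
                freeLists[i].push(slab.slice());
            }
        }
    }

    public ByteBuffer allocate(int size) {
        // Hand out a chunk from the smallest class that fits; no new allocation.
        for (int i = 0; i < CHUNK_SIZES.length; i++) {
            if (size <= CHUNK_SIZES[i] && !freeLists[i].isEmpty()) {
                return freeLists[i].pop();
            }
        }
        throw new IllegalStateException("no chunk available for size " + size);
    }

    public void free(ByteBuffer chunk) {
        // Return the chunk to its size class instead of discarding it.
        for (int i = 0; i < CHUNK_SIZES.length; i++) {
            if (chunk.capacity() == CHUNK_SIZES[i]) {
                chunk.clear();
                freeLists[i].push(chunk);
                return;
            }
        }
    }
}
```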
Most often you'll see a small on-heap cache used for fast local access to the most frequently used data, backed by a large remote cache for everything else. If you truly need a multi-GB in-process cache, then off-heap storage may be necessary, or you may have to tune your GC to accommodate the workload.
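Putting that together, here's a rough sketch of the two-tier layout. TieredCache and the RemoteCache interface are hypothetical stand-ins for whatever client (memcached, Redis) you'd actually use; the local tier here is a simple access-order LinkedHashMap acting as an LRU:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a two-tier cache: a small on-heap LRU for hot entries,
// falling back to a remote cache for everything else.
public class TieredCache {
    interface RemoteCache {           // stand-in for a memcached/Redis client
        String get(String key);
    }

    private final RemoteCache remote;
    private final Map<String, String> local;

    public TieredCache(int maxLocalEntries, RemoteCache remote) {
        this.remote = remote;
        // access-order LinkedHashMap that evicts the least recently used entry
        this.local = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxLocalEntries;
            }
        };
    }

    // synchronized because an access-order LinkedHashMap mutates on reads
    public synchronized String get(String key) {
        String value = local.get(key);   // fast path: hot data stays on heap
        if (value == null) {
            value = remote.get(key);     // slow path: network round trip
            if (value != null) {
                local.put(key, value);   // promote into the local tier
            }
        }
        return value;
    }
}
```

Keeping the local tier small is the point: it stays cheap for the GC to trace, while the remote tier absorbs the multi-GB working set.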