12

I have an application that runs on Amazon EC2 (using several AWS products such as S3, DynamoDB, etc.) and has a memory leak. I collected some heap dumps and ran them through Eclipse's Memory Analyzer Tool, which pinpointed several hundred instances of sun.security.ssl.SSLSocketImpl (taking up dozens of MB of memory) as probable leaks.

However, I'm having trouble figuring out why these SSLSocketImpl objects haven't been disposed of.

Most instances of SSLSocketImpl in the dumps have two references: one from java.lang.ref.Finalizer and one from com.amazonaws.internal.SdkSSLSocket. The Finalizer thread in my heap dump is reported as idle, with no objects awaiting finalization. But the com.amazonaws.internal.SdkSSLSocket objects that hold references to the leaked SSLSocketImpl objects seem to have been cleaned up. At least, I can't find them in the heap dump (Dominator view in MAT).

I'm new to analyzing Java heap dumps. What should I look for next? If the Amazon SdkSSLSocket objects have indeed been cleaned up, why weren't the SSLSocketImpl objects also cleaned up?

Thanks!

ur-vogel

2 Answers

3

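This could be caused by not setting the SSL session cache size. In older JDK releases the default is unlimited, so cached sessions can end up consuming a large amount of heap space. Setting the system property javax.net.ssl.sessionCacheSize to 1000 (for example, -Djavax.net.ssl.sessionCacheSize=1000 on the JVM command line) should help.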
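If changing the JVM command line is inconvenient, the same limit can probably be applied at runtime through the standard `SSLSessionContext` API. The following is only a minimal sketch: the class and method names are made up for illustration, and it assumes the HTTP client in use takes its sockets from the JVM-wide default `SSLContext`. If the client builds its own context, that context's session cache would need the same treatment.

```java
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSessionContext;

// Hypothetical helper, not part of the AWS SDK or the JDK.
final class SessionCacheLimit {

    // Caps the client-side session cache of the default SSLContext
    // and returns the limit that is now in effect.
    static int applyLimit(int maxSessions) {
        try {
            SSLSessionContext sessions =
                    SSLContext.getDefault().getClientSessionContext();
            sessions.setSessionCacheSize(maxSessions);
            return sessions.getSessionCacheSize();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }

    public static void main(String[] args) {
        // 1000 matches the value suggested above; tune it for your workload.
        System.out.println("client session cache limit: " + applyLimit(1000));
    }
}
```

Calling this once at startup, before the first outbound TLS connection, keeps the behaviour closest to the system-property approach.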

3

I think I ran into this as well. After messing around with MAT a bit I was able to find the root cause, or at least prevent G1GC from doing a full GC on tenured space. Here's the MAT diagram I was looking at:

[image: MAT diagram]

My application is a simple Spring Boot reactive HTTP server which proxies a reasonable volume of database read calls. I was running it with a single CPU in a Docker container on K8s with about 4GB of memory. I doubled this to 8GB of memory and still saw the problem.

I didn't believe this was a memory leak because I could observe full GCs cleaning up tenured space; it would just take a while.

What I believe is happening is that, as TCP connections are abandoned, the JDBC driver hands them off to a thread called AbandonedConnectionCleanupThread.

So the solution turned out to be simply giving the JVM an extra CPU to service the AbandonedConnectionCleanupThread.

As soon as I did this, my tenured space stopped growing. g1_old_space still grew to about the size tenured space used to reach, but G1 was able to clean g1_old_space more effectively than tenured space, so the long application pauses went away and G1 respected my -XX:MaxGCPauseMillis setting.
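To check from inside the process that the JVM actually sees the extra CPU, and to watch how the old generation behaves, something like the following could be logged periodically. It is a rough sketch using only the standard `java.lang.management` API; the class name is made up, and the pool and collector names it prints (for example "G1 Old Gen") depend on which collector the JVM is running.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical diagnostic helper, not taken from the original setup.
final class GcSnapshot {

    // Number of CPUs the JVM believes it may use (container limits included
    // on container-aware JVMs).
    static int visibleCpus() {
        return Runtime.getRuntime().availableProcessors();
    }

    // One-shot text report: CPU count, per-pool usage, per-collector totals.
    static String describe() {
        StringBuilder sb = new StringBuilder();
        sb.append("cpus=").append(visibleCpus()).append('\n');
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            sb.append(pool.getName())
              .append(" usedMiB=").append(usage.getUsed() / (1024 * 1024))
              .append('\n');
        }
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(" collections=").append(gc.getCollectionCount())
              .append(" totalMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

If `cpus=` still reports 1 after raising the container's CPU allocation, the JVM is not picking up the new limit and the cleanup thread is still competing with the application threads.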

Jordan Shaw