1

I am debugging our server which hangs every couple of days.

java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)

Out of nowhere, everything freezes and the server stops processing/receiving any requests (or any work to be precise). After capturing the thread dump, I noticed a couple of things:

  • No deadlocks, according to jstack
  • 199 threads in BLOCKED state, 5 in IN_NATIVE state and 1 in IN_VM state
  • All of the blocking threads are blocked at wait/sleep/receive/etc methods

Only the thread that is in a IN_VM actually shows some significant stack:

Thread 10453: (state = IN_VM)
 - java.lang.System.identityHashCode(java.lang.Object) @bci=0 (Compiled frame; information may be imprecise)
 - java.io.ObjectOutputStream$HandleTable.hash(java.lang.Object) @bci=1, line=2360 (Compiled frame)
 - java.io.ObjectOutputStream$HandleTable.insert(java.lang.Object, int) @bci=2, line=2323 (Compiled frame)
 - java.io.ObjectOutputStream$HandleTable.growSpine() @bci=57, line=2338 (Compiled frame)
 - java.io.ObjectOutputStream$HandleTable.assign(java.lang.Object) @bci=28, line=2279 (Compiled frame)
 - java.io.ObjectOutputStream.writeOrdinaryObject(java.lang.Object, java.io.ObjectStreamClass, boolean) @bci=105, line=1428 (Compiled frame)
 - java.io.ObjectOutputStream.writeObject0(java.lang.Object, boolean) @bci=506, line=1178 (Compiled frame)
 - java.io.ObjectOutputStream.writeObject(java.lang.Object) @bci=16, line=348 (Compiled frame)
 - java.util.HashMap.internalWriteEntries(java.io.ObjectOutputStream) @bci=49, line=1793 (Compiled frame)
 - java.util.HashMap.writeObject(java.io.ObjectOutputStream) @bci=24, line=1363 (Compiled frame)
 - sun.reflect.GeneratedMethodAccessor13.invoke(java.lang.Object, java.lang.Object[]) @bci=40 (Compiled frame)
 - sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=6, line=43 (Compiled frame)
 - java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[]) @bci=56, line=498 (Compiled frame)
 - java.io.ObjectStreamClass.invokeWriteObject(java.lang.Object, java.io.ObjectOutputStream) @bci=24, line=1140 (Compiled frame)
 - java.io.ObjectOutputStream.writeSerialData(java.lang.Object, java.io.ObjectStreamClass) @bci=125, line=1497 (Compiled frame)
 - java.io.ObjectOutputStream.writeOrdinaryObject(java.lang.Object, java.io.ObjectStreamClass, boolean) @bci=137, line=1432 (Compiled frame)
 - java.io.ObjectOutputStream.writeObject0(java.lang.Object, boolean) @bci=506, line=1178 (Compiled frame)
 - java.io.ObjectOutputStream.defaultWriteFields(java.lang.Object, java.io.ObjectStreamClass) @bci=18, line=1523 (Compiled frame)
 - java.io.ObjectOutputStream.defaultWriteObject() @bci=41, line=441 (Compiled frame)
 - java.util.Collections$SynchronizedMap.writeObject(java.io.ObjectOutputStream) @bci=8, line=2691 (Compiled frame)
 - sun.reflect.GeneratedMethodAccessor12.invoke(java.lang.Object, java.lang.Object[]) @bci=40 (Compiled frame)
 - sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=6, line=43 (Compiled frame)
 - java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[]) @bci=56, line=498 (Compiled frame)
 - java.io.ObjectStreamClass.invokeWriteObject(java.lang.Object, java.io.ObjectOutputStream) @bci=24, line=1140 (Compiled frame)
 - java.io.ObjectOutputStream.writeSerialData(java.lang.Object, java.io.ObjectStreamClass) @bci=125, line=1497 (Compiled frame)
 - java.io.ObjectOutputStream.writeOrdinaryObject(java.lang.Object, java.io.ObjectStreamClass, boolean) @bci=137, line=1432 (Compiled frame)
 - java.io.ObjectOutputStream.writeObject0(java.lang.Object, boolean) @bci=506, line=1178 (Compiled frame)
 - java.io.ObjectStreamClass$FieldReflector.getPrimFieldValues(java.lang.Object, byte[]) @bci=313, line=2135 (Compiled frame)
 - java.io.ObjectStreamClass.getPrimFieldValues(java.lang.Object, byte[]) @bci=6, line=1389 (Compiled frame)
 - java.io.ObjectOutputStream.defaultWriteFields(java.lang.Object, java.io.ObjectStreamClass) @bci=70, line=1533 (Compiled frame)
 - java.io.ObjectOutputStream.writeSerialData(java.lang.Object, java.io.ObjectStreamClass) @bci=211, line=1509 (Compiled frame)
 - java.io.ObjectOutputStream.writeOrdinaryObject(java.lang.Object, java.io.ObjectStreamClass, boolean) @bci=137, line=1432 (Compiled frame)
 - java.io.ObjectOutputStream.writeObject0(java.lang.Object, boolean) @bci=506, line=1178 (Compiled frame)
 - java.io.ObjectOutputStream$BlockDataOutputStream.flush() @bci=1, line=1822 (Compiled frame)
 - java.io.ObjectOutputStream.flush() @bci=4, line=719 (Compiled frame)
 ....
 ....
 - few more frames where we invoke `flush()`

The object that we are attempting to serialize looks like this:

public class DomainObjectCollection implements Serializable
{
    private static final long serialVersionUID = -2050273337993827124L;

    private Map<String, DomainObject> objects;
    private Map<String, List<DomainObject>> objectsForUsername;

    public DomainObjectCollection()
    {
        this.objects = Collections.synchronizedMap(new HashMap<>());
        this.objectsForUsername = Collections.synchronizedMap(new HashMap<>());
    }
}

And we serialize the object rather simply:

os = new ObjectOutputStream(new BufferedOutputStream(new FileOutputStream(file), 128 * 1024));
                os.writeObject(object);
                os.flush();

So, apart from synchronization, nothing fancy.

Has anyone encountered something similar? If I missed providing some details please feel free to ask.

Update

About an hour ago from above thread dump, I captured the mixed stack trace:

----------------- 10453 -----------------
0x00007f0e5b65d709  __pthread_cond_timedwait + 0x129
0x00007f0e5a8bcbb7  _ZN18ObjectSynchronizer12FastHashCodeEP6ThreadP7oopDesc + 0x377
0x00007f0e5a5746df  JVM_IHashCode + 0x4f
0x00007f0e4577d13f  <Unknown compiled code>
Locked ownable synchronizers:
    - None
Jovan Perovic
  • 19,846
  • 5
  • 44
  • 85
  • What happens if you take 5 thread dumps with 5 seconds or so between each? It is hard to know if something has hanged by looking at a single point in time. – ewramner Sep 13 '21 at 15:14
  • Agree, completely. In this format, I captured only this occurrence, however, an hour before that I captured mixed stack traces of threads. Updating the question with it... – Jovan Perovic Sep 14 '21 at 07:26
  • Not a real solution to your problem, but you can implements writeObject and readObject per [Serializable](https://docs.oracle.com/javase/7/docs/api/java/io/Serializable.html) javadoc. You will also be able to optimize the serialized stream. As for your problem, I'd say to try removing the synchronizedMap in favor of mutable accessor (eg: ensure that all getObjects returns read only collection). – NoDataFound Sep 14 '21 at 07:34
  • Does the server stay frozen or will it resume work after some time? Are you sure that those 206 Threads are all Threads in your system? There are quite a lot more [states](https://www.generacodice.com/en/articolo/235579/Java-threads:-interpreting-thread-states-of-a-running-JVM) and depending on your system (or even the JVM itself) the JVM might max out its limit. Just a thought. – SirHawrk Sep 14 '21 at 09:14
  • @SirHawrk Yes, it stayed in that frozen state for nearly 5 days :-/ All of these threads legitimately run in our app - I can recognize the majority of them by their stack trace – Jovan Perovic Sep 14 '21 at 09:23
  • @JovanPerovic I still think you need multiple stack traces fairly close to each other (seconds) to determine if the JVM is really stuck in the call to identityHashCode. As that is a very well-tested and mature internal API it would be surprising, but it could happen. As NoDataFound I would also suggest replacing synchronizedMap and instead synchronizing accesses to the internal collections. If possible (scary in prod) you could also take a heap dump with jmap and see exactly what the program was doing. – ewramner Sep 14 '21 at 14:40
  • Those are all very good ideas. To be frank, the first thing I attempted is to capture the heap dump, which failed due to “WrongTypeException: no suitable match for the type of address”. @NoDataFound: I already made some changes to the code based on your suggestion. Just waiting to release it to the production. However, days may pass for me to be able to say that something was fixed :-( – Jovan Perovic Sep 15 '21 at 09:45
  • Well don't leave us hanging – SirHawrk Jan 05 '22 at 10:15

0 Answers0