
We've encountered a sudden issue with our Nexus Repository Manager OSS 3.4.0 server running on a VM. The Nexus UI and Docker pulls/pushes all began timing out. The initial logs indicated a JVM out-of-memory error during garbage collection, which we attempted to address with a restart of the nexus service.

Following this, each restart of the nexus service or of the VM itself allowed a few minutes of normality before users started experiencing timeouts once again.

Each time, we observed errors from org.elasticsearch.monitor.jvm, suggesting there is a memory leak somewhere:

-------------------------------------------------

Started Sonatype Nexus OSS 3.4.0-02

-------------------------------------------------
2023-02-11 05:43:43,849+0000 INFO  [jetty-main-1] *SYSTEM org.eclipse.jetty.server.ServerConnector - Started ServerConnector@222517ad{SSL,[ssl, http/1.1]}{0.0.0.0:8082}
2023-02-11 05:43:43,850+0000 INFO  [jetty-main-1] *SYSTEM org.eclipse.jetty.server.ServerConnector - Started ServerConnector@6aa47140{SSL,[ssl, http/1.1]}{0.0.0.0:8083}
2023-02-11 05:43:47,997+0000 INFO  [quartz-2-thread-2] *SYSTEM org.sonatype.nexus.quartz.internal.task.QuartzTaskInfo - Task 'Health Check: maven-central' [healthcheck] state change WAITING -> RUNNING
2023-02-11 05:43:49,377+0000 INFO  [quartz-2-thread-2] *SYSTEM org.sonatype.nexus.scheduling.internal.TaskSchedulerImpl - Task 'Health Check: maven-central' [healthcheck] scheduled: hourly
2023-02-11 05:43:49,381+0000 INFO  [quartz-2-thread-2] *SYSTEM org.sonatype.nexus.quartz.internal.task.QuartzTaskInfo - Task 'Health Check: maven-central' [healthcheck] state change RUNNING -> WAITING (OK)
2023-02-11 05:46:09,341+0000 INFO  [qtp1622059521-199] *UNKNOWN org.apache.shiro.session.mgt.AbstractValidatingSessionManager - Enabling session validation scheduler...
2023-02-11 05:46:09,353+0000 INFO  [qtp1622059521-199] *UNKNOWN org.sonatype.nexus.internal.security.anonymous.AnonymousManagerImpl - Loaded configuration: AnonymousConfiguration{enabled=true, userId='anonymous', realmName='NexusAuthenticatingRealm'}
2023-02-11 05:47:33,161+0000 INFO  [elasticsearch[12DEA86D-AA14EDFA-783CB331-3D44898F-42A9D480][scheduler][T#1]] *SYSTEM org.elasticsearch.monitor.jvm - [12DEA86D-AA14EDFA-783CB331-3D44898F-42A9D480] [gc][young][215][270] duration [708ms], collections [1]/[1.2s], total [708ms]/[1.9m], memory [8gb]->[7.8gb]/[12gb], all_pools {[young] [0b]->[4mb]/[0b]}{[survivor] [56mb]->[60mb]/[0b]}{[old] [7.9gb]->[7.7gb]/[12gb]}
2023-02-11 05:47:34,426+0000 INFO  [elasticsearch[12DEA86D-AA14EDFA-783CB331-3D44898F-42A9D480][scheduler][T#1]] *SYSTEM org.elasticsearch.monitor.jvm - [12DEA86D-AA14EDFA-783CB331-3D44898F-42A9D480] [gc][young][216][271] duration [727ms], collections [1]/[1.2s], total [727ms]/[1.9m], memory [7.8gb]->[7.7gb]/[12gb], all_pools {[young] [4mb]->[0b]/[0b]}{[survivor] [60mb]->[60mb]/[0b]}{[old] [7.7gb]->[7.6gb]/[12gb]}
2023-02-11 05:50:02,078+0000 WARN  [elasticsearch[12DEA86D-AA14EDFA-783CB331-3D44898F-42A9D480][scheduler][T#1]] *SYSTEM org.elasticsearch.monitor.jvm - [12DEA86D-AA14EDFA-783CB331-3D44898F-42A9D480] [gc][old][331][1] duration [29.1s], collections [1]/[30.9s], total [29.1s]/[29.1s], memory [11.4gb]->[9.1gb]/[12gb], all_pools {[young] [0b]->[4mb]/[0b]}{[survivor] [32mb]->[0b]/[0b]}{[old] [11.4gb]->[9.1gb]/[12gb]}
2023-02-11 05:50:02,092+0000 INFO  [quartz-2-thread-3] *SYSTEM org.sonatype.nexus.quartz.internal.task.QuartzTaskInfo - Task 'Storage facet cleanup' [repository.storage-facet-cleanup] state change WAITING -> RUNNING
2023-02-11 05:50:02,106+0000 INFO  [quartz-2-thread-3] *SYSTEM org.sonatype.nexus.quartz.internal.task.QuartzTaskInfo - Task 'Storage facet cleanup' [repository.storage-facet-cleanup] state change RUNNING -> WAITING (OK)
2023-02-11 05:51:13,155+0000 INFO  [elasticsearch[12DEA86D-AA14EDFA-783CB331-3D44898F-42A9D480][scheduler][T#1]] *SYSTEM org.elasticsearch.monitor.jvm - [12DEA86D-AA14EDFA-783CB331-3D44898F-42A9D480] [gc][young][400][466] duration [891ms], collections [1]/[1.3s], total [891ms]/[3.4m], memory [11.5gb]->[11.8gb]/[12gb], all_pools {[young] [128mb]->[0b]/[0b]}{[survivor] [48mb]->[16mb]/[0b]}{[old] [11.3gb]->[11.8gb]/[12gb]}
2023-02-11 05:51:46,711+0000 WARN  [elasticsearch[12DEA86D-AA14EDFA-783CB331-3D44898F-42A9D480][scheduler][T#1]] *SYSTEM org.elasticsearch.monitor.jvm - [12DEA86D-AA14EDFA-783CB331-3D44898F-42A9D480] [gc][old][401][2] duration [33.1s], collections [1]/[33.5s], total [33.1s]/[1m], memory [11.8gb]->[10.8gb]/[12gb], all_pools {[young] [0b]->[0b]/[0b]}{[survivor] [16mb]->[0b]/[0b]}{[old] [11.8gb]->[10.8gb]/[12gb]}
2023-02-11 05:52:04,488+0000 WARN  [elasticsearch[12DEA86D-AA14EDFA-783CB331-3D44898F-42A9D480][scheduler][T#1]] *SYSTEM org.elasticsearch.monitor.jvm - [12DEA86D-AA14EDFA-783CB331-3D44898F-42A9D480] [gc][young][418][486] duration [1.2s], collections [1]/[1.4s], total [1.2s]/[3.5m], memory [11.7gb]->[11.9gb]/[12gb], all_pools {[young] [340mb]->[0b]/[0b]}{[survivor] [40mb]->[4mb]/[0b]}{[old] [11.3gb]->[11.9gb]/[12gb]}
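
Reading these lines, each 29-33 second full GC reclaims only a couple of gigabytes and the old generation climbs straight back toward the 12gb ceiling, so most of the heap appears to be live objects rather than collectable garbage. To confirm that while the heap fills, we can sample the JVM with standard JDK tools. A minimal sketch, assuming the JDK is on the PATH and that the pgrep pattern below matches the Nexus 3 launcher class (an assumption - adjust it for your install):

# Hypothetical lookup of the Nexus JVM PID - adjust the pattern if needed
NEXUS_PID=$(pgrep -f 'org.sonatype.nexus.karaf.NexusMain')

# Sample heap occupancy every 5s: OU (old gen used) climbing toward
# OC (old gen capacity) between full GCs matches the log lines above
jstat -gc "$NEXUS_PID" 5s

# Dump live objects for offline analysis (e.g. in Eclipse MAT); this pauses
# the JVM and writes a file roughly the size of the used heap (~12gb here)
jmap -dump:live,format=b,file=/tmp/nexus-heap.hprof "$NEXUS_PID"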

Further details are below. Is there anything I can do to recover our Nexus server from this state? Thanks in advance.

What we expect

  • Normal CPU/memory usage following a restart
  • Graceful shutdown of nexus when running service nexus stop

What we've tried

  • Increasing both the host memory and the JVM memory significantly, to the following (see the diagnostics sketch after this list):
-Xms12G
-Xmx12G
-XX:MaxDirectMemorySize=15G
  • Disabling all deployments from our CI pipelines and microservice clusters, so there should be no load on Nexus
  • Cancelling / deleting 5 duplicated repository reindex tasks in the Nexus UI
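
As mentioned above, for the next restart we also plan to add diagnostic flags to bin/nexus.vmoptions so the next OOM leaves evidence behind. A sketch assuming a Java 8 JVM (which these GC-logging flags require) and a hypothetical /opt/nexus install path - adjust both to your layout:

# Hypothetical install path - verify before running
VMOPTIONS=/opt/nexus/bin/nexus.vmoptions

# Append heap-dump-on-OOM and GC logging so the next failure is diagnosable
cat >> "$VMOPTIONS" <<'EOF'
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/opt/sonatype-work/nexus-heap.hprof
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-Xloggc:/opt/sonatype-work/nexus-gc.log
EOF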

What we observed

  • Consistent timeouts after a few minutes of normal usage - nexus memory is at 100%
  • Inability of nexus to shut down gracefully - I've had to resort to kill -9 [PID] to restart the service (see the thread-dump sketch after this list)
  • Memory: Elasticsearch GC is consuming the entire JVM heap, according to the logs
  • Multiple repository reindexing tasks are stuck in Cancelling (unsure if related, but it looks identical to the issue described here: https://issues.sonatype.org/browse/NEXUS-13121)
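
Per the note above, a thread dump taken just before the kill -9 should show which threads block the graceful shutdown. A minimal sketch, assuming NEXUS_PID is found as in the earlier snippet and JDK tools are installed:

# SIGQUIT asks the JVM for a thread dump without killing it;
# the dump is written to the nexus log/stdout
kill -3 "$NEXUS_PID"

# Alternatively, write the dump to a timestamped file for comparison
jstack -l "$NEXUS_PID" > "/tmp/nexus-threads-$(date +%s).txt"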

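Finally, the recovery path we're considering next is to stop Nexus and move the embedded Elasticsearch indexes aside so they get rebuilt from scratch, since the GC pressure seems to come from search indexing. This is only a hedged sketch: the data directory below is an assumed default, so verify it on your install before touching anything, and we're not certain it clears the stuck Cancelling tasks:

# Stop the service first - the index directory must not be in use
service nexus stop

# Assumed default data directory - verify on your install
NEXUS_DATA=/opt/sonatype-work/nexus3

# Move the embedded Elasticsearch indexes aside rather than deleting them
mv "$NEXUS_DATA/elasticsearch" "$NEXUS_DATA/elasticsearch.bak"

service nexus start
# After startup, re-run the repository search reindex tasks from the UI as needed

Would this be a safe way forward, or is there a better option?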