
We have production servers running Debian Linux which host several busy Tomcat instances, a database, and support services. The systems ran stably for a couple of years, but lately we have been seeing slowdowns and have run into memory issues.

During this time the applications hosted by the Tomcats have grown: more users, more Tomcat instances. It seems we are starting to run up against the machines' memory limits.

I have started to familiarize myself with memory monitoring, using tools like htop and Java JMX, to determine the present memory requirements. The knobs I identified on the JVM side are switches such as -Xms and -Xmx for the initial and maximum heap size. On the monitoring side, the relevant figures are virtual memory (VIRT) and resident memory (RES).
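For illustration, the heap limits are typically passed per instance via CATALINA_OPTS, e.g. in a bin/setenv.sh (the sizes here are just placeholders, not our actual settings):

$ cat $CATALINA_BASE/bin/setenv.sh
# -Xms = initial heap size, -Xmx = maximum heap size for this Tomcat instance
CATALINA_OPTS="$CATALINA_OPTS -Xms1g -Xmx4g"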

My problem is now to find out how much memory we need in the machines, as optimization efforts for the hosted applications might take a while to succeed.

Summing up all virtual sizes gives a multiple of the physical RAM and is probably not a useful number, because the kernel shares identical parts, like common library code, between processes.

Summing up all resident sizes should be close to the actual memory usage, minus shared memory that gets counted more than once. But it is the result of a dynamic process, in which memory allocation by the kernel and the different applications, and even things like the order in which the various Tomcat instances are started, might play a role.
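As a rough cross-check I could sum the resident sizes with ps, or, on newer kernels (4.14+), sum the proportional set size (PSS) from /proc/<pid>/smaps_rollup, which splits shared pages between the processes using them:

$ # total RES in MiB (shared pages counted once per process)
$ ps -eo rss= | awk '{sum += $1} END {printf "RSS total: %.0f MiB\n", sum/1024}'
$ # total PSS in MiB (shared pages divided among the processes sharing them)
$ sudo sh -c 'cat /proc/[0-9]*/smaps_rollup 2>/dev/null' | awk '/^Pss:/ {sum += $2} END {printf "PSS total: %.0f MiB\n", sum/1024}'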

Before I start a bisection-style trial-and-error approach, increasing RAM and measuring the resulting system performance until we reach calmer waters, I am posting this question in the hope that there is a better way to estimate the RAM requirements.

Update:

$ cat /proc/meminfo
MemTotal:       66075980 kB
MemFree:         2117304 kB
Buffers:          396328 kB
Cached:          9286764 kB
SwapCached:       794700 kB
Active:         53198584 kB
Inactive:       10075240 kB
Active(anon):   50010632 kB
Inactive(anon):  3587764 kB
Active(file):    3187952 kB
Inactive(file):  6487476 kB
Unevictable:        5604 kB
Mlocked:            5604 kB
SwapTotal:       4194300 kB
SwapFree:            324 kB
Dirty:             49460 kB
Writeback:            72 kB
AnonPages:      52802056 kB
Mapped:            89356 kB
Shmem:              4448 kB
Slab:             388132 kB
SReclaimable:     324892 kB
SUnreclaim:        63240 kB
KernelStack:       11360 kB
PageTables:       126924 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    37232288 kB
Committed_AS:   47441088 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      386700 kB
VmallocChunk:   34325801336 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       93868 kB
DirectMap2M:     8259584 kB
DirectMap1G:    58720256 kB
$
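For quick reference, the commit-accounting figures discussed in the comments can be pulled out directly:

$ grep -E 'MemTotal|CommitLimit|Committed_AS' /proc/meminfo
MemTotal:       66075980 kB
CommitLimit:    37232288 kB
Committed_AS:   47441088 kB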
mvw
  • Edit your question to add `/proc/meminfo` of a host under load. `Committed_AS` is what you need now to not page out. Add the rest of meminfo plus your JVM size specifics and you can start developing sizing formulas. – John Mahowald May 25 '19 at 13:22
  • Thanks John, I will have a closer look at that output. Is there any useful literature on this subject out there? – mvw May 26 '19 at 16:53
  • How did the memory issues manifest themselves and why do you think you have a memory issue? – Juraj Martinka May 27 '19 at 08:04
  • The OOM killer showed up. – mvw May 27 '19 at 10:21
  • @JohnMahowald I added the listing. Committed_AS is around 47G. Is that the excess of summed virtual memory vs physical memory? – mvw May 27 '19 at 13:45
  • `Committed_AS` is the kernel's guess at how much memory it needs for all its allocations without paging out. 47GB on a 64GB host is a system that is not under much memory pressure. Perhaps also look at the syslog of an OOM event. – John Mahowald May 27 '19 at 13:59
  • Also include a sample host, how many JVMs and the heap tuning for each. Plus any databases or other apps on the hosts and their (shared) memory configuration. – John Mahowald May 27 '19 at 14:01

2 Answers


As asktyagi mentioned, it may be that you simply run too many applications on your host. In general, running many JVMs on a single host is likely to cause all kinds of contention for resources, of which memory is only one; other examples are GC threads competing for CPU, disk I/O, and so on.

You mention that you scale up by running multiple Tomcat processes. You may want to experiment to find out how many processes are the right number for you; for this, a separate load-test environment is probably essential.

To find out how much memory your program needs, proper monitoring is required. You can start experimenting on your local machine with a basic profiler like VisualVM, observing GC behavior and trying different -Xmx settings. You may also want to try a different GC algorithm (e.g. Shenandoah), depending on your workload and how important your latency/throughput requirements are.
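For example, a single Tomcat instance could be switched over like this (heap sizes are placeholders; Shenandoah needs a JDK build that ships it, and on older versions it sits behind the experimental-options flag):

$ # try a smaller heap and an alternative collector on one instance only
$ export CATALINA_OPTS="$CATALINA_OPTS -Xms1g -Xmx2g"
$ export CATALINA_OPTS="$CATALINA_OPTS -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC"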

On the cluster, you should turn GC logging on and perhaps enable low-overhead profiling via Java Flight Recorder. Later, you can use tools like jClarity's Censum to get insights from the GC logs.
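A minimal sketch, assuming JDK 11 or later (on JDK 8 the equivalents are -XX:+PrintGCDetails with -Xloggc:<file>); the file paths are placeholders:

$ # unified GC logging with rotation, plus a continuous low-overhead flight recording
$ export CATALINA_OPTS="$CATALINA_OPTS -Xlog:gc*:file=/var/log/tomcat/gc.log:time,uptime:filecount=5,filesize=20m"
$ export CATALINA_OPTS="$CATALINA_OPTS -XX:StartFlightRecording=maxsize=250m,dumponexit=true,filename=/var/log/tomcat/dump.jfr"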

The important thing to understand is that you cannot just "guess" your app's memory requirements by looking at the current memory consumption levels: the JVM will try to consume as much memory as you give it, so the fact that it consumes 10 GB does not have to mean it needs it. It may be quite satisfied with just 1 GB (and even more performant, since GC pauses can be shorter).

As a side note, overcommit (manifested by the OOM killer) can be a bad thing (see http://www.etalabs.net/overcommit.html), especially for server machines - you may want to disable swap entirely.
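If you go that route, the usual steps are roughly as follows (try it on one host first and watch how the box behaves under memory pressure):

$ # turn swap off now ...
$ sudo swapoff -a
$ # ... and keep it off across reboots by removing or commenting out the swap entry
$ grep swap /etc/fstab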

Juraj Martinka
  • A single machine has 64G RAM and 4G swap configured. Why there is swap at all puzzles me as well. It gets used and thereby slows things down. I will try to get the OK to remove it. – mvw May 27 '19 at 13:01
  • JVM behavior is different from what I had expected. E.g. one Tomcat instance has -Xmx4096m, but htop displays RES 9565M (VIRT is at 14G). I interpret this as: while the heap is bounded to 4G, the system has assigned 9G of memory to it. What might that 5G difference be? – mvw May 27 '19 at 13:07
  • I just added up the VIRT sizes of the major memory hogs. I get about 100G. So we are overcommitted, if I understand your linked article correctly (thanks btw!), as that sum exceeds the 68G physical memory. – mvw May 27 '19 at 13:14
  • JVM doesn't just consume heap memory but there are also thread stacks, GC structures, JIT code cache, native memory allocations, etc. See this excellent answer (pretty deep): https://stackoverflow.com/questions/53451103/java-using-much-more-memory-than-heap-size-or-size-correctly-docker-memory-limi/53624438#53624438 – Juraj Martinka May 27 '19 at 20:00
  • btw. VIRT isn't a very useful metric (look here: https://serverfault.com/questions/138427/what-does-virtual-memory-size-in-top-mean). Focus on RSS and actual occupancy of the heap/non-heap memory as reported by JVM. – Juraj Martinka May 27 '19 at 20:04

Honestly, it is not a good idea to keep everything on a single box, and it is difficult to calculate the exact memory required, because it depends on the hosted applications. I would recommend putting a load balancer in front of the application and hosting it across several different hosts.

If you still want to calculate the memory, you need the applications' historical memory trend, based on thread counts and traffic reports. It also depends on other factors, such as what other applications are hosted on the same box.
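A minimal sketch for collecting such a trend (run it periodically, e.g. from cron; the log path is just an example):

$ # append a timestamped RSS sample (in MiB) for every running JVM
$ ps -C java -o pid=,rss= | awk -v now="$(date -Is)" '{printf "%s pid=%s rss_mib=%.0f\n", now, $1, $2/1024}' >> /var/log/java-rss.log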

Hope this will help.

asktyagi
  • It is several boxes behind a load balancer already. We also monitor some parameters, but we probably do not interpret them correctly and do not have all the relevant ones. – mvw May 26 '19 at 16:49