68

Running a simple Java program on our production machine, I noticed that this program eats up more than 10G of virtual memory (VIRT). I know that virtual memory is not that relevant, but I would at least like to understand why this is needed.

public class Main {
    public static void main(String[] args) {
        System.out.println("Hello World!");
        try {
            Thread.sleep(10000);
        } catch (InterruptedException e) {
            /* ignored */
        }
    }
}

Here's what top says when I run that little program:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
18764 myuser    20   0 10.2g  20m 8128 S  1.7  0.1   0:00.05 java

Does anyone know why this is happening?

uname -a says:

Linux m4fxhpsrm1dg 2.6.32-358.18.1.el6.x86_64 #1 SMP Fri Aug 2 17:04:38 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux

On an older 32-bit Linux machine the same program consumes only about 1G virt. The old machine has 4GB RAM, the new one 32GB.

Volker Siegel
user3246431
  • 1
    @jwenting that's 1G of VIRTUAL memory, not necessarily physical memory. – CurtisHx Apr 29 '14 at 12:09
  • How much memory does your machine have? How much the older one? As others have pointed out, it's the JavaVM that takes a fixed amount of memory. – Marcus Bitzl Apr 29 '14 at 13:08
  • 8
    @CurtisHx: Think about it. One GIGAbyte of virtual memory! It is still insane, what are you doing with that much address space? – MSalters Apr 29 '14 at 14:15
  • Looking at a similar issue. Are you on Red Hat Enterprise Linux 6 by any chance? Try setting MALLOC_ARENA_MAX=4 as an environment variable and rerun your test. More info: https://www.ibm.com/developerworks/community/blogs/kevgrig/entry/linux_glibc_2_10_rhel_6_malloc_may_show_excessive_virtual_memory_usage?lang=en – Brian Apr 29 '14 at 13:16
  • 9
    @MSalters Preallocating (without committing) space for the managed heap. This is 64-bit world, you can have as much virtual space as you want, it literally doesn't matter. – Cat Plus Plus Apr 29 '14 at 15:26
  • 3
    @CatPlusPlus: You need a few bytes of state per page (where is that page physically, in RAM, on disk, or not present?). Your Hello, World is literally using megabytes of page tables, none of which are needed. – MSalters Apr 29 '14 at 15:35
  • 9
    @MSalters it's the JVM, not the Hello World code which is pre-allocating all that address space. – NWard Apr 29 '14 at 15:43
  • 13
    I think this might be the first question I've ever seen that's probably more appropriate on StackOverflow than where it was posted... lol. – jpmc26 Apr 30 '14 at 00:10
  • 2
    @NWard: So? The JVM is making that allocation to run HelloWorld. I'm not trying to blame the author of HelloWorld, it's clearly a JVM failure. (Using exponential growth of the managed heap keeps the allocation cost at O(1) amortized, even if you'd start it at 1 kB instead of 1 GB) – MSalters Apr 30 '14 at 08:13
  • @MSalters: It allocates as a percentage of RAM. If you have enough RAM to justify 10GB of virtual address space, then "literally megabytes" of page tables is a drop in the ocean. If you have so little RAM that a few megabytes is at all significant, you'll be using less virtual, and thus need less for page tables. It's maybe not *ideal*, but it doesn't matter in any appreciable way. – Phoshi Apr 30 '14 at 08:44
  • @MSalters: it does not *use* that much memory. It only *reserves* it - very big difference. –  Apr 30 '14 at 10:46
  • 6
    @a_horse_with_no_name: My point is that _the reservations themselves_ add up to several megabytes. This is an issue for UNIX-style programming, where you may have thousands of small processes living side by side. That's no problem if the whole process is just a megabyte. – MSalters Apr 30 '14 at 10:53
  • Stahp! There is a lot of overhead for java and there are configurable minimum memory usages. Java is not assembler. – DwB Apr 30 '14 at 16:13
  • @MSalters - it isn't an issue because the *memory is not actually allocated*. All that's allocated is *address space*, which is *private to the individual process* and has (almost) no effect on other processes. No memory will be allocated to the address space unless something actually uses it. – Jules May 01 '14 at 09:43
  • @Jules: I know the 1 GB isn't actually allocated. But there will be approximately 4MB of metadata stating which parts of the 1GB are just reserved, in RAM, and/or on disk. (Assuming one 8 byte pointer plus status per 4kB page) – MSalters May 01 '14 at 09:49
  • 2
    That's not a valid assumption. There isn't a data item per page. All there is is an entry in a simple data structure (according to the paper at http://people.csail.mit.edu/nickolai/papers/clements-bonsai.pdf it's a red-black tree) that notes the reservation and its length. It probably takes somewhere in the region of 32-64 bytes, and is the same amount of memory no matter how large the reservation is. – Jules May 01 '14 at 10:07
  • @MSalters You know, this is exactly the reason why high-level languages *are* desirable. There are very few people who understand the underlying software and hardware well enough to avoid more issues than they cause by slight misunderstandings. Not to mention that it means that the higher-level framework (JVM, CLR) can accommodate new situations - just like the 32-bit vs. 64-bit switch. A great example was at this year's Build, on "modern C++", showing binary search being *slower* than linear search (for large arrays of small items). Assumptions don't work well anymore, the system is too complex. – Luaan May 07 '14 at 07:18
  • @Luaan There are multiple well-known cases when binary search is slower than linear search (e.g. small arrays (sorted or unsorted); large, unsorted arrays which will only be searched once, etc.). Are you implying there's research to suggest linear search is better than binary search _in general_? If so, do you have a source link where I could read more? – Dan Bechard Jul 07 '16 at 13:04
  • @Dan No, not at all. If that were so, I wouldn't give examples. As referenced, Modern C++ talk on Build 2014 touches on some examples of this (https://channel9.msdn.com/Events/Build/2014/2-661). And in your own comment, you mention "large, unsorted arrays which will only be searched once", which would require the binary search to sort the array before doing the search - a linear search will be faster. But even on sorted arrays, there's things you need to take into account - for example, array bounds checking necessary for binary search, but not linear, branch prediction etc. – Luaan Jul 07 '16 at 14:08
  • @Dan Of course, keep in mind that this is still just a comment, and the space is limited. There's enough material for plenty of blog posts, or even a book, really. But while today's computers *pretend* to be simple, they aren't (I say "today's", but this has been the case since about 486, and it's only getting more so). You need to take care about memory layout, cache usage, branch prediction, instruction reordering etc., *even on the hardware level*. Then on the software level, you get plenty of other things to care about - e.g. the array bounds checking I mentioned. – Luaan Jul 07 '16 at 14:14

7 Answers

87

The default sizes for initial heap and maximum heap are defined as a percentage of the machine's physical memory, of which a production server nowadays tends to have a whole lot.

You can choose both via the -Xms and -Xmx command line options.
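
To check which defaults the JVM actually picked on a given machine, a small sketch like the following may help. It uses only the standard `Runtime` API; the class name `HeapDefaults` and its helper methods are made up for illustration.

```java
public class HeapDefaults {
    private static final long MIB = 1024L * 1024L;

    /** Upper bound the heap may grow to (roughly the effective -Xmx), in bytes. */
    public static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Heap memory the JVM currently holds (starts out near -Xms), in bytes. */
    public static long committedHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    public static void main(String[] args) {
        System.out.println("max heap:       " + maxHeapBytes() / MIB + " MiB");
        System.out.println("committed heap: " + committedHeapBytes() / MIB + " MiB");
    }
}
```

Running it once with no options and once as `java -Xms16m -Xmx64m HeapDefaults` should show the maximum dropping to roughly 64 MiB, and the VIRT column in top would typically shrink along with it.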

Michael Borgwardt
  • 342,105
  • 78
  • 482
  • 720
  • 6
    Wait. *> maximum heap size: Smaller of 1/4th of the physical memory or 1GB* - so the default maximum heap is capped at 1 GB. Assuming this hasn't changed, are you saying the initial allocation is higher than the *maximum*? Because that sounds pretty broken. – Bob Apr 29 '14 at 15:38
  • 4
    @Bob Seems that this applies only to 32-bit Java and pre-1.6 64-bit, on the client VM. Java 1.6+ 64-bit on server VM happily reserves much more. – Luaan Apr 29 '14 at 16:14
  • 2
    @Bob: "Note: The boundaries and fractions given for the heap size are correct for Java SE 5.0. They are likely to be different in subsequent releases as computers get more powerful." – Michael Borgwardt Apr 29 '14 at 21:49
47

Virtual memory really doesn't matter to you.

The basic difference between 32-bit and 64-bit is that the address space in 64-bit is incredibly large. If 10 GiB looks like a lot to you, note that .NET on 64-bit can use TiBs of memory like this. Yet on 32-bit, .NET is much more conservative (and so is the JVM): the address space is 4 GiB total, and that's not a lot.
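
To put numbers on that difference, here is a small arithmetic sketch. The class name `AddressSpace` is hypothetical, and the 48-bit row is an assumption about how many virtual address bits typical x86-64 hardware implements, not something stated in the question.

```java
import java.math.BigInteger;

public class AddressSpace {
    private static final BigInteger GIB = BigInteger.ONE.shiftLeft(30);

    /** Size in GiB of an address space with the given number of address bits. */
    public static BigInteger sizeInGiB(int addressBits) {
        return BigInteger.ONE.shiftLeft(addressBits).divide(GIB);
    }

    public static void main(String[] args) {
        System.out.println("32-bit: " + sizeInGiB(32) + " GiB"); // 4 GiB
        System.out.println("48-bit: " + sizeInGiB(48) + " GiB"); // 262144 GiB (256 TiB), assumed typical x86-64
        System.out.println("64-bit: " + sizeInGiB(64) + " GiB"); // 17179869184 GiB (16 EiB)
    }
}
```

A 10 GiB reservation is more than the whole 4 GiB 32-bit address space, but only a tiny fraction of what a 64-bit process can address.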

But it's irrelevant - it doesn't matter. It's just a thing that greatly simplifies programming, and has no negative effect on the host OS whatsoever. It creates a contiguous address space for the VM to use, which means that you don't have to fragment the heap (or worse, the stack, where it's more or less impossible - but those tend to be only a MiB or so) as you get to require more "real" memory. When you finally commit the virtual memory, it becomes slightly more real - at that point, it more or less has to be backed by some data storage - be it the page (swap) file or physical RAM.

The point is, the physical location of the memory isn't necessarily contiguous, but that's handled outside of your reach, and the mapping is generally very fast. On the other hand, having to index an array that's fragmented over, say, 10 different blocks of virtual address space is (completely unnecessary) work.

So there you have it - virtual memory is almost free on 64-bit. The basic approach is "if it's there, use it". You're not limiting the other applications, and it saves you quite a bit of work if you do actually end up using it. But until that point comes, you've only got a reservation. It doesn't translate to any physical memory at all. You don't pay for the friends that might come tonight and sit at your table, but you still have the space for them to sit if they do come - and only when they finally come do you actually get "charged".

See this question for more information about the way Java behaves on different machines and with different versions: "What is the default maximum heap size for Sun's JVM from Java SE 6?" The maximum heap size also determines the amount of virtual memory reserved, because the heap has to be a contiguous address space. If it weren't pre-reserved, the heap might not be able to expand to this maximum value, because something else could have reserved a region of address space in the place the heap has to expand into.

Luaan
  • 9
    @gnat How so? `It creates a contiguous address space for the VM to use, which means that you don't have to fragment the heap` seems like a pretty clear reason to me, among others. And I've explained the difference between 32-bit and 64-bit environment. – Luaan Apr 29 '14 at 16:10
  • 12
    @gnat It does, by explaining that the premise of the question (that 10G of virtual address space is somehow bad or undesirable) is false, and that there is a benefit (protection against fragmentation). – Roman Starkov Apr 30 '14 at 02:01
  • 1
    @romkyns The question does _not_ presume that 10GB of virtual memory is bad. When you explain why it isn't bad, that's just preaching to the choir. The question is only asking why Java has 10GB as its default value. – Navin May 06 '14 at 11:10
8

It turns out that on a modern computer architecture that uses virtual memory addressing (where the "memory space" an application sees does not actually relate to memory that's actually physically allocated), it really doesn't matter how much of this virtual "memory space" is given to an application upon startup. It doesn't mean that this much memory has been allocated by the system.

If an application sees a virtual address space 10GB large, all that signals to the app is that it may use memory addresses up to 10GB if it wants. However, memory is not actually allocated in physical RAM until it is actually written to, and this is done on a page-by-page basis, where a page is typically a 4kB section of memory. The virtual address space is just that: completely virtual until actually used.
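
As a rough worked example of the page-by-page accounting, the sketch below counts pages for figures close to those in the question's top output. The class name `PageCount` is hypothetical, the 4 KiB page size is an assumption, and the 10 GiB and 20 MiB inputs are rounded from the reported 10.2g VIRT and 20m RES.

```java
public class PageCount {
    /** Assumed page size: 4 KiB. */
    public static final long PAGE_SIZE = 4L * 1024L;

    /** Number of pages needed to cover the given number of bytes, rounded up. */
    public static long pagesFor(long bytes) {
        return (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
    }

    public static void main(String[] args) {
        long reserved = 10L * 1024 * 1024 * 1024; // about what VIRT reports
        long resident = 20L * 1024 * 1024;        // about what RES reports
        System.out.println("pages of address space: " + pagesFor(reserved)); // 2621440
        System.out.println("pages backed by RAM:    " + pagesFor(resident)); // 5120
    }
}
```

Under these assumptions, only about 0.2% of the pages in the address space are backed by physical memory.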

Let's say an application is given 10GB of address space and it starts using some of it. As a "fresh" - previously untouched - page of this virtual memory is first written to, the system will, on a low level, "map" this virtual page to a section of physical memory, then write it. But that application itself does not have to worry about such details, it just acts as if it has full access to a virtual area of memory.
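
The effect can be observed from Java with a memory-mapped file, which behaves in a similar way. This is a sketch rather than a description of how the JVM reserves its heap: the class name `LazyMapping`, the temp-file prefix, the mapping size, and the 4 KiB page size are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LazyMapping {
    /** Maps `size` bytes of a fresh temporary file into this process's address space. */
    public static MappedByteBuffer mapTempFile(int size) {
        try {
            Path file = Files.createTempFile("lazy-mapping", ".bin");
            file.toFile().deleteOnExit();
            try (FileChannel channel = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Grow the file to `size` bytes by writing a single byte at the end.
                channel.write(ByteBuffer.wrap(new byte[] {0}), size - 1L);
                // The mapping stays valid after the channel is closed.
                return channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        MappedByteBuffer buffer = mapTempFile(256 * 1024 * 1024); // 256 MiB of address space
        System.out.println("mapped " + buffer.capacity() + " bytes; compare VIRT and RES in top");
        Thread.sleep(10_000);

        // Touch one byte per (assumed 4 KiB) page, but only in the first 16 MiB.
        for (int offset = 0; offset < 16 * 1024 * 1024; offset += 4096) {
            buffer.put(offset, (byte) 1);
        }
        System.out.println("touched 16 MiB; compare RES again");
        Thread.sleep(10_000);
    }
}
```

On Linux, VIRT should jump by the full mapping size right away, while RES is expected to grow only once pages are touched.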

In the case of Java applications, it's not the application itself but the JVM that is allocated that address space, and the JVM simply requests a huge address space by default. The amount it requests is calculated relative to the physical memory size, not because it has any need to be conservative, but just for practicality: an application is probably not going to want a heap large enough to totally bring a server to its knees, so the JVM operates on the assumption that it won't. As I said above, this does not mean that this much is "allocated" or that the system has had to expend many resources doing so.

thomasrutter
7

It's not your program using up that memory, it's the Java VM reserving that memory, regardless of which program is loaded.

Pieter B
  • 3
    No, it's saying it **may** use that amount, not (yet) using it. – Volker Siegel Apr 30 '14 at 01:46
  • 7
    Why does this incorrect answer have so many upvotes. The Java VM is **not** "using" that much memory. It has only allocated that much virtual address space. This has nothing to do with how much "memory" is "used". – thomasrutter Apr 30 '14 at 01:58
  • 1
    @thomasrutter corrected to say reserve instead of using. This answer is correct because the hello-world program doesn't do anything at all. It's the Java VM that does and the hello-world program gets executed by the VM. That distinction is important and doesn't get made in the question. – Pieter B Apr 30 '14 at 06:53
  • 5
    It's still not correct. It's not "reserving" any memory. It's presenting a virtual address space of a certain size. Memory is not "reserved" or "allocated" at this point. It really is "virtual" address space. It doesn't really reflect actual memory. – thomasrutter Apr 30 '14 at 07:37
  • 1
    Indeed, thomasrutter is correct: the key distinction isn't between allocated and reserved memory, but between memory and address space, which is a little more abstract. – Jules May 01 '14 at 10:14
3

Imagine you're in the document storage business. You have a small facility in the middle of the city that stores boxes of papers, and a much larger warehouse outside of town with 1000 times the space. Every box has a label on it identifying its contents.

The in-city facility is main memory. The warehouse is disk space.

A 10GB virtual memory allocation for a new process doesn't mean finding room for 10 billion boxes for a new customer. It means printing 10 billion labels for boxes with contiguous ID numbers on them.

Russell Borogove
3

This is not the amount of physical memory the application is actually using. The virtual memory used by all processes combined can be orders of magnitude larger than the amount of physical RAM on the machine, without any obvious problems.
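
On Linux, a process can read both numbers for itself from `/proc/self/status`, where `VmSize` corresponds to what top shows as VIRT and `VmRSS` to RES. The following is a minimal sketch, assuming a Linux system; the class name `ProcStatus` and the helper `fieldKb` are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class ProcStatus {
    /**
     * Finds a line such as "VmRSS:     20480 kB" and returns its value in kB,
     * or -1 if the field is not present.
     */
    public static long fieldKb(List<String> statusLines, String field) {
        for (String line : statusLines) {
            if (line.startsWith(field + ":")) {
                String[] parts = line.substring(field.length() + 1).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        Path status = Paths.get("/proc/self/status"); // only exists on Linux
        if (!Files.exists(status)) {
            System.out.println("/proc/self/status is not available on this system");
            return;
        }
        List<String> lines = Files.readAllLines(status);
        System.out.println("VmSize (VIRT): " + fieldKb(lines, "VmSize") + " kB");
        System.out.println("VmRSS  (RES):  " + fieldKb(lines, "VmRSS") + " kB");
    }
}
```

For a program like the Hello World in the question, the first value would be expected to be far larger than the second.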

Audrius Meškauskas
2

Your program is NOT using that much memory. The JVM / OS is reserving that memory, i.e., it sets the limit UP TO WHICH your program can use memory. Also, as one of the answers clearly mentions, 32-bit and 64-bit have nothing to do with this: 32-bit means you can address up to 2^32 memory locations, and 64-bit means up to 2^64.

Bergi
TheLostMind