19

What are the reasons a malloc() would fail, especially in 64 bit?

My specific problem is trying to malloc a huge 10GB chunk of RAM on a 64 bit system. The machine has 12GB of RAM, and 32 GB of swap. Yes, the malloc is extreme, but why would it be a problem? This is in Windows XP64 with both Intel and MSFT compilers. The malloc sometimes succeeds, sometimes doesn't, about 50%. 8GB mallocs always work, 20GB mallocs always fail. If a malloc fails, repeated requests won't work, unless I quit the process and start a fresh process again (which will then have the 50% shot at success). No other big apps are running. It happens even immediately after a fresh reboot.

I could imagine a malloc failing in 32 bit if you have used up the 32 (or 31) bits of address space available, such that there's no address range large enough to assign to your request.

I could also imagine malloc failing if you have used up your physical RAM and your hard drive swap space. This isn't the case for me.

But why else could a malloc fail? I can't think of other reasons.

I'm more interested in the general malloc question than my specific example, which I'll likely replace with memory mapped files anyway. The failed malloc() is just more of a puzzle than anything else... that desire to understand your tools and not be surprised by the fundamentals.
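
For concreteness, here's a minimal sketch of the kind of test involved (the real code differs, but it boils down to a single huge request and a check of the result):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* 10GB request; size_t is 64 bits wide in a 64-bit build */
        size_t size = 10ULL * 1024 * 1024 * 1024;
        void *p = malloc(size);

        printf("10GB malloc %s\n", p ? "succeeded" : "failed");

        /* touch the memory to make sure it is really usable, not just handed back */
        if (p) {
            ((char *)p)[0] = 1;
            ((char *)p)[size - 1] = 1;
            free(p);
        }
        return p ? 0 : 1;
    }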

Tim Lovell-Smith
SPWorley

9 Answers

8

malloc tries to allocate a contiguous range of memory, and (at least as far as I remember how swapping works) that range will initially have to be in real memory. It could easily be that your OS sometimes can't find a contiguous 10GB block while still keeping every process that needs real memory in RAM at the same time, at which point your malloc will fail.

Do you actually require 10GB of contiguous memory, or could you wrap a storage class/struct around several smaller blocks and use your memory in chunks instead? That relaxes the huge contiguity requirement and should also let your program push less-used chunks out to the swap file.
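
As a rough sketch of that idea (big_buffer, big_alloc, big_at and the 256MB chunk size are all made up for illustration), a small wrapper that presents one logical buffer as a set of smaller pieces:

    #include <stdlib.h>

    /* 256MB pieces instead of one huge contiguous block (size chosen arbitrarily) */
    #define CHUNK_SIZE ((size_t)256 * 1024 * 1024)

    typedef struct {
        size_t nchunks;
        char **chunks;
    } big_buffer;

    /* Allocate 'total' bytes as a set of smaller blocks; returns NULL on failure. */
    static big_buffer *big_alloc(size_t total)
    {
        size_t i, n = (total + CHUNK_SIZE - 1) / CHUNK_SIZE;
        big_buffer *b = malloc(sizeof *b);
        if (!b) return NULL;
        b->nchunks = n;
        b->chunks = calloc(n, sizeof *b->chunks);
        if (!b->chunks) { free(b); return NULL; }
        for (i = 0; i < n; i++) {
            b->chunks[i] = malloc(CHUNK_SIZE);
            if (!b->chunks[i]) {                 /* roll back everything on failure */
                while (i) free(b->chunks[--i]);
                free(b->chunks);
                free(b);
                return NULL;
            }
        }
        return b;
    }

    /* Byte 'off' of the logical buffer lives in chunk off/CHUNK_SIZE. */
    static char *big_at(big_buffer *b, size_t off)
    {
        return b->chunks[off / CHUNK_SIZE] + (off % CHUNK_SIZE);
    }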

workmad3
  • Beat me to it by a fraction of a second ;) Quite right about breaking mallocs down into smaller segments; 10GB is a bit ahead of current mainstream PCs. – SmacL May 07 '09 at 07:23
  • 2
    blank is absolutely correct... the address space in 64 bit is much, much, much bigger than a mere 10 GB! (Note that there may be a hidden 2^48 address size limit, which is still way bigger than 10GB ≈ 2^33) – SPWorley May 07 '09 at 12:25
  • 1
    The address *space* wouldn't have any issues. But the available memory won't necessarily fill the address space. AFAIK, memory can't be initially allocated in virtual memory, so if there isn't a way to page out enough physical memory to get a 10GB contiguous block, the malloc will fail. Even if this weren't the case, there's only 12GB + 32GB of memory actually available to address, not the full 64-bit space. – workmad3 May 07 '09 at 12:40
  • @work, Any reference to this restriction that mallocs need to be less than the currently unused physical memory size? That does sound like it'd answer the question, but what I don't understand is why such a restriction would exist. Wouldn't that kill many regular (reasonable) mallocs of say 100MB if your OS just happened to be keeping a lot of throwaway file cache? – SPWorley May 07 '09 at 13:51
  • @workmad, this could well be the issue. If the virtual space isn't mapped in such a way that it occurs directly after the end of the physical heap, you wouldn't be able to allocate a block that spans the gap. As you say, I'd guess that memory can't be initially virtual; it has to be real and subsequently swapped. – SmacL May 07 '09 at 13:58
  • At work now, just tried in Linux 32 and 64 and I can successfully malloc more than physical RAM. Perhaps it's a Windows heap library limitation? Quick test, can someone with a 1GB or less Windows machine try a single 1.1GB malloc? – SPWorley May 07 '09 at 14:12
  • 1
    For linux, you have to be careful about memory overcommit [ http://linux-mm.org/OverCommitAccounting ] when using huge mallocs. – Steve Schnepp May 07 '09 at 14:37
  • @Arno, this is mainly speculation based on your observed behaviour. I'm also not suggesting that the malloc needs to be less than the current unused physical memory, but less than the amount of contiguous memory the OS can get or free up in physical memory at the time. It may just be a Windows issue as well, as memory management is normally seen as superior on *N*X systems. – workmad3 May 07 '09 at 14:38
6

Have you tried using VirtualAlloc() and VirtualFree() directly? This may help isolate the problem.

  • You'll be bypassing the C runtime heap and the NT heap.
  • You can reserve virtual address space and then commit it. This will tell you which operation fails.

If the virtual address space reservation fails (even though it shouldn't, judging from what you've said), Sysinternals VMMap may help explain why. Turn on "Show free regions" to look at how the free virtual address space is fragmented.
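
A minimal sketch of that experiment (error handling kept short); splitting the reserve and the commit into two calls tells you whether it's the address space or the commit that fails:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        SIZE_T size = 10ULL * 1024 * 1024 * 1024;   /* 10GB */

        /* Step 1: reserve address space only -- no physical memory or pagefile yet */
        void *p = VirtualAlloc(NULL, size, MEM_RESERVE, PAGE_READWRITE);
        if (!p) {
            printf("reserve failed: %lu\n", GetLastError());
            return 1;
        }

        /* Step 2: commit the reserved range -- this is what charges the commit limit */
        if (!VirtualAlloc(p, size, MEM_COMMIT, PAGE_READWRITE)) {
            printf("commit failed: %lu\n", GetLastError());
            VirtualFree(p, 0, MEM_RELEASE);
            return 1;
        }

        printf("reserve + commit of 10GB succeeded\n");
        VirtualFree(p, 0, MEM_RELEASE);
        return 0;
    }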

bk1e
3

Here's an official source stating that the maximum request size for the heap is defined by the CRT library you link against (_HEAP_MAXREQ). (As an aside, your previous code had integer overflows wrapping to 0, which is why you didn't get NULL back.)

http://msdn.microsoft.com/en-us/library/6ewkz86d.aspx

Check out my answer here on large Windows allocations; it includes a reference to an MS paper on Vista/2008 memory model advancements.

In short, the stock CRT does not support, even for a native 64-bit process, any heap size larger than 4GB. You have to use VirtualAlloc*, CreateFileMapping, or some other analogue.
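
As a rough sketch of the CreateFileMapping route (a pagefile-backed section rather than a real file; note that the 64-bit size must be split into high/low DWORDs):

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned long long size = 10ULL * 1024 * 1024 * 1024;   /* 10GB */

        /* INVALID_HANDLE_VALUE => backed by the system paging file, not a real file */
        HANDLE h = CreateFileMapping(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE,
                                     (DWORD)(size >> 32), (DWORD)(size & 0xFFFFFFFF),
                                     NULL);
        if (!h) {
            printf("CreateFileMapping failed: %lu\n", GetLastError());
            return 1;
        }

        /* Map the whole section into the process address space */
        void *view = MapViewOfFile(h, FILE_MAP_ALL_ACCESS, 0, 0, 0);
        if (!view) {
            printf("MapViewOfFile failed: %lu\n", GetLastError());
            CloseHandle(h);
            return 1;
        }

        /* ... use 'view' as a 10GB buffer ... */

        UnmapViewOfFile(view);
        CloseHandle(h);
        return 0;
    }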

Oh, I also noticed you are claiming that your larger allocations actually succeed. That is incorrect: you are misinterpreting malloc(0x200000000) (that's 8GB in hex). What is happening is that you are requesting a 0-byte allocation due to a cast or some other effect of your test harness; you are most definitely not observing anything larger than a 0xfffff000-byte heap being committed, you are simply seeing integer overflow from down-casting.

WORD TO THE WISE, or *TIPS TO SAVE YOUR HEAP SANITY*

THE ONLY WAY TO ALLOCATE MEMORY WITH MALLOC (OR ANY OTHER DYNAMIC REQUEST)

void *foo = malloc(SIZE);

THE VALUE OF A DYNAMIC MEMORY REQUEST MUST NEVER (I CANNOT STRESS THAT ENOUGH) BE CALCULATED WITHIN THE "()" PARENS OF THE REQUEST

mytype *foo = (mytype *) malloc(sizeof(mytype) * 2);

The danger is that an integer overflow will occur.

It is always a coding ERROR to perform the arithmetic at the time of the call; you MUST ALWAYS calculate the TOTAL size of the data to be requested before the statement that evaluates the request.

Why is it so bad? We know this is a mistake because, at the point where a request is made for a dynamic resource, there must also be a point in the future where we use that resource.

To use what we have requested, we must know how large it is (e.g. the array count, the type size, etc.).

This means that if we ever see any arithmetic at all inside the () of a resource request, it is an error, because we will have to duplicate that calculation again in order to use the data appropriately.
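
Put differently, something along these lines (an illustrative sketch; mytype stands in for whatever you are allocating, and the point is the up-front overflow check and the precomputed total):

    #include <stdlib.h>
    #include <stdint.h>

    typedef struct { double a, b; } mytype;   /* stand-in for the example's mytype */

    mytype *alloc_array(size_t count)
    {
        size_t total;

        /* Guard against the integer overflow that arithmetic inside the call would hide. */
        if (count > SIZE_MAX / sizeof(mytype))
            return NULL;                       /* the request would wrap around */

        /* Calculate the TOTAL size once, up front, and keep it for later use. */
        total = count * sizeof(mytype);

        return (mytype *)malloc(total);        /* no arithmetic inside the call */
    }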

RandomNickName42
  • 2
    -1. Wrong on two points. (1) You claim you cannot allocate more than 4GB of memory with the CRT on Windows. Not on Win32, but you can on Win64. A simple experiment in VS2008 confirms that. Took me 5 minutes to check that (write the program, compile, step into the CRT to check the values and watch what the internal implementation does). (2) You claim that his size is rounding down when in actual fact it is not. The input value (size) to malloc is specified as size_t not as int. Thus on 64 bit Windows you can specify any valid 64 bit size. 8GB or 20GB are well within that range. – Stephen Kellett Apr 07 '10 at 15:10
  • 1
    Stephen: Paste your broken test case. I'll show you where you went wrong. – RandomNickName42 Jun 15 '10 at 03:16
  • An integer overflow will not happen in a calculation that ought to allocate 10 GB, unless size_t is 32 bits - and in that case you can't allocate 10 GB anyway. Calculation outside the call itself isn't going to help. – gnasher729 Jul 13 '14 at 18:59
  • The point about always calculating the size of an allocation isn't quite true when allocating structs. A `mytype* foo = malloc(sizeof(*foo))` is perfectly OK and passing pointers to structs around is extremely common. However, the original point is very true for all sorts of buffers that are allocated. – markusjm Jul 03 '19 at 12:55
2

Have you tried using heap functions to allocate your memory instead?
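
Presumably this means the Win32 heap API; a minimal sketch, assuming a private growable heap is acceptable:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        SIZE_T size = 10ULL * 1024 * 1024 * 1024;   /* 10GB */
        HANDLE heap;
        void *p;

        /* A private, growable heap (maximum size 0 means no fixed limit) */
        heap = HeapCreate(0, 0, 0);
        if (!heap) return 1;

        p = HeapAlloc(heap, 0, size);
        printf("HeapAlloc of 10GB %s\n", p ? "succeeded" : "failed");

        if (p) HeapFree(heap, 0, p);
        HeapDestroy(heap);
        return 0;
    }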

bdonlan
2

Just a guess here, but malloc allocates contiguous memory, and you may not have a sufficiently large contiguous section on your heap. Here are a few things I would try:

Where a 20GB malloc fails, do four 5GB mallocs succeed? If so, it is a contiguous space issue.

Have you checked your compiler switches for anything that limits total heap size, or largest heap block size?

Have you tried writing a program that declares a static variable of the required size? If this works, you could implement your own heap for big allocations inside that space.
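
A quick sketch of the first suggestion above, i.e. checking whether several smaller requests succeed where one 20GB request fails:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        size_t five_gb = 5ULL * 1024 * 1024 * 1024;
        void *blocks[4];
        int i, ok = 1;

        /* Try four 5GB allocations in place of one 20GB allocation */
        for (i = 0; i < 4; i++) {
            blocks[i] = malloc(five_gb);
            printf("block %d: %s\n", i, blocks[i] ? "ok" : "failed");
            if (!blocks[i]) ok = 0;
        }

        for (i = 0; i < 4; i++)
            free(blocks[i]);    /* free(NULL) is a no-op */

        return ok ? 0 : 1;
    }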

SmacL
  • I don't buy it - there's a 64 bit virtual address space. I can't see how the heap will have trouble finding 10GB contiguous. –  May 07 '09 at 10:59
  • 1
    Possibly not, but if we believe the OPs post this is actually happening. If you can alloc the same amount of memory in smaller blocks, the likelihood is that the OS cannot provide large heap blocks that span real memory and swap space. – SmacL May 07 '09 at 13:49
  • 1
    Your virtual memory isn't limited by the 64-bit address space. It's limited by the size of your main memory and swap file, AFAIK. – Seun Osewa May 17 '09 at 06:34
1

The problem is that Visual Studio does not define WIN64 when you compile a 64-bit application; it usually still keeps WIN32, which is wrong for 64-bit apps. This then causes the run-time to use the 32-bit value where _HEAP_MAXREQ is defined, so all large malloc() calls will fail. If you change your project (under project properties, preprocessor definitions) to define WIN64, the very large malloc() should have no trouble at all.
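
If you suspect this applies to your build, a tiny sketch can show what _HEAP_MAXREQ actually evaluates to in your configuration:

    #include <stdio.h>
    #include <malloc.h>   /* _HEAP_MAXREQ in the Microsoft CRT headers */

    int main(void)
    {
        /* In a correctly configured 64-bit build this should print a value far above 4GB. */
        printf("_HEAP_MAXREQ = %llu\n", (unsigned long long)_HEAP_MAXREQ);
        return 0;
    }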

NoNaMe
1

I found the question interesting, so I tried to research it from a theoretical POV:

In 64-bit (actually 48 bits usable due to chip limitations, and less (44 bits?) due to OS limitations) you certainly should not be limited by virtual memory fragmentation, i.e. a lack of contiguous virtual address space. The reason is that there is just so much virtual address space that it is quite impractical to exhaust it.

Also, we can expect that physical memory fragmentation should not be an issue, as virtual memory means there doesn't need to be a contiguous physical memory address range in order to satisfy an allocation request. Instead it can be satisfied with any sufficiently large set of memory pages.

So you must be running into something else: i.e. some other limitation that applies to virtual memory.

One other limit which definitely exists on Windows is the commit limit. More information on this:

https://web.archive.org/web/20150109180451/http://blogs.technet.com/b/markrussinovich/archive/2008/11/17/3155406.aspx
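
A small sketch for checking where you stand relative to that limit (GlobalMemoryStatusEx reports the system commit limit and the commit headroom still available):

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        MEMORYSTATUSEX ms;
        ms.dwLength = sizeof(ms);
        if (!GlobalMemoryStatusEx(&ms))
            return 1;

        /* ullTotalPageFile / ullAvailPageFile are the commit limit and the headroom left */
        printf("commit limit:     %llu MB\n",
               (unsigned long long)(ms.ullTotalPageFile / (1024 * 1024)));
        printf("available commit: %llu MB\n",
               (unsigned long long)(ms.ullAvailPageFile / (1024 * 1024)));
        return 0;
    }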

Other possible limits could exist, e.g. some quirk of how the actual implementation has to work with the actual hardware. Imagine that, when trying to create a mapping from virtual address space to physical address space, you run out of entries in the page table to do the virtual address mapping... does the OS memory allocator code care to handle this unlikely scenario? Perhaps not...

You can read more information on how page tables actually work to do virtual address translation here:

http://en.wikipedia.org/wiki/Memory_management_unit

genpfault
Tim Lovell-Smith
0

"But why else could a malloc fail? I can't think of other reasons."

As implicitly stated several times previously: because of memory fragmentation.

dmityugov
0

It is most likely fragmentation. For simplicity, let's use an example.

The memory consists of a single 12kb module, which the MMU organises into 1kb blocks. So you have 12 x 1kb blocks. Your OS uses 100 bytes, but this is essentially the code that manages the page tables, so it cannot be swapped out. Your application uses another 100 bytes.

Now, with just your OS and your application running (200 bytes), you are already occupying two of the 1kb blocks, leaving exactly 10kb available for malloc().

Now you malloc() a couple of buffers: A (900 bytes) and B (200 bytes). Then you free A. You now have 9.8kb free, but it is non-contiguous. So you try to malloc() C (9kb), and suddenly you fail.

You have 8.9kb contiguous at the tail end and 0.9kb at the front end. You cannot remap the first block to the tail because B stretches across both the first and the second 1kb blocks.

You can still malloc() a single 8kb block.

Granted, this example is a little contrived, but hope it helps.
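
The allocation pattern from the example, as code (purely illustrative; on a real 64-bit heap these sizes are of course far too small to fail):

    #include <stdlib.h>

    int main(void)
    {
        /* The pattern from the example: A and B allocated, A freed, then a big request C. */
        void *A = malloc(900);
        void *B = malloc(200);
        void *C;

        free(A);                     /* leaves a 900-byte hole in front of B */

        C = malloc(9 * 1024);        /* needs one contiguous 9kb run; the hole doesn't help */

        free(B);
        free(C);
        return 0;
    }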

sybreon