
So I am currently trying to dynamically allocate a large array of elements in C++ (using `new`). Obviously, when "large" becomes too large (>4GB), my program crashes with a `bad_alloc` exception because it can't find such a large contiguous chunk of memory available.

I could allocate each element of my array separately and then store the pointers to these elements in a separate array. However, time is critical in my application, so I would like to avoid as many cache misses as I can. I could also group some of these elements into blocks, but what would be the best size for such a block?

My question is then: what is the best way (time-wise) to dynamically allocate a large array of elements such that the elements do not have to be stored contiguously, but must be accessible by index (using `[]`)? This array is never going to be resized, and no elements are going to be inserted into or deleted from it.

I thought I could use `std::deque` for this purpose, knowing that the elements of an `std::deque` might or might not be stored contiguously in memory, but I read there are concerns about the extra memory this container takes.

Thank you for your help on this!

  • Ask for the biggest chunk you think is "reasonable", catch the `bad_alloc` if it happens, and halve the size of the request. Put this code into a loop and repeat until you have allocated enough memory (see the sketch after these comments). – Richard Critten Jan 13 '18 at 19:29
  • I can't imagine the memory overhead of a `std::deque` could be significant compared to a 4GB allocation. – François Andrieux Jan 13 '18 at 19:30
  • Chances are you will not be able to allocate more than 2GB for a 32-bit process or 4GB for a 64-bit process if using Visual C++. Chances are you don't need that amount of memory and need to rethink the design. – Ron Jan 13 '18 at 19:32
  • It is operating system and compiler specific. And probably processor and [ISA](https://en.wikipedia.org/wiki/Instruction_set_architecture) specific too. How much memory do you have? What OS, what compiler do you use? – Basile Starynkevitch Jan 13 '18 at 19:44
  • @Ron Visual C++ has no issues allocating >4GB memory chunks, unless you go out of your way to limit the 64 bit process to a 4GB memory limit. – 1201ProgramAlarm Jan 13 '18 at 20:18
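
A minimal sketch of the strategy from the first comment, assuming C++11 or later (the helper name and the halving policy are made up for illustration):

```cpp
#include <cstddef>
#include <new>

// Hypothetical helper: try to allocate n elements with new[]; on
// bad_alloc, halve the request and retry until something succeeds.
template <typename T>
T* alloc_largest(std::size_t& n) {
    while (n > 0) {
        try {
            return new T[n];        // success: n holds the granted count
        } catch (const std::bad_alloc&) {
            n /= 2;                 // too big: halve the request and retry
        }
    }
    return nullptr;                 // nothing could be allocated
}
```

Calling this in a loop, each time asking for the remaining element count, collects a set of blocks that together cover the full array.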

4 Answers


If your problem is such that you actually run out of memory, allocating fairly small blocks (as deque does) is not going to help; the overhead of tracking the allocations will only make the situation worse. You need to rethink your implementation so that you can deal with the data in blocks that will still fit in memory. For such problems, on x86 or x64 based hardware I would suggest blocks of at least 2 megabytes (the large page size).
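
A minimal sketch of such a block-based layout, assuming C++17 and default-constructible elements (the class and member names are hypothetical; the 2 MB figure follows the suggestion above):

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Elements live in separately allocated fixed-size blocks but remain
// addressable through a single flat index via operator[].
template <typename T, std::size_t BlockBytes = 2 * 1024 * 1024>  // ~2 MB blocks
class chunked_array {
    static constexpr std::size_t block_elems = BlockBytes / sizeof(T);
    std::vector<std::unique_ptr<T[]>> blocks_;
    std::size_t size_;

public:
    explicit chunked_array(std::size_t n) : size_(n) {
        const std::size_t nblocks = (n + block_elems - 1) / block_elems;
        blocks_.reserve(nblocks);
        for (std::size_t i = 0; i < nblocks; ++i)
            blocks_.emplace_back(new T[block_elems]());  // last block may be partly unused
    }

    T& operator[](std::size_t i) {
        return blocks_[i / block_elems][i % block_elems];
    }
    std::size_t size() const { return size_; }
};
```

When `block_elems` is a power of two, the division and modulo in `operator[]` compile down to a shift and a mask, so indexing stays cheap.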

SoronelHaetir

> Obviously, when "large" becomes too large (>4GB), my program crashes with a `bad_alloc` exception because it can't find such a large chunk of memory available.

You should be using a 64-bit CPU and OS at this point; allocating a huge contiguous chunk of memory should not be a problem, unless you are actually running out of memory. It is possible that you are building a 32-bit program. In that case you won't be able to allocate more than 4 GB, and you should build a 64-bit application instead.

If you want something better than plain `operator new`, then your question is OS-specific. Look at the API provided by your OS: on POSIX systems look for `mmap`, and on Windows look for `VirtualAlloc`.

There are multiple problems with large allocations:

  • For security reasons, the OS kernel never gives you pages filled with garbage values; instead, all new memory is zero-initialized. This means you don't have to initialize that memory yourself, as long as zeroes are exactly what you want.
  • The OS gives you real memory lazily, on first access. If you are processing a large array, you might waste a lot of time taking page faults. To avoid this you can use `MAP_POPULATE` on Linux; on Windows you can try `PrefetchVirtualMemory` (but I am not sure it can do the job). This makes the initial allocation slower, but should decrease the total time spent in the kernel.
  • Working with large chunks of memory wastes slots in the Translation Lookaside Buffer (TLB). Depending on your memory access pattern, this can cause a noticeable slowdown. To avoid it you can try using large pages (`mmap` with `MAP_HUGETLB`, `MAP_HUGE_2MB`, or `MAP_HUGE_1GB` on Linux; `VirtualAlloc` with `MEM_LARGE_PAGES` on Windows). Using large pages is not easy, as they are usually not available by default. They also cannot be swapped out (they are always "locked in memory"), so using them requires privileges. (See the `mmap` sketch after this list.)
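
As an illustration of the `mmap` route, a minimal Linux-only sketch assuming a 64-bit build (the 6 GB size is made up; adding `MAP_HUGETLB` would additionally require huge pages to be reserved on the system):

```cpp
#include <sys/mman.h>
#include <cstdio>
#include <cstdlib>

int main() {
    const std::size_t n = 6ULL * 1024 * 1024 * 1024;  // 6 GB, for illustration

    // Anonymous private mapping. MAP_POPULATE pre-faults the pages, paying
    // the page-fault cost once here instead of on first access.
    void* p = mmap(nullptr, n, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
    if (p == MAP_FAILED) {
        std::perror("mmap");
        return EXIT_FAILURE;
    }

    double* data = static_cast<double*>(p);  // kernel-provided, zero-filled
    data[0] = 1.0;

    munmap(p, n);
    return 0;
}
```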

If you don't want to use OS-specific functions, the best you can find in C++ is `std::calloc`. Unlike `std::malloc` or `operator new`, it returns zero-initialized memory, so you can probably avoid wasting time initializing that memory. Other than that, there is nothing special about the function, but it is the closest you can get while staying within standard C++.
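
For instance, a portable sketch using `std::calloc` (the element count is made up; note the memory must be released with `std::free`, not `delete`):

```cpp
#include <cstdlib>

int main() {
    const std::size_t n = 600'000'000;  // ~4.8 GB of doubles, for illustration

    // std::calloc returns zero-initialized memory; for large requests the
    // zeroing is usually free, since the OS hands out pre-zeroed pages.
    double* data = static_cast<double*>(std::calloc(n, sizeof(double)));
    if (!data) return EXIT_FAILURE;     // allocation failed

    data[n - 1] = 42.0;
    std::free(data);                    // calloc'd memory is freed, not deleted
    return 0;
}
```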

There are no standard containers designed to handle large allocations; moreover, all standard containers are really, really bad at handling those situations.

Some OSes (like Linux) overcommit memory, while others (like Windows) do not. Windows might refuse to give you memory if it knows it won't be able to satisfy your request later. To avoid this you might want to increase your page file. Windows needs to reserve that space on disk beforehand, but that does not mean it will actually use it (start swapping). Because actual memory is given to programs lazily, a lot of memory may be reserved for applications without ever actually being given to them.

If increasing the page file is too inconvenient, you can try creating a large file and mapping it into memory. That file will serve as a "page file" for your memory. See `CreateFileMapping` and `MapViewOfFile`.
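
A minimal sketch of that approach, assuming a 64-bit Windows build (the file name, size, and bare-bones error handling are made up for illustration):

```cpp
#include <windows.h>

int main() {
    const unsigned long long bytes = 8ULL * 1024 * 1024 * 1024;  // 8 GB backing file

    // The on-disk file plays the role of a private "page file".
    HANDLE file = CreateFileA("backing.bin", GENERIC_READ | GENERIC_WRITE, 0,
                              nullptr, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE) return 1;

    // CreateFileMapping grows the file to the requested size.
    HANDLE mapping = CreateFileMappingA(file, nullptr, PAGE_READWRITE,
                                        (DWORD)(bytes >> 32), (DWORD)bytes, nullptr);
    if (!mapping) return 1;

    // Map the whole file; pages are read from / written back to disk on demand.
    void* view = MapViewOfFile(mapping, FILE_MAP_ALL_ACCESS, 0, 0, 0);
    if (!view) return 1;

    double* data = static_cast<double*>(view);
    data[0] = 1.0;

    UnmapViewOfFile(view);
    CloseHandle(mapping);
    CloseHandle(file);
    return 0;
}
```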


The answer to this question is extremely application- and platform-dependent. These days, if you just need a small integer factor greater than 4GB, you use a 64-bit machine, if possible. Sometimes reducing the size of the elements in the array is possible as well (e.g., using 16-bit fixed point or half-floats instead of 32-bit floats).

Beyond this, you are either looking at sparse arrays or out-of-core techniques. Sparse arrays are used when you are not actually storing elements at all locations in the array. There are many possible implementations, and which is best depends on both the distribution of the data and the access pattern of the algorithm. See Eigen for an example.

Out-of-core involves explicitly reading and writing parts of the array to/from disk. This used to be fairly common, but people work pretty hard to avoid it now. Applications that really require it are often built on top of a database or similar to handle the data management. In scientific computing, one ends up needing to distribute the compute as well as the data storage, so there's a lot of complexity around that as well. For important problems, the entire design may be driven by having good locality of reference.

Any sparse data structure will have overhead in how much space it takes. This can be fairly low, but it means you have to be careful if you actually have a dense array and are simply looking to avoid memory fragmentation.

If your problem can be broken into smaller pieces that only access part of the array at a time, and the main issue is memory fragmentation making it hard to allocate one large block, then breaking the array into pieces, effectively adding an outer vector of pointers, is a good bet. If you have random access to an array larger than 4 gigabytes and no way to localize the accesses, 64-bit is the way to go.

Zalman Stern

Depending on what you need the memory for and your speed concerns, and if you're using Linux, you can always try using `mmap` to map a large file and simulate a sort of swap. It might be slower, but you can map very large sizes. See Mmap() an entire large file
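
A minimal sketch of this, assuming Linux and a 64-bit build (the file name and size are made up):

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main() {
    const size_t bytes = 8ULL * 1024 * 1024 * 1024;  // 8 GB backing file

    int fd = open("backing.bin", O_RDWR | O_CREAT, 0644);
    if (fd < 0) return 1;
    if (ftruncate(fd, (off_t)bytes) != 0) return 1;  // grow file to full size

    // Shared file mapping: the kernel pages data in and out of the file on
    // demand, so the file acts as an application-private swap area.
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) return 1;

    double* data = static_cast<double*>(p);
    data[0] = 1.0;

    munmap(p, bytes);
    close(fd);
    return 0;
}
```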

rhobincu