
We have a problem using HeapCreate()/HeapAlloc() for big allocations (> 512K).

We are developing a C++ server application performing some 'image processing' operations concurrently on a few images. It should work for a long time without restarting.

Our processing model is quite specific. The server starts, performs some analysis to determine the maximum number of concurrent images for the given hardware configuration (i.e., the level at which it runs stably with the best performance), quickly reaches that maximum load, and then runs at roughly the same high load most of the time, depending on the input queue.

That means we allocate all required memory at the beginning, and the total amount of memory should not grow (if everything is fine). Our pain is fragmentation. Incoming images vary in size from 400K to possibly 50M, and processing each one leads to correspondingly big (proportional to image size) OpenCV allocations. Processing scenarios (and the related allocations) vary depending on image specifics, alloc/free activity is very intensive, and after some time we get fragmentation. Some local optimizations were developed, with negligible improvement. In practice we see out-of-memory/fragmentation-related effects after approximately 50,000-70,000 images, which is not acceptable. The current solution is restarting the server, which is far from ideal.

Our initial, naive proposal to solve the problem was:

  • We create our own custom heap that commits the whole required memory up front.
  • All 'big' OpenCV allocations (and ONLY those) are redirected to this heap (a rough sketch follows the list).
  • The moment fragmentation appears, we stop accepting new input and finish all running jobs.
  • That means all image-related allocations are released.
  • We check the heap and clean it if required (due to memory leaks, for example).
  • Now we have a completely empty heap and can start from scratch, so we open the input again.
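
A rough sketch of what we had in mind is below. BigAlloc()/BigFree() are hypothetical helper names of ours, not a real API; the cv::Mat constructor taking a user pointer does not take ownership, so the buffer has to be released with BigFree() separately.

    #include <windows.h>
    #include <opencv2/core/core.hpp>

    // Hypothetical private heap for 'big' image buffers only:
    // 250 MB committed up front, growable (maximum size = 0).
    static HANDLE g_bigHeap = HeapCreate(0, 250u * 1024 * 1024, 0);

    void* BigAlloc(size_t bytes)        // our own helper, not a Win32 call
    {
        return HeapAlloc(g_bigHeap, 0, bytes);
    }

    void BigFree(void* p)
    {
        if (p) HeapFree(g_bigHeap, 0, p);
    }

    // Wrap a buffer from the private heap in a cv::Mat. This constructor
    // does not take ownership, so the buffer must outlive the Mat and be
    // returned to the heap with BigFree() afterwards.
    cv::Mat MakeImage(int rows, int cols, int type)
    {
        void* buffer = BigAlloc((size_t)rows * cols * CV_ELEM_SIZE(type));
        return cv::Mat(rows, cols, type, buffer);
    }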

A simple proof-of-concept project quickly revealed the following (a minimal reproduction is sketched after the list):

  • A heap created with HeapCreate(), initially committing 250M, grows by 10M each time I call HeapAlloc() on it! Strange, isn't it?
  • As recognized using HeapWalk(), the committed memory was reserved not in one contiguous block, but as a list of more than 500 chunks of 512K each. None of them was suitable for my 10M request, so the heap went on to commit additional memory.
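
For reference, a minimal reproduction of that experiment (error handling omitted; the exact layout reported by HeapWalk() will vary with the OS version):

    #include <windows.h>
    #include <cstdio>

    int main()
    {
        // Growable private heap with 250 MB committed up front.
        HANDLE heap = HeapCreate(0, 250u * 1024 * 1024, 0);

        // Allocate a few 10 MB blocks and watch the heap grow anyway.
        for (int i = 0; i < 5; ++i)
            HeapAlloc(heap, 0, 10u * 1024 * 1024);

        // Walk the heap to see how the committed memory is actually laid out.
        PROCESS_HEAP_ENTRY entry = {};
        while (HeapWalk(heap, &entry))
        {
            if (entry.wFlags & PROCESS_HEAP_REGION)
                printf("region: committed %lu KB, uncommitted %lu KB\n",
                       entry.Region.dwCommittedSize / 1024,
                       entry.Region.dwUnCommittedSize / 1024);
            else if (entry.wFlags & PROCESS_HEAP_ENTRY_BUSY)
                printf("  busy block: %lu KB\n", entry.cbData / 1024);
        }

        HeapDestroy(heap);
        return 0;
    }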

It seems the Win32 custom heap is optimized for small allocations only, and I was unable to find a way to use it for my needs :( VirtualAlloc() seems to be a solution, but it is a very low-level API, and using it means developing my own memory-management system, which feels like reinventing the wheel.
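
For context, the reserve/commit pattern itself is small; it is everything on top of it (tracking which ranges are free, reusing them, thread safety) that amounts to writing your own allocator. Sizes below are illustrative only:

    #include <windows.h>

    int main()
    {
        const SIZE_T total = 250u * 1024 * 1024;  // illustrative: whole working area
        const SIZE_T slice = 10u * 1024 * 1024;   // illustrative: one image buffer

        // Reserve one contiguous range of address space; commit nothing yet.
        void* base = VirtualAlloc(NULL, total, MEM_RESERVE, PAGE_NOACCESS);

        // Commit (back with storage) just one slice when it is needed.
        void* image = VirtualAlloc(base, slice, MEM_COMMIT, PAGE_READWRITE);

        // ... process the image ...

        // Decommit the slice when done; the address range stays reserved.
        VirtualFree(image, slice, MEM_DECOMMIT);

        // Release the whole reservation at shutdown.
        VirtualFree(base, 0, MEM_RELEASE);
        return 0;
    }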

I want to believe some standard way exists and I just cannot find it. Any help or relevant resources to read will be much appreciated.

vadimus
  • Sorry for the initial formatting. It was a technical problem. – vadimus Mar 27 '14 at 16:20
  • This problem is entirely too trivial to solve today. Rebuild the program to target x64. – Hans Passant Mar 27 '14 at 17:04
  • @HansPassant: Re-targeting to a 64-bit model may simply postpone the problem rather than solve it. If a program has a pernicious fragmentation problem _and_ it must run indefinitely, it can eventually exhaust the virtual address space and it might exhaust available storage (if the fragmentation is so bad that pages cannot be freed). Even if the postponement is for a very long time, the performance may suffer from churn. – Adrian McCarthy Mar 27 '14 at 17:24
  • No, fragmenting a 256 terabyte address space and not finding a 50 MB hole with code that worked in a 2 gigabyte space is flat-out impossible. – Hans Passant Mar 27 '14 at 17:50
  • @Hans Passant: Windows x64 has a user-mode virtual address space of 8 TB, not 256. The app does not work indefinitely in a 2 GB space because of fragmentation problems; thus the question. If the fragmentation is bad enough to prevent uncommitting of memory, you will eventually hit the commit limit (typically around 1 TB). – Adrian McCarthy Mar 27 '14 at 21:26
  • Thanks for your comments. Yes, we have a 64-bit version of the system, but for various reasons some of our clients have to use 32-bit (at least for the near future). On the other hand, the engine tends to consume more and more memory (complexity and concurrency), so we would prefer to solve the problem rather than just postpone it (absolutely agree with Adrian) – vadimus Mar 28 '14 at 21:44
  • I never understood the downvotes on this one. – Adrian McCarthy May 10 '15 at 15:07

1 Answer


A few ideas:

  1. Heaps generally manage small suballocations from a larger block of memory. If you need large allocations, a heap may not be the solution. You may have to roll your own and deal directly with virtual memory.

  2. It's not clear whether large HeapAlloc() allocations are actually satisfied from the heap's reserved memory. MSDN is somewhat vague and occasionally self-contradictory, but the page on the low-fragmentation heap (LFH) says that allocations larger than 16 KB don't use the LFH. That might mean the heap tracks the allocation for you but really satisfies large requests from VirtualAlloc calls rather than from the reserved memory. If that's the case, using a heap might just be making things worse. (In any event, it might be worth trying with and without the LFH enabled; see the snippet after this list.)

  3. If your problem tends to be fragmentation rather than actual memory exhaustion, then you may be better off wasting some memory in order to eliminate fragmentation. If your largest allocations require 50 MB, then you might consider making all your allocations 50 MB, even if the image is dramatically smaller. On average you'll have fewer allocated blocks (so you can't handle as many images at once), but you'll never get fragmentation if the allocations are always the same size. Whether that's an acceptable trade-off depends on the details of your situation. You could compromise and have a bunch of blocks of size X to handle the smaller images if they're more common, and just a few of size Y to handle the largest possible ones. (A sketch of such a fixed-size pool follows this list.)

  4. Another approach is tiling, though this can dramatically affect how your application is architected. The idea is to work with tiles of a fixed size rather than images of variable sizes. Images are cut into as many tiles as necessary based on the tile size. Tiles are processed independently, and the output image is reassembled from the tiles. Since all the tiles are the same size, you avoid fragmentation. Some image processing is very amenable to this, but other types are not.
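
Regarding point 2, this is the documented way to turn the LFH on for a particular heap (on Vista and later it is usually enabled automatically, and it cannot be enabled on heaps created with HEAP_NO_SERIALIZE):

    #include <windows.h>

    // Value 2 selects the low-fragmentation heap, per the
    // HeapSetInformation/HeapCompatibilityInformation documentation.
    bool EnableLowFragmentationHeap(HANDLE heap)
    {
        ULONG heapType = 2;
        return HeapSetInformation(heap, HeapCompatibilityInformation,
                                  &heapType, sizeof(heapType)) != FALSE;
    }

Regarding point 3, here is a sketch of the fixed-size idea: carve one VirtualAlloc reservation into equal slots and hand them out from a free list, so freeing and reallocating can never fragment the region. Class and member names are illustrative; a real version would need locking for concurrent use and possibly lazy commit of slots:

    #include <windows.h>
    #include <vector>

    class FixedBlockPool
    {
    public:
        FixedBlockPool(SIZE_T blockSize, SIZE_T blockCount)
            : blockSize_(blockSize)
        {
            base_ = static_cast<char*>(
                VirtualAlloc(NULL, blockSize * blockCount,
                             MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE));
            for (SIZE_T i = 0; i < blockCount; ++i)
                free_.push_back(base_ + i * blockSize);
        }

        ~FixedBlockPool() { VirtualFree(base_, 0, MEM_RELEASE); }

        // Every allocation is one fixed-size slot; returns NULL when full.
        void* Allocate()
        {
            if (free_.empty()) return NULL;
            void* p = free_.back();
            free_.pop_back();
            return p;
        }

        void Free(void* p) { free_.push_back(static_cast<char*>(p)); }

    private:
        SIZE_T blockSize_;
        char* base_;
        std::vector<char*> free_;
    };

For example, a pool of sixteen 50 MB slots (FixedBlockPool pool(50u * 1024 * 1024, 16);) caps the number of in-flight images at sixteen but keeps the memory layout stable indefinitely.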

Adrian McCarthy
  • Thanks, tiles are a really good idea. Unfortunately we have to analyse the "entire" document area, at least in the first stage. Generally speaking, 50M is a small suballocation of a larger 800M block, and it would be an ideal solution to configure a heap this way. Yes, it seems we have to deal with VirtualAlloc(). I just hope some standard or known implementation exists for such a heap oriented toward big allocations (definitely based on VirtualAlloc()) – vadimus Mar 28 '14 at 22:31
  • You could build an abstraction on top of the tiling such that, from the point of view of the code, it is just one big image, but the representation is divided into tiles of equal size. – Adrian McCarthy May 10 '15 at 15:07
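
A rough sketch of that kind of abstraction, assuming a single 8-bit channel for brevity: callers index the image as one rows x cols surface, while storage is a set of equally sized tiles (which in practice would come from a fixed-size pool like the one above):

    #include <cstdint>
    #include <vector>

    class TiledImage
    {
    public:
        TiledImage(int rows, int cols, int tileSize)
            : tile_(tileSize),
              tilesPerRow_((cols + tileSize - 1) / tileSize)
        {
            int tilesPerCol = (rows + tileSize - 1) / tileSize;
            // All tiles have exactly the same size, so the underlying
            // allocations can never fragment each other.
            tiles_.resize(static_cast<size_t>(tilesPerRow_) * tilesPerCol,
                          std::vector<uint8_t>(static_cast<size_t>(tileSize) * tileSize));
        }

        // Resolve a global (y, x) coordinate to the owning tile.
        uint8_t& at(int y, int x)
        {
            std::vector<uint8_t>& t = tiles_[(y / tile_) * tilesPerRow_ + (x / tile_)];
            return t[(y % tile_) * tile_ + (x % tile_)];
        }

    private:
        int tile_;
        int tilesPerRow_;
        std::vector<std::vector<uint8_t>> tiles_;
    };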