
I need to handle 3D cube data whose number of elements can reach several billion. I understand I can't allocate that much memory at once on Windows, so I am thinking of disk-based operations with an in-process database. Is there a better way to do this? Maybe something in Boost?

Update: I will eventually have to provide browsing functionality with plots.

Update 2: The following article seems to be a good solution using a memory-mapped file. I will try it and update again. http://www.codeproject.com/Articles/26275/Using-memory-mapped-files-to-conserve-physical-mem
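
For illustration, here is a minimal sketch of the memory-mapped-file direction using Boost.Interprocess (since Boost was mentioned); the file name, element type, and cube dimensions are placeholders, not taken from the article:

```cpp
// Sketch: map a large backing file and index it like a flat 3D array.
// Requires a 64-bit build to map a view this large in one go.
#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <cstddef>
#include <cstdio>

namespace bip = boost::interprocess;

int main()
{
    const std::size_t nx = 1000, ny = 1000, nz = 1000;       // example cube dimensions
    const std::size_t bytes = nx * ny * nz * sizeof(float);  // ~4 GB of float voxels

    // The backing file must already exist and be at least `bytes` long
    // (e.g. created once up front with std::filesystem::resize_file).
    bip::file_mapping  file("cube.dat", bip::read_write);
    bip::mapped_region region(file, bip::read_write, 0, bytes);

    float* cube = static_cast<float*>(region.get_address());

    // Index as cube[(z * ny + y) * nx + x]; the OS pages data in and out on demand.
    cube[(0 * ny + 0) * nx + 0] = 1.0f;
    std::printf("first voxel = %f\n", cube[0]);
}
```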

Tae-Sung Shin
  • Well, Win7 Pro supports up to 192GB, which is more than several, I think :) – Igor Korkhov Mar 06 '12 at 17:11
  • Please explain what you are trying to do with your data. With that many elements, you REALLY want to think carefully what you are trying to accomplish. The algorithms you will be running will determine the appropriate data structure. For example, I'm thinking you might actually be talking about a 1000x1000x1000 cube, and perhaps most of the elements will be empty. Maybe you want an oct-tree. But maybe something completely different is required. We need more information. – Alan Baljeu Mar 06 '12 at 17:12
  • @AlanBaljeu I wish your guess were right, but the cube is images with another dense dimension, so I need that much space and all elements are equally important. – Tae-Sung Shin Mar 06 '12 at 17:16
  • @IgorKorkhov I have 6 GB of memory in my machine, but it can't allocate 1 GB of memory. – Tae-Sung Shin Mar 06 '12 at 17:17
  • @david: are we talking user mode here? If it's a 32bit process the absolute limit should be (normally) 2 GiB of RAM, whereas in 64bit the limit named by Igor applies. 32bit with PAE will still have the same per-process limitations as without PAE, but supports up to 64 GiB overall, IIRC. – 0xC0000022L Mar 06 '12 at 17:30
  • @STATUS_ACCESS_DENIED forgot about that. Thanks for pointing it out. – Tae-Sung Shin Mar 06 '12 at 17:35
  • Update your question to describe the data and the kind of manipulation you want. – Alan Baljeu Mar 06 '12 at 17:58
  • Check out my answer to this question: http://stackoverflow.com/questions/9227653/best-way-to-save-data-for-re-use-off-an-voxel-editor/9228249#9228249 - the OP faced a similar issue. – cmannett85 Mar 06 '12 at 18:23

3 Answers


The first and most basic step is to break the data down into chunks. The size of the chunk depends on your needs: it could be the smallest or largest chunk that can be drawn at once, or for which geometry can be built, or an optimal size for compression.

Once you're working with manageable chunks, the immediate memory problem is averted. Stream the chunks (load and unload/save) as needed.
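
As an illustration of this streaming idea, here is a hedged sketch assuming fixed-size cubic chunks stored one file per chunk; the Chunk/ChunkCache names, chunk size, and file layout are invented for the example:

```cpp
// Sketch: cache of fixed-size chunks keyed by their grid coordinates,
// loaded from disk on demand and evicted when the memory budget is exceeded.
#include <cstddef>
#include <cstdio>
#include <map>
#include <string>
#include <tuple>
#include <vector>

constexpr std::size_t kChunkDim = 64;                 // 64^3 voxels per chunk (example)
using ChunkKey  = std::tuple<int, int, int>;          // chunk grid coordinates
using ChunkData = std::vector<float>;                 // kChunkDim^3 voxels

class ChunkCache {
public:
    explicit ChunkCache(std::size_t maxChunks) : maxChunks_(maxChunks) {}

    ChunkData& get(const ChunkKey& key) {
        auto it = chunks_.find(key);
        if (it == chunks_.end()) {
            if (chunks_.size() >= maxChunks_)
                evictOne();
            it = chunks_.emplace(key, loadFromDisk(key)).first;
        }
        return it->second;
    }

private:
    static std::string pathFor(const ChunkKey& k) {
        char buf[64];
        std::snprintf(buf, sizeof buf, "chunk_%d_%d_%d.bin",
                      std::get<0>(k), std::get<1>(k), std::get<2>(k));
        return buf;
    }

    static ChunkData loadFromDisk(const ChunkKey& key) {
        ChunkData data(kChunkDim * kChunkDim * kChunkDim, 0.0f);
        if (std::FILE* f = std::fopen(pathFor(key).c_str(), "rb")) {
            std::fread(data.data(), sizeof(float), data.size(), f);
            std::fclose(f);
        }
        return data;                                  // missing file -> zero-filled chunk
    }

    // Naive eviction; real code would write dirty chunks back and use LRU.
    void evictOne() { chunks_.erase(chunks_.begin()); }

    std::size_t maxChunks_;
    std::map<ChunkKey, ChunkData> chunks_;
};
```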

During the load/save process, you may want to involve compression and/or a database of sorts. Even something simple like RLE plus SQLite (a single table with chunk coordinates and a data blob) can save a good bit of space. Better compression will allow you to work with larger chunk sizes.
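
A hedged sketch of that single-table layout with the SQLite C API, assuming the chunk bytes have already been RLE-compressed elsewhere (table and column names are illustrative):

```cpp
// Sketch: store one compressed chunk per row, keyed by its chunk coordinates.
#include <sqlite3.h>
#include <cstdint>
#include <vector>

bool storeChunk(sqlite3* db, int cx, int cy, int cz,
                const std::vector<std::uint8_t>& compressed)
{
    // One table: chunk coordinates + compressed blob.
    sqlite3_exec(db,
        "CREATE TABLE IF NOT EXISTS chunks("
        "  cx INTEGER, cy INTEGER, cz INTEGER,"
        "  data BLOB, PRIMARY KEY(cx, cy, cz))",
        nullptr, nullptr, nullptr);

    sqlite3_stmt* stmt = nullptr;
    if (sqlite3_prepare_v2(db,
            "INSERT OR REPLACE INTO chunks(cx, cy, cz, data) VALUES(?, ?, ?, ?)",
            -1, &stmt, nullptr) != SQLITE_OK)
        return false;

    sqlite3_bind_int(stmt, 1, cx);
    sqlite3_bind_int(stmt, 2, cy);
    sqlite3_bind_int(stmt, 3, cz);
    sqlite3_bind_blob(stmt, 4, compressed.data(),
                      static_cast<int>(compressed.size()), SQLITE_TRANSIENT);

    const bool ok = (sqlite3_step(stmt) == SQLITE_DONE);
    sqlite3_finalize(stmt);
    return ok;
}
```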

Depending on usage, it may be possible to keep chunks compressed in memory and only uncompress them briefly on modification (or when they could be modified). If your data is read-only, keeping chunks compressed and uncompressing them only when needed will be very helpful.

Splitting the data into chunks also has side-benefits, such as being an extremely simple form for octrees, allowing geometry generation (marching cubes and such) to run on isolated chunks of data (simplifies threading), and making the save/load process significantly simpler.

ssube
  • Again, my data is images with another dense dimension. I am not sure how I can break the data into chunks. Yeah, I was thinking SQLite too. I really hoped there was a better API though. Thanks for your answer. – Tae-Sung Shin Mar 06 '12 at 17:23
  • If your data is images, the natural assumption would be to split it by image. That would work in a slice context. If the data is dependent on its 3D neighbors, it's slightly more complex. If you could describe the data in more detail in the question, or better yet give a small sample, that would be helpful. – ssube Mar 06 '12 at 17:27
  • Yeah, the data is dependent on its 3D neighbors. These are hyperspectral images, so I can't really split by image. – Tae-Sung Shin Mar 06 '12 at 17:34
  • Images aren't dependent on all their neighbors, with a few exceptions. In any case, you may be able to take a region of the stack; the shape formed by extruding a square through the stack of images. Google Maps uses a similar feature to handle streaming tiles, effectively working on a single-layer stack. Depending on the depth-to-width of your images, that method may work (you may need a small margin for mipmapping and other blurring effects). – ssube Mar 06 '12 at 17:41
  • They are dependent on all their neighbors in this case. – Tae-Sung Shin Mar 06 '12 at 19:09

Can you perhaps store the data more efficiently (read "Programming Pearls" by Bentley)? Is it sparse data?

If not, memory-mapped files (MMFs) are your friend: they allow you to map chunks of the file into memory that you can access like any other memory.

Use CreateFileMapping and MapViewOfFile to map a chunk into your process.
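
A minimal sketch of those two calls, assuming an existing backing file named cube.dat; the view size is an arbitrary example and error handling is trimmed:

```cpp
// Sketch: map a window of a large file and access it as an array of floats.
#include <windows.h>
#include <cstdio>

int main()
{
    HANDLE file = CreateFileA("cube.dat", GENERIC_READ | GENERIC_WRITE,
                              0, nullptr, OPEN_EXISTING,
                              FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE)
        return 1;

    // Size arguments of 0 make the mapping cover the whole file.
    HANDLE mapping = CreateFileMappingA(file, nullptr, PAGE_READWRITE, 0, 0, nullptr);
    if (!mapping) { CloseHandle(file); return 1; }

    // Map only a window at a time; a 32-bit process cannot map billions of
    // elements at once, so move this view as you walk through the cube.
    const SIZE_T viewBytes = 256u * 1024 * 1024;      // 256 MB window (example)
    void* view = MapViewOfFile(mapping, FILE_MAP_ALL_ACCESS, 0, 0, viewBytes);
    if (view) {
        float* voxels = static_cast<float*>(view);
        std::printf("first voxel = %f\n", voxels[0]);
        UnmapViewOfFile(view);
    }

    CloseHandle(mapping);
    CloseHandle(file);
}
```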

0xC0000022L
  • Thanks for your answer. No, it's not sparse at all; it's images with another dense dimension. Can you give me some reference on how to deal with MMFs? I really hope there is an easy library so I can use it like an array. – Tae-Sung Shin Mar 06 '12 at 17:12
  • I haven't used your answer directly, but I found a solution using a memory-mapped file. Thanks for giving me the direction. – Tae-Sung Shin Mar 06 '12 at 19:30

Try VirtualAlloc from <windows.h>.

https://msdn.microsoft.com/en-us/library/windows/desktop/aa366887%28v=vs.85%29.aspx

It's quite useful for large arrays as long as they fit in your RAM; beyond that, 0xC0000022L's answer is probably the better solution.
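
A small sketch of that approach, assuming a 64-bit build; the 8 GB size is just an example:

```cpp
// Sketch: reserve and commit a large block with VirtualAlloc and use it as an array.
#include <windows.h>
#include <cstdio>

int main()
{
    const SIZE_T bytes = 8ull * 1024 * 1024 * 1024;   // 8 GB (example)

    // Reserve and commit in one call; pages are demand-zeroed, so physical
    // memory is only consumed as the array is actually touched.
    void* mem = VirtualAlloc(nullptr, bytes, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (!mem) {
        std::printf("VirtualAlloc failed: %lu\n", GetLastError());
        return 1;
    }

    float* cube = static_cast<float*>(mem);
    cube[0] = 1.0f;                                   // first touch backs this page with RAM

    VirtualFree(mem, 0, MEM_RELEASE);
}
```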

sh4dow