
I have a problem. I need to allocate a few very large fields with billions of float elements.

At the moment I'm using:

float ****spaceE;
int x, y, z;
int size[4];              /* size[3] is the innermost dimension (3 components per point) */
x = y = z = 100;
size[3] = 3;

spaceE = (float****)malloc(x * sizeof(float***));
for (int i = 0; i < x; i++)
{
    spaceE[i] = (float***)malloc(y * sizeof(float**));
    for (int j = 0; j < y; j++)
    {
        spaceE[i][j] = (float**)malloc(z * sizeof(float*));
        for (int k = 0; k < z; k++)
        {
            spaceE[i][j][k] = (float*)malloc(size[3] * sizeof(float));
        }
    }
}

But it eats over 2GB of memory and Windows terminates it. I need a few arrays like this, and much bigger ones. Is there any better way of doing this?

Mysticial
adrianko69
  • Is there any compressible structure (e.g. lots of zeroes, which could be better represented in a sparse array)? – ephemient Nov 18 '11 at 20:35
  • That's a monstrous size for finite difference. Even if you can allocate it, the runtime will be immense. Work on the algo. – David Heffernan Nov 18 '11 at 23:58

3 Answers


You should use Memory Mapped Files; I think this would be a good solution: http://msdn.microsoft.com/en-us/library/dd997372.aspx
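A minimal sketch of the Win32 flow, assuming a 32-bit build where only a sliding window of the data can be mapped at once (the file name, sizes, and offset here are illustrative, not from the question):

#include <windows.h>

int main(void)
{
    /* back ~4 GB of floats with a file; only a small view is mapped at a time */
    unsigned long long totalBytes = 4ULL * 1024 * 1024 * 1024;
    SIZE_T viewBytes = 64 * 1024 * 1024;         /* 64 MB sliding window */

    HANDLE file = CreateFileA("spaceE.bin", GENERIC_READ | GENERIC_WRITE,
                              0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE) return 1;

    /* the mapping object may be far larger than the address space... */
    HANDLE mapping = CreateFileMappingA(file, NULL, PAGE_READWRITE,
                                        (DWORD)(totalBytes >> 32),
                                        (DWORD)(totalBytes & 0xFFFFFFFFu), NULL);
    if (!mapping) { CloseHandle(file); return 1; }

    /* ...but each view must fit, so slide it across the file; the offset
       must be a multiple of the 64 KB allocation granularity */
    unsigned long long offset = 0;
    float *view = (float*)MapViewOfFile(mapping, FILE_MAP_ALL_ACCESS,
                                        (DWORD)(offset >> 32),
                                        (DWORD)(offset & 0xFFFFFFFFu),
                                        viewBytes);
    if (view) {
        view[0] = 1.0f;                          /* work on data inside the current window */
        UnmapViewOfFile(view);
    }

    CloseHandle(mapping);
    CloseHandle(file);
    return 0;
}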

NickD
  • I'm not entirely sure this will get around the 2GB limit, since it's still being mapped to a 32-bit address space. – Mysticial Nov 18 '11 at 20:34
  • There's no way it'll all fit into address space at once, so OP will have to implement a sliding view if they go down this path. – ephemient Nov 18 '11 at 20:34
  • But sometimes a database is the better solution :-) – NickD Nov 18 '11 at 20:35
  • Well, 2GB isn't my limit; I need much, much more, since there will be a few arrays like that. What do you mean by a database? – adrianko69 Nov 18 '11 at 20:48
  • I don't think you need to process everything at the same time and keep everything in memory; you should think about changing your algorithm, which is probably inefficient. – NickD Nov 18 '11 at 20:51
  • I'm doing a finite-difference time-domain simulation; I really need all those values non-stop. – adrianko69 Nov 18 '11 at 20:55
  • Yes, using CUDA is my next step, but I need to have it done for the PC first. – adrianko69 Nov 18 '11 at 21:05

Think about it. You mention "billions of float elements". Each float is going to be 4 bytes, so "billions" already implies you'll need more than 4GB of RAM...

What you're trying to do is not possible in a 32-bit process, because billions of floats will take more than 2GB of memory.

If you're just trying to get around the 2GB limit, you'll need to compile for 64-bit.
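As a rough sketch of the 64-bit route (the 1000x1000x1000x3 grid is the one mentioned in the comments below; the single contiguous block and index formula are one possible layout, not the question's code):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* one contiguous block instead of nested pointer arrays;
       needs a 64-bit build since the total exceeds 4GB */
    size_t nx = 1000, ny = 1000, nz = 1000, nw = 3;
    size_t total = nx * ny * nz * nw;            /* 3e9 elements = ~12 GB of floats */

    float *spaceE = (float*)malloc(total * sizeof(float));
    if (!spaceE) {
        fprintf(stderr, "allocation failed\n");  /* still fails without enough RAM/pagefile */
        return 1;
    }

    /* element (i,j,k,l) lives at spaceE[((i*ny + j)*nz + k)*nw + l] */
    size_t i = 999, j = 999, k = 999, l = 2;
    spaceE[((i * ny + j) * nz + k) * nw + l] = 1.0f;

    free(spaceE);
    return 0;
}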

Mysticial
  • And `sizeof(int)` will not be correct for allocating pointers once @adrianko69 switches to 64-bit. – ephemient Nov 18 '11 at 20:33
  • Moving to 64-bit, if possible, makes sense too. – NickD Nov 18 '11 at 20:33
  • I know, I counted it myself, but I thought there might be some other way to allocate the field that won't consume so much memory, because a field of 1000x1000x1000x3 was the smallest I wanted to try :D – adrianko69 Nov 18 '11 at 20:33
  • If you're going to use all those elements, then no, it's not possible, because the math doesn't allow it. However, if most of those elements are zero, you can try a sparse matrix representation. – Mysticial Nov 18 '11 at 20:37
  • I need all the elements, and I doubt there will be any zeros; I guess I will have to switch to smaller sizes for now. – adrianko69 Nov 18 '11 at 20:44
  • Compiling for x64 is an obvious option if your machine is 64-bit and has more than 2GB of RAM. Otherwise there isn't much choice other than manually using disk. Based on your other comments, this is not a good idea, since you'll be touching all the data points non-stop. In any case, this kind of hard-core finite-element stuff is one of the reasons people will cough up a LOT of money for those high-end workstations with 64+ GB of memory and tons of cores. – Mysticial Nov 18 '11 at 21:08

Depending on what you are trying to do and the platform architecture (a cluster?), you may need to work on files and only vivify active data chunks, or distribute your load across machines.
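A minimal sketch of the file-backed version, streaming one z-slab at a time (the file name and dimensions are illustrative, taken from the grid size in the comments; the file is assumed to exist pre-sized):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* process one z-slab (nx*ny*nw floats, ~12 MB) at a time */
    const long long nx = 1000, ny = 1000, nz = 1000, nw = 3;
    const long long slab = nx * ny * nw;

    FILE *f = fopen("spaceE.bin", "r+b");        /* hypothetical pre-sized data file */
    if (!f) return 1;

    float *buf = (float*)malloc((size_t)slab * sizeof(float));
    if (!buf) { fclose(f); return 1; }

    for (long long z = 0; z < nz; z++) {
        /* _fseeki64 is MSVC's 64-bit seek, needed since offsets exceed 2GB;
           on POSIX systems fseeko plays the same role */
        _fseeki64(f, z * slab * (long long)sizeof(float), SEEK_SET);
        fread(buf, sizeof(float), (size_t)slab, f);
        /* ... update this slab in memory ... */
        _fseeki64(f, z * slab * (long long)sizeof(float), SEEK_SET);
        fwrite(buf, sizeof(float), (size_t)slab, f);
    }

    free(buf);
    fclose(f);
    return 0;
}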

perreal
  • For now I'm just developing the code that will later run on CUDA and some bigger machines; that's why I'm trying to find out the best way to allocate the field. – adrianko69 Nov 18 '11 at 20:47
  • For CUDA, writing the values in parallel with threads should be the best you can do. – perreal Nov 18 '11 at 20:52
  • So for now the best thing to do is switch to smaller fields, I guess, because I'm still far away from the CUDA stuff :D – adrianko69 Nov 18 '11 at 20:56