I have allocated memory using valloc, say an array A of 15*sizeof(double) bytes. I divided it into three pieces and want to bind each piece (of length 5) to one of three NUMA nodes (say 0, 1, and 2). Currently, I am doing the following:

double* A=(double*)valloc(15*sizeof(double));

piece=5; 
nodemask=1;
mbind(&A[0],piece*sizeof(double),MPOL_BIND,&nodemask,64,MPOL_MF_MOVE);

nodemask=2;
mbind(&A[5],piece*sizeof(double),MPOL_BIND,&nodemask,64,MPOL_MF_MOVE);

nodemask=4;
mbind(&A[10],piece*sizeof(double),MPOL_BIND,&nodemask,64,MPOL_MF_MOVE);

My first question: am I doing this right? For example, are there any problems with alignment to the page size? With an array of size 15 it runs fine, but if I change the array size to something like 6156000 and piece=2052000, so that the three calls to mbind start at &A[0], &A[2052000], and &A[4104000], I get a segmentation fault (and sometimes it just hangs). Why does it run fine for the small size but segfault for the larger one? Thanks.

tiki

1 Answer

For this to work, you need to deal with chunks of memory that are at least page-sized and page-aligned - that means 4KB on most systems. In your case, I suspect the page gets moved twice (possibly three times), because you call mbind() three times.
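
To make the alignment requirement concrete, here is a minimal sketch that checks whether a piece starting at a given element index begins on a page boundary. It assumes the array base itself is page-aligned (as valloc returns); the helper name `piece_start_is_aligned` is made up for illustration, and the 4096-byte page size used in the note below is an assumption about the system.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if element `index` of an array whose base is
 * page-aligned starts exactly on a page boundary, 0 otherwise. */
int piece_start_is_aligned(size_t index, size_t elem_size, size_t page_size)
{
    assert(page_size > 0);
    return (index * elem_size) % page_size == 0;
}
```

With 8-byte doubles and 4096-byte pages, index 5 (byte offset 40) and index 2052000 (byte offset 16416000) do not fall on a page boundary, while index 512 does.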

The way NUMA memory is laid out is that CPU socket 0 has the range 0..X-1 MB, socket 1 has X..2X-1, socket 2 has 2X..3X-1, etc. Of course, if you put a 4GB stick of RAM next to socket 0 and a 16GB stick next to socket 1, then the distribution isn't even. But the principle still stands that a large chunk of memory is assigned to each socket, according to where the memory is physically located.

As a consequence of how the memory is located, the physical location of the memory you are using will have to be placed in the linear (virtual) address space by page-mapping.

So, for large "chunks" of memory, it is fine to move it around, but for small chunks, it won't work quite right - you certainly can't "split" a page into something that is affine to two different CPU sockets.
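
As a rough illustration of why a split point can end up in the middle of a page, the sketch below computes which page each element lands on. It assumes the array base is page-aligned; both helper names are hypothetical, and the page size is passed in rather than queried.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: index of the page (counted from the array base, which
 * is assumed page-aligned) that holds element `index`. */
size_t page_of_element(size_t index, size_t elem_size, size_t page_size)
{
    assert(page_size > 0);
    return (index * elem_size) / page_size;
}

/* Returns 1 if the last element before `boundary` and the element at
 * `boundary` live on the same page, i.e. the split would cut a page in two. */
int split_cuts_a_page(size_t boundary, size_t elem_size, size_t page_size)
{
    assert(boundary > 0);
    return page_of_element(boundary - 1, elem_size, page_size) ==
           page_of_element(boundary, elem_size, page_size);
}
```

With 8-byte doubles and 4096-byte pages, the split points 5 and 10 from the small example, and 2052000 and 4104000 from the large one, all cut a page in two; a split at index 512 does not.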

Edit:

To split an array, you first need to find the page-aligned size.

page_size = sysconf(_SC_PAGESIZE);

objs_per_page = page_size / sizeof(A[0]); 
// There should be a whole number of "objects" per page. This checks that
// no object straddles a page boundary.
assert(page_size % sizeof(A[0]) == 0);

split_three = SIZE / 3; 

aligned_size = (split_three / objs_per_page) * objs_per_page;

remnant = SIZE - (aligned_size * 3);

piece = aligned_size;

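nodemask = 1;  // node 0
mbind(&A[0],piece*sizeof(double),MPOL_BIND,&nodemask,64,MPOL_MF_MOVE);

nodemask = 2;  // node 1
mbind(&A[aligned_size],piece*sizeof(double),MPOL_BIND,&nodemask,64,MPOL_MF_MOVE);

nodemask = 4;  // node 2; this piece also takes the remnant so its start stays page-aligned
mbind(&A[aligned_size*2],(piece+remnant)*sizeof(double),MPOL_BIND,&nodemask,64,MPOL_MF_MOVE);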

Obviously, you will now need to split the work among the three threads the same way, using the aligned size and remnant as needed.
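
The index arithmetic above can be packaged as a small function. This is only a sketch under stated assumptions: the `struct piece` type and the `split_page_aligned` name are hypothetical, the array base is assumed page-aligned, and the leftover elements are given to the last piece so that every start index stays on a page boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: one contiguous piece of the array. */
struct piece {
    size_t start;  /* index of the first element */
    size_t count;  /* number of elements */
};

/* Split `total` elements of `elem_size` bytes into `parts` pieces whose start
 * indices are all multiples of the number of elements per page. The last
 * piece absorbs whatever is left over. */
void split_page_aligned(size_t total, size_t elem_size, size_t page_size,
                        size_t parts, struct piece *out)
{
    assert(parts > 0);
    assert(elem_size > 0);
    assert(page_size % elem_size == 0);  /* no element straddles a page */

    size_t per_page = page_size / elem_size;
    size_t aligned = (total / parts / per_page) * per_page;

    for (size_t i = 0; i < parts; i++) {
        out[i].start = i * aligned;
        out[i].count = aligned;
    }
    out[parts - 1].count = total - (parts - 1) * aligned;
}
```

For 6156000 doubles, 4096-byte pages and three parts, this gives start indices 0, 2051584 and 4103168, with the last piece holding 2052832 elements. Each piece could then be passed to mbind as &A[start] with a length of count*sizeof(double) and its own nodemask. For the 15-element array the aligned size comes out as 0, which reflects that the whole array fits inside one page and cannot be split across nodes.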

Mark Lakata
Mats Petersson
  • Thanks. I would really appreciate if you could show how it would be correctly aligned for larger array size (6156000) example above. Thanks in advance. – tiki Jan 25 '13 at 19:19
  • It would really help if you posted the code that allocates the array. – Mats Petersson Jan 25 '13 at 19:22
  • I posted it. Thanks a lot in advance. – tiki Jan 25 '13 at 20:32
  • I changed ASSERT to assert and included assert.h, then I did run it. It says 'page_size % sizeof(A[0])' failed. – tiki Jan 25 '13 at 22:00
  • Ah, yes, it should be `assert(page_size % sizeof(A[0]) == 0)` (note the `== 0` part!) – Mats Petersson Jan 25 '13 at 22:02
  • Now it says segmentation fault :( – tiki Jan 25 '13 at 22:03
  • Then, I suspect, the problem is in some other part of the code. – Mats Petersson Jan 25 '13 at 22:04
  • I ran it several times (it outputs segmentation fault), then after a while it runs fine. Also in above code segment you provided do I have to change the nodemask everytime I call the mbind (i.e. nodemask=1 for &A[0], and nodemask=2 for &A[aligned_size], and for last one nodemask=4)? Thanks. – tiki Jan 25 '13 at 23:59
  • Obviously, nodemask needs to be set correctly to make sure the memory is on the right node - however, that shouldn't cause a seg-fault. Segfault happens because your code accesses memory that isn't available to your process. – Mats Petersson Jan 26 '13 at 00:37
  • Now I am getting EINVAL for every mbind call. Do you know why I might be getting this? Thanks. – tiki Jan 27 '13 at 06:16
  • I asked this (i.e. why it is giving EINVAL) as a separate question. Thanks. – tiki Jan 27 '13 at 06:27