
I am trying to understand the 9-point stencil algorithm from this book. The logic is clear to me, but I cannot work out how the WIDTHP macro is calculated. Here is a brief version of the code (the original is more than 300 lines long):

#include <stdlib.h>   /* malloc */

#define PAD64 0       /* set to 1 to round each row up to a multiple of 64 bytes */
#define WIDTH 5900
#if PAD64
#define WIDTHP ((((WIDTH*sizeof(REAL))+63)/64)*(64/sizeof(REAL)))
#else
#define WIDTHP WIDTH
#endif
#define HEIGHT 10000

REAL *fa = (REAL *)malloc(sizeof(REAL)*WIDTHP*HEIGHT);
REAL *fb = (REAL *)malloc(sizeof(REAL)*WIDTHP*HEIGHT);

The original array is 5900 x 10000, but if I set PAD64 to 1, the array becomes 5915.75 x 10000.

So far I can guess that the author is trying to align and pad the array to a 64-byte boundary. But the memory returned by malloc is usually aligned (and padded) already. Also, posix_memalign gives you a chunk of memory that is guaranteed to have the requested alignment, and we can also use

`__attribute__((aligned(64)))`

What impact can this WIDTHP have on my code's performance?

puneet336
  • `malloc()` typically returns memory aligned to a 16-byte boundary (on x86-64), not a 64-byte boundary. The memory `malloc()` returns is **not** "padded". – EOF Apr 30 '15 at 08:38

3 Answers


The idea is that each row of the matrix (or column, if it's treated as a column-major matrix) can be aligned to the start of a new cache line by adding padding to the end of each row. Exactly what impact this has depends heavily on the access pattern, but in general cache-friendliness can be quite important for intensely number-crunching code.

Also, the computation is done entirely in integer arithmetic, so the result is certainly not 5915.75; that doesn't make sense.

unwind
  • I understand the concept of padding, but your explanation is still not clear to me. This calculation has added an extra 15 (5900 + 15), and the row width (5915) is neither a multiple of 2 nor divisible by 64. Modern Intel architectures fetch 64-byte cache lines, and the nearest multiples **of 64 around 5900 are 5952 and 5888**. If you have some other way of calculating and aligning WIDTHP, it would be great if you could explain it here! – puneet336 Apr 30 '15 at 09:10

I was going to put this in as a comment to unwind's answer because he's right. But perhaps I can explain more clearly, albeit in more characters than will fit in a comment.

When I do the math, I get 5904 reals, which (with a 4-byte REAL such as float) is 23616 bytes, or exactly 369 cache lines of 64 bytes each. It is the number of bytes, rather than the number of elements, that must be a multiple of 64.

As to why you want to pad the width, let's look at a smaller example. Let's pretend we have a "cache line" that holds 10 letters and an "array" with a width of 8 letters and a height of 4. Since our hypothetical array is in C, and C is row-major, the array will look something like this: AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD

But here is how it is arranged in cache lines, since those are 10 letters long: AAAAAAAABB BBBBBBCCCC CCCCDDDDDD DD

Not good. Only the first row of the array is aligned. But if we pad the width by two letter slots (shown as `_`), we get this in cache: AAAAAAAA__ BBBBBBBB__ CCCCCCCC__ DDDDDDDD__

which is what we want. Now we can have a nested loop like

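for (size_t i = 0; i < HEIGHT; i++)
   for (size_t j = 0; j < WIDTH; j++)
      work(&fa[i*WIDTHP + j]);   /* work() is a placeholder; rows are WIDTHP elements apart */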

and know that every time we start the j loop, the row we are working on begins on a cache-line boundary.

Oh, and yes, they really should do something to make sure that the first element of the array is aligned. `__attribute__((aligned(64)))` won't work because the arrays are allocated dynamically, but they could have used posix_memalign instead of malloc.

froth

The WIDTHP calculation is, say,

(WIDTH/64) + 1

well rounded for integer-precision math. I'd give you a better answer, except that in the SE mobile app it isn't viable to flick between this and the listing.

phil
  • Why +1? If you can, please elaborate. Also check out my comment below unwind's answer! – puneet336 Apr 30 '15 at 09:39
  • It's not quite that; it's actually width + (63/sizeof REAL), which is width + nearly 1, or 7.7. What number is sizeof REAL? Because all the math takes place in integer precision, there's a rounding down; this 'crazy' formula just controls it better. – phil Apr 30 '15 at 09:46
  • @rhubarbdog: Nope. The idiom `(x+(y-1))/y` is used to calculate `ceil(x/y)` when normal division calculates `floor(x/y)`. – EOF Apr 30 '15 at 10:41