
So I just got my grade back from a school project that I did well on, but the grader took five points off because I didn't make a call to ceil(...). It's a parallel computing course using CUDA, but the question isn't directly related to any CUDA feature.

Here is the "offending" line:

dim3 dimGrid(n / dimBlock.x, n / dimBlock.y);

His claim is that I should have done:

dim3 dimGrid(ceil(n / dimBlock.x), ceil(n / dimBlock.y));

So my question is: why would I be marked down for this if n and dimBlock.* are integers? The quotient is computed, and already truncated, before ceil is even called, so it seems silly to mark off for that.

The examples below seem to show that GCC optimizes the call out anyway when using -O2.

With ceil:

#include <stdio.h>
#include <math.h>

int main()
{
        int m = 3, n = 5, o;

        o = ceil(n / m);
        printf("%d\n", o);
        return 0;
}

Without:

#include <stdio.h>
#include <math.h>

int main()
{
        int m = 3, n = 5, o;

        o = n / m;
        printf("%d\n", o);
        return 0;
}

While I understand it's only five points, I still want to understand why, if I am completely wrong.

csnate
  • Maybe they are not supposed to be integers? – Boann Sep 29 '14 at 17:57
    If you intended to round up, then you'd need to cast to a float to avoid integer division or do something like `(n + dimBlock.x - 1) / dimBlock.x` – Mysticial Sep 29 '14 at 17:58
  • That's the thing: you wouldn't intend to round up, nor would you expect either to be floats. dimBlock.* by definition is always an integer value in CUDA and 'n' is the size of each dimension of a matrix. The particular code in question deals with indexing, so rounding up could actually cause a segmentation fault – csnate Sep 29 '14 at 17:59
    Using ceil here doesn't make any sense (as all operations are on integers). I suppose the reviewer just missed that these are integers, and you should consult with him – Iłya Bursov Sep 29 '14 at 18:01
  • Okay, so I'm not crazy... :) – csnate Sep 29 '14 at 18:06
  • It appears the problem is simply that the result was not rounded up, whether by _somehow_ using `ceil()` or by the better non-FP solution `(n + dimBlock.x - 1) / dimBlock.x` (@Mysticial). Agreed that using `ceil(n / dimBlock.x)` where `n` and `dimBlock.x` are integers does not work. – chux - Reinstate Monica Sep 29 '14 at 20:15

2 Answers

5

The grader probably meant that you needed to use the ceiling of the fraction n/d, and this is perfectly right: this way there will be enough blocks to cover n, the last block possibly being incomplete.

That does not mean that the appropriate implementation is the C expression ceil(n/d). Indeed, with integer operands the C / is an integer division and discards the fractional part, in effect taking the floor of the fraction (for non-negative operands).

You can use ceil((double)n/(double)d) instead.

But my favorite way would be without converting to doubles: (n+d-1)/d.

  • So it turned out this is the case. However, my code was correct and all tests passed when run, which is why I still ended up with an A. Basically, in the event that n = 1000 (which never happens in my code) and dimBlock.x and dimBlock.y = 16, 1000 / 16 = 62. Since I use dimBlock and dimGrid to launch a CUDA kernel to perform matrix multiplication, I would be missing elements! I still think he could have given me the points since I passed all tests :p, but it makes sense now. – csnate Sep 30 '14 at 14:25
  • If 8 elements were indeed left out of the computation, then the test program shouldn't get an A! ;-) –  Sep 30 '14 at 14:53
  • They were not, though, because n is always 8 in each of the test cases, so I never saw the actual problem. I don't know why the grader changed the code anyway. – csnate Sep 30 '14 at 15:04
  • But then the grid has no blocks at all ?! (or you actually consider n/d+1 blocks, which is off by one when d divides n exactly.) –  Sep 30 '14 at 15:07
-1

Here, m = 3 and n = 5, so n / m = 1.67 (approx.). Since you are assigning it to o, which is of type int, the value is truncated: only the integer part is stored, not the fractional part, so o = 1. If you use ceil(n/m) instead, the output would be 2, which is then assigned to o, i.e. o = 2.

abhinash
  • ceil will be applied to n/m, a division of two integers, that yields an integer (actually floor(n/m)). –  Sep 30 '14 at 14:00