I'm trying to parallelize my sequential C code and offload it to an NVIDIA GPU with OpenACC (PGI compiler).
My code is written as sequential code, and it frequently calls very long functions, like below.
int main()
{
    // blah blah...
    for(i=0; i<10; i++)
    {
        for(j=0; j<20; j++)
        {
            big_function(a,b,c);
        }
    }
    // blah blah...
}
int big_function(int a, int b, int c)
{
    small_function_1(a);
    small_function_2_with_data_dependencies(b);
}
In a case like this, can big_function() be parallelized and run on the GPU?
I marked the whole for loop as a parallel region using #pragma acc kernels, like below.
#pragma acc routine
int big_function(int a, int b, int c);
#pragma acc routine
int small_function_1(int a);
#pragma acc routine
int small_function_2_with_data_dependencies(int b);
int main()
{
    // blah blah...
    #pragma acc data ~~~~
    #pragma acc kernels
    for(i=0; i<10; i++)
    {
        for(j=0; j<20; j++)
        {
            big_function(a,b,c);
        }
    }
    // blah blah...
}
int big_function(int a, int b, int c)
{
    small_function_1(a);
    small_function_2_with_data_dependencies(b);
}
But the compiled program takes a very long time to finish, and the result is not correct.
Can I use OpenACC to parallelize sequential code that makes many function calls?
Or do I have to break big_function() up into smaller parts?
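To make the pattern concrete, here is a minimal sketch, not the real code: the function bodies, the `out` array, the `run_grid` helper, and the `NI`/`NJ` sizes are all hypothetical placeholders. It assumes every routine can be marked `seq` (each call runs sequentially inside one GPU thread) and that each (i, j) iteration writes only to its own output element, so the iterations are independent of each other.

```c
#define NI 10
#define NJ 20

/* Hypothetical stand-in: no dependencies, trivially safe per thread. */
#pragma acc routine seq
int small_function_1(int a)
{
    return a * 2;
}

/* Hypothetical stand-in: the loop-carried dependency on "sum" stays
   inside a single call, so it never crosses GPU threads. */
#pragma acc routine seq
int small_function_2_with_data_dependencies(int b)
{
    int sum = 0;
    for (int k = 1; k <= b; k++)
        sum += k;
    return sum;
}

/* A device routine may call other device routines. */
#pragma acc routine seq
int big_function(int a, int b, int c)
{
    return small_function_1(a)
         + small_function_2_with_data_dependencies(b)
         + c;
}

/* Hypothetical driver: fills out[i*NJ + j] for every (i, j) pair and
   returns the sum of all elements. Each iteration writes a distinct
   element, and the only shared value is handled by the reduction. */
long run_grid(int *out, int c)
{
    long total = 0;
    #pragma acc parallel loop collapse(2) copyout(out[0:NI*NJ]) reduction(+:total)
    for (int i = 0; i < NI; i++) {
        for (int j = 0; j < NJ; j++) {
            out[i * NJ + j] = big_function(i, j, c);
            total += out[i * NJ + j];
        }
    }
    return total;
}
```

Built without OpenACC support, the pragmas are ignored and the same code runs sequentially, which gives a reference result to compare the GPU output against.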