I am having a issue here for using pointers. Before I do that I have a performance concern. Suppose there is a 2D matrix like this:
0.0 0.0 0.0.....
0.0 0.7 0.5.....
0.0 0.5 0.8.....
0.0 0.3 0.8.....
.....
And I need to calculate the gradient of this thing. Therefore, for each number, I'll need the number as well as all its 4 nearest neighbors of this 2D matrix. Besides the first and last row and column are 0.
Now I have two method:
Make such a NxN matrix directly and calculate the gradient. Exactly follow the description. Here the memory use is NxNxreal*8, The loop start from calculating the (2,2) element then (2,3), ...
Make a (N-2)x(N-2)+1 array, and a NxN pointer matrix (use type at the moment). the (N-2)x(N-2) elements of the array will store the numbers except the 0.0s on the border. The last element of the is 0.0. For the pointer matrix, all the elements on the boarder will point to the last element of the array, 0.0. Other pointers should point to the places where they suppose to point to.
Here comes a issue of performance since the matrix that I am handling can be really huge or maybe 3D.
For method 1, there is nothing to say since it is just a straight forward method.
For method 2, I am wondering if the compiler can handle the issue properly. Since each FORTRAN pointer is like a structure according to my understanding so far. IF that is the case, FORTRAN pointer is slower than c pointer since it is not just a simple de-reference? I am also wondering if the type warp of the pointer decrease the performance (that warp is needed to make a pointer matrix). Is ther a particular reason that I should give up method 2 since it should be slower?
Let's take IVF on windows, gfortran and ifort on Linux for example. Since it can be compiler dependent.
UPDATE: Appreciate Stefan's code. I wrote on by my self as well.
program stencil
implicit none
type pp
real*8, pointer :: ptr
endtype pp
type(pp), allocatable :: parray(:,:)
real*8, allocatable, target :: array(:)
real*8, allocatable :: grad(:,:,:), direct(:,:)
integer, parameter :: n = 5000
integer :: i, j
integer :: clock_rate, clock_start, clock_stop
allocate(array(n**2+1))
allocate(parray(0:n+1, 0:n+1))
allocate(grad(2, n, n))
call random_number(array)
array(n**2+1) = 0
do i = 0, n + 1
parray(0,i)%ptr => array(n**2+1)
parray(n+1,i)%ptr => array(n**2+1)
parray(i,0)%ptr => array(n**2+1)
parray(i,n+1)%ptr => array(n**2+1)
enddo
do i = 1, n
do j = 1, n
parray(i,j)%ptr => array((i-1) * n + j)
enddo
enddo
!now stencil
call system_clock(count_rate=clock_rate)
call system_clock(count=clock_start)
do j = 1, n
do i = 1, n
grad(1, i, j) = (parray(i + 1,j)%ptr - parray(i - 1,j)%ptr)/2.D0
grad(2, i, j) = (parray(i,j + 1)%ptr - parray(i,j - 1)%ptr)/2.D0
enddo
enddo
call system_clock(count=clock_stop)
print *, "pointer, time cost= ", real(clock_stop-clock_start)/real(clock_rate)
deallocate(array)
deallocate(parray)
allocate(direct(0:n+1, 0:n+1))
call random_number(direct)
do i = 0, n + 1
direct(0,i) = 0
direct(n+1,i) = 0
direct(i,0) = 0
direct(i,n+1) = 0
enddo
!now stencil directly
call system_clock(count_rate=clock_rate)
call system_clock(count=clock_start)
do j = 1, n
do i = 1, n
grad(1, i, j) = (direct(i + 1,j) - direct(i - 1,j))/2.D0
grad(2, i, j) = (direct(i,j + 1) - direct(i,j - 1))/2.D0
enddo
enddo
call system_clock(count=clock_stop)
print *, "direct, time cost= ", real(clock_stop-clock_start)/real(clock_rate)
endprogram stencil
result (o0):
pointer, time cost= 2.170000
direct, time cost= 1.127000
result (o2):
pointer, time cost= 0.5110000
direct, time cost= 9.4999999E-02
So FORTRAN pointer is much slower. Stefan has pointed that out earlier. Now I am wondering if there is room for improvement. As I know so far, if I did it with c, the difference should not be this much.