Determine assumed-shape array strides at runtime

Question

Is it possible in a modern Fortran compiler such as Intel Fortran to determine array strides at runtime? For example, I may want to perform a Fast Fourier Transform (FFT) on an array section:

program main

    complex(8),allocatable::array(:,:)

    allocate(array(17, 17))
    array = 1.0d0

    call fft(array(1:16,1:16))

contains

    subroutine fft(a)  
        use mkl_dfti

        implicit none

        complex(8),intent(inout)::a(:,:)

        type(dfti_descriptor),pointer::desc
        integer::stat

        stat = DftiCreateDescriptor(desc, DFTI_DOUBLE, DFTI_COMPLEX, 2, shape(a) )
        stat = DftiCommitDescriptor(desc)
        stat = DftiComputeForward(desc, a(:,1))
        stat = DftiFreeDescriptor(desc)

    end subroutine  

end program

However, the MKL Dfti* routines need to be explicitly told the array strides. Looking through reference manuals I have not found any intrinsic functions which return stride information. A couple of interesting resources are here and here which discuss whether array sections are copied and how Intel Fortran handles arrays internally. I would rather not restrict myself to the way that Intel currently uses its array descriptors.

How can I figure out the stride information? Note that in general I would want the fft routine (or any similar routine) to not require any additional information about the array to be passed in.

EDIT:

I have verified that an array temporary is not created in this scenario, here is a simpler piece of code which I have checked on Intel(R) Visual Fortran Compiler XE 14.0.2.176 [Intel(R) 64], with optimizations disabled and heap arrays set to 0.

program main
    implicit none

    real(8),allocatable::a(:,:)

    pause

    allocate(a(8192,8192))

    pause

    call random_number(a)

    pause

    call foo(a(:4096,:4096))

    pause

    contains

    subroutine foo(a)
        implicit none

        real(8)::a(:,:)

        open(unit=16, file='a_sum.txt')

        write(16, *) sum(a)

        close(16)

    end subroutine

end program

Monitoring the memory usage, it is clear that an array temporary is never created.

EDIT 2:

module m_foo
    implicit none

contains

    subroutine foo(a)
        implicit none

        real(8),contiguous::a(:,:)

        integer::i, j

        open(unit=16, file='a_sum.txt')

        write(16, *) sum(a)

        close(16)        

        call nointerface(a)

    end subroutine

end module

subroutine nointerface(a)
    implicit none

    real(8)::a(*)

end subroutine

program main
    use m_foo

    implicit none

    integer,parameter::N = 8192
    real(8),allocatable::a(:,:)

    integer::i, j
    real(8)::count

    pause

    allocate(a(N, N))

    pause

    call random_number(a)

    pause

    call foo(a(:N/2,:N/2))

    pause

end program

EDIT 3:

The example illustrates what I'm trying to achieve. There is a 16x16 contiguous array, but I only want to transform the upper 4x4 array. The first call simply passes in the array section, but it doesn't return a single one in the upper left corner of the array. The second call sets the appropriate stride and a subsequently contains the correct upper 4x4 array. The stride of the upper 4x4 array with respect to the full 16x16 array is not one.

program main
    implicit none

    complex(8),allocatable::a(:,:)

    allocate(a(16,16))

    a = 0.0d0
    a(1:4,1:4) = 1.0d0

    call fft(a(1:4,1:4))

    write(*,*) a(1:4,1:4)

    pause

    a = 0.0d0
    a(1:4,1:4) = 1.0d0

    call fft_stride(a(1:4,1:4), 1, 16)

    write(*,*) a(1:4,1:4)

    pause

    contains

    subroutine fft(a)  !{{{
        use mkl_dfti

        implicit none

        complex(8),intent(inout)::a(:,:)

        type(dfti_descriptor),pointer::desc
        integer::stat

        stat = DftiCreateDescriptor(desc, DFTI_DOUBLE, DFTI_COMPLEX, 2, shape(a) )
        stat = DftiCommitDescriptor(desc)
        stat = DftiComputeForward(desc, a(:,1))
        stat = DftiFreeDescriptor(desc)

    end subroutine  !}}}

    subroutine fft_stride(a, s1, s2)  !{{{
        use mkl_dfti

        implicit none


        complex(8),intent(inout)::a(:,:)
        integer::s1, s2

        type(dfti_descriptor),pointer::desc
        integer::stat

        integer::strides(3)

        strides = [0, s1, s2]

        stat = DftiCreateDescriptor(desc, DFTI_DOUBLE, DFTI_COMPLEX, 2, shape(a) )
        stat = DftiSetValue(desc, DFTI_INPUT_STRIDES, strides)
        stat = DftiCommitDescriptor(desc)
        stat = DftiComputeForward(desc, a(:,1))
        stat = DftiFreeDescriptor(desc)

    end subroutine  !}}}  

end program

I suspect that when the MKL FFT routines talk about stride, what they really care about the data type and type of FFT you are performing, i.e. if you are performing an ND FFT on real or complex data. — Yossarian, Apr 11 '14 at 08:24
according to the docs, the `DftiCreateDescriptor` will use a default assumption of no padding, which is what will happen if you pass the assumed-shape eventually to the low-level routine. If you want to pass a piece of `array`, then passing the whole array and setting the strides will be more efficient than passing an array slice, which will create an array temporary if it is non-contiguous. — steabert, Apr 11 '14 at 09:05
Are you sure it creates an array temporary? I thought if there is an explicit interface there is no temporary: http://nf.nci.org.au/facilities/software/FORTRAN/Intel10/doc/main_for/mergedProjects/optaps_for/fortran/optaps_prg_arrs_f.htm See right down the bottom. — bdforbes, Apr 11 '14 at 09:07
@Yossarian actually they really do allow for the input and output memory to be laid out in a very general way, provided there are regular strides. I have confirmed this using the C interfaces. — bdforbes, Apr 11 '14 at 09:16
Fortran in general says little about how an array is laid out, and striding is an implementation detail. If you want to see a temporary in your edited example, use the `contiguous` attribute. — francescalus, Apr 11 '14 at 10:03
@francescalus I applied the `contiguous` attribute in `foo` (now placed in a module however) and it still doesn't seem to create an array temporary based on memory usage. That confuses me a bit. Is the compiler being too clever based on what the routine does? — bdforbes, Apr 11 '14 at 10:08
Ah yes I think the compiler was being clever... I created a subroutine with no explicit interface and called it from `foo`, and an array temporary was created. See EDIT 2. — bdforbes, Apr 11 '14 at 10:14
My comment was apprarently [not suitable](http://software.intel.com/en-us/forums/topic/508693) for ifort. — francescalus, Apr 11 '14 at 10:29
In that thread, Steve Lionel mentioned "...a contiguous copy is passed, much like when you pass a noncontiguous array section to an assumed-size array - and any changes are copied back". Does he mean that to imply that even without the `CONTIGUOUS` attribute I should have been expecting an array copy with my previous examples? — bdforbes, Apr 11 '14 at 10:35
Only for the _assumed-size_ array, which is indeed what you have in your EDIT 2's `nointerface`. For _assumed-shape_ (`foo`) it needn't be the case. But I suspect these are a diversion from your main question (about which I haven't thought). — francescalus, Apr 11 '14 at 10:44
Okay that clears that up, so in my case an array descriptor will be passed through with stride information. Now I just need to access that information! — bdforbes, Apr 11 '14 at 10:47
@bdforbes the array temporary would be created once you enter the mkl routine, certainly not in your code, as your dummy argument is an assumed shape. — steabert, Apr 11 '14 at 11:53

score 0 · Answer 1 · answered Apr 11 '14 at 11:41

0

Tho routine DftiComputeForward accepts an assumed size array. If you pass something complicated and non-contiguous, a copy will have to be made at passing. The compiler can check at run-time if the copy is actually necessary or not. In any case for you the stride is always 1, because that will be the stride the MKL routine will see.

In your case you pass A(:,something), this is a contiguous section, provided A is contiguous. If A is not contiguous a copy will have to be made. Stride is always 1.

answered Apr 11 '14 at 11:41

Vladimir F Героям слава

57,977
4
76
119

Why do the `DFTI_INPUT_STRIDES` and `DFTI_OUTPUT_STRIDES` configuration parameters exist then? There must be some way to utilise them. I've done it using the C interfaces. – bdforbes Apr 11 '14 at 11:44
Because you may pass a pointer to a column from a C 2D array (actually 1D n x m array). The column is non-contiguous. C arrays are much more simple, just pointers. C won't copy any contiguous column or row for you. – Vladimir F Героям слава Apr 11 '14 at 11:47
Note you are passing a 1D array. – Vladimir F Героям слава Apr 11 '14 at 11:49
From what I can tell from the [documentation](http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/GUID-2C619B0B-84D2-4CF0-9E5A-FFB4287A1BFB.htm) they are mostly for when you are doing something like an in-place inverse real FFT, where the lower half of the last dimension of your array is the positive wavenumbers and the upper half are the negative wavenumbers (going backwards from the end of the array). More explicitly, see their example: `[0, 1, 2*(n1/2+1), 2*(n1/2+1)*n2, … ]` – Yossarian Apr 11 '14 at 11:50
Also if you want to do real transform for complex data with zero imaginary part without a copy. – Vladimir F Героям слава Apr 11 '14 at 11:52
I've used the strides options when transforming arbitrary sections of an array from C (actually Python). I think it's definitely useful. See my post here: http://stackoverflow.com/questions/11752898/threaded-fft-in-enthought-python/22951787#22951787 – bdforbes Apr 11 '14 at 11:52
C is not Fortran. But you can pass just the location of one element of your 2D array and use the C like approach. – Vladimir F Героям слава Apr 11 '14 at 11:53
1

@bdforbes you can actually pass down the array name which will not cause the creation of a temporary, but in that case you'll have to set the strides if you wnat the mkl routines to use a portion of that array. – steabert Apr 11 '14 at 11:54
In the MKL examples they perform 2D FFT using `status = DftiComputeForward(hand, x(:,1))`, I'll try it just using `status = DftiComputeForward(hand, x)`. But then still the problem remains of figuring out the strides... – bdforbes Apr 11 '14 at 11:55
In that case, are they not just `ubound(x)`? Assuming `x` is the dummy argument. – Yossarian Apr 11 '14 at 11:57
Unfortunately not, the `lbound` and `ubound` reflect the section independently of the bigger array. – bdforbes Apr 11 '14 at 11:59
The stride is one, we all tell you so. – Vladimir F Героям слава Apr 11 '14 at 12:14
But you don't care about the bigger array, surely? You care about the section you are passing to `fft()`. That is the array you are then passing to the MKL routines. Fortran is very good at handling arrays. You will only need to set `DFTI_INPUT_STRIDES` if you're then doing an inverse ND real FFT or somesuch. – Yossarian Apr 11 '14 at 12:22
I've constructed a simple example with a 16x16 array of complex numbers. I set them initially all to zeroes, then set the section (1:4,1:4) to ones. I try performing an FFT by simply passing in the section, but it does not return the correct result. If I set the input strides to [0, 1, 16], then it works correctly. I don't think Fortran automatically handles this on its own. – bdforbes Apr 11 '14 at 12:31
See EDIT 3 in my question. In order to achieve the result I want, I _must_ set the input strides appropriately. – bdforbes Apr 11 '14 at 12:45
There will be a copy of a(:,1) in that case, stride is 1. – Vladimir F Героям слава Apr 11 '14 at 12:48
1

@bdforbes It may help reduce confusion between the dummy `a` and the actual `a` by renaming the argument to your subroutines `b` (or such). That would make it easier for us to clarify the association between the two arrays. – francescalus Apr 11 '14 at 13:09
@VladmirF there is no array copy, otherwise a runtime warning would indicate it as I have enabled that option. The stride is only one along the first dimension, it is 16 along the second dimension as opposed to 4 as implied by the dimensions passed into CreateDescriptor. Note that this is a 2D FFT. – bdforbes Apr 11 '14 at 13:29
If the 4x4 section were copied into a contiguous memory block it would have returned the correct answer, but it did not, it only did after correctly specifying the array stride (1, 16). – bdforbes Apr 11 '14 at 13:32
@bdforbes: in your EDIT3, you're not passing `a`, `a(:,:)` or `a(1:4,1:4)` to the fft routine, you are passing `a(:,1)`? – steabert Apr 11 '14 at 13:44
That is how the MKL examples pass the arrays in. – bdforbes Apr 11 '14 at 15:40

score 0 · Accepted Answer · answered Apr 11 '14 at 14:40

I'm guessing you get confused because you worked around the explicit interface of the MKL function DftiComputeForward by giving it a(:,1). This is contiguous and doesn't need an array temporary. It's wrong, however, the low-level routine will get the whole array and that's why you see that it works if you specify strides. Since the DftiComputeForward exects an array complex(kind), intent inout :: a(*), you can work by passing it through an external subroutine.

program ...
call fft(4,4,a(1:4,1:4))
end program

subroutine fft(m,n,a)  !{{{
use mkl_dfti

implicit none

complex(8),intent(inout)::a(*)
integer :: m, n

type(dfti_descriptor),pointer::desc
integer::stat

stat = DftiCreateDescriptor(desc, DFTI_DOUBLE, DFTI_COMPLEX, 2, (/m,n/) )
stat = DftiCommitDescriptor(desc)
stat = DftiComputeForward(desc, a)
stat = DftiFreeDescriptor(desc)

end subroutine !}}}

This will create an array temporary though when going into the subroutine. A more efficient solution is then indeed strides:

program ...
call fft_strided(4,4,a,16)
end program

subroutine fft_strided(m,n,a,lda)  !{{{
use mkl_dfti

implicit none

complex(8),intent(inout)::a(*)
integer :: m, n, lda

type(dfti_descriptor),pointer::desc
integer::stat

integer::strides(3)

strides = [0, 1, lda]

stat = DftiCreateDescriptor(desc, DFTI_DOUBLE, DFTI_COMPLEX, 2, (/m,n/) )
stat = DftiSetValue(desc, DFTI_INPUT_STRIDES, strides)
stat = DftiCommitDescriptor(desc)
stat = DftiComputeForward(desc, a)
stat = DftiFreeDescriptor(desc)

end subroutine !}}}

Thanks for summarising it, I could have been clearer before. I definitely do _not_ want to ever copy arrays when it is unnecessary. I suppose that unless someone had a solution for determining the strides, I will just have to be explicit as in your example above. — bdforbes, Apr 11 '14 at 15:42
@bdforbes: if you want to operate on part of the array without a temporary copy, you have to use strides, that's what they're for. — steabert, Apr 11 '14 at 15:57
I was well aware of that when I posted my question, but it seemed to take a while to convince everyone else! — bdforbes, Apr 11 '14 at 23:03

score 0 · Answer 3 · answered Jun 30 '14 at 09:53

Some of the answers here do not understand the different between fortran strides and memory strides (though they are related).

To answer your question for future readers beyond the specific case you have here - there does not appear to be away to find an array stride solely in fortran, but it can be done via C using inter-operability features in newer compilers.

You can do this in C:

#include "stdio.h"
size_t c_compute_stride(int * x, int * y)
{
    size_t px = (size_t) x;
    size_t py = (size_t) y;
    size_t d = py-px;
    return d;
}

and then call this function from fortran on the first two elements of an array, e.g.:

program main
    use iso_c_binding
    implicit none

    interface
        function c_compute_stride(x, y) bind(C, name="c_compute_stride")
            use iso_c_binding
            integer :: x, y
            integer(c_size_t) :: c_compute_stride
        end function
    end interface

    integer, dimension(10) :: a
    integer, dimension(10,10) :: b

    write(*,*) find_stride(a)
    write(*,*) find_stride(b(:,1))
    write(*,*) find_stride(b(1,:))

    contains

    function find_stride(x)
        integer, dimension(:) :: x
        integer(c_size_t) :: find_stride
        find_stride = c_compute_stride(x(1), x(2))
    end function

end program

This will print out:

                4
                4
               40

Thanks, this is good. It can be done in Fortran too via the `loc` intrinsic although that is typically a compiler extension. — bdforbes, Jun 30 '14 at 22:20

score -1 · Answer 4 · answered Apr 11 '14 at 07:30

In short: assumed-shape arrays always have stride 1.

A bit longer: When you pass a section of an array to a subroutine which takes an assumed-shape array, as you have here, then the subroutine doesn't know anything about the original size of the array. If you look at the upper- and lower-bounds of the dummy argument in the subroutine, you'll see they will always be the size of the array section and 1.

integer, dimension(10:20) :: array
integer :: i

array = [ (i, i=10,20) ]
call foo(array(10:20:2))

subroutine foo(a)
    integer, dimension(:) :: a
    integer :: i

    print*, lbound(a), ubound(a)
    do i=lbound(a,1), ubound(a,2)
        print*, a(i)
    end do

end subroutine foo

This gives the output:

1 6
10 12 14 16 18 20

So, even when your array indices start at 10, when you pass it (or a section of it), the subroutine thinks the indices start at 1. Similarly, it thinks the stride is 1. You can give a lower bound to the dummy argument:

integer, dimension(10:) :: a

which will make lbound(a) 10 and ubound(a) 15. But it's not possible to give an assumed-shape array a stride.

See my edit, an array temporary is _not_ created for the array section, which means stride information _must_ be passed around with the array. — bdforbes, Apr 11 '14 at 09:25

Determine assumed-shape array strides at runtime

4 Answers4