Parsing a complicated function using MKL/VML

Question

I am trying to calculate a fairly complicated function, say func() - involving several additions, substractions, multiplications, divisions and trigonometric functions, of several two-dimensional arrays in fortran. The calculation is massively parrallel, in that each func() is independent over its row and column location. Each of the matrices is many gigabytes in size, and there are about a dozen of them as arguments.

I would like to make use of Intel MKL functions (invoking --mkl-parallel), in particular VML functions to add, subtract, divide etc. My question is: how can I render a complicated functional expression such as,

e.g.: func(x,y,z) = x*y+cos(z*x-x) where x,y,z are 2d arrays of several GB

in terms of VML functions but using more familiar binary operators. You see my problem requires, in principle, converting all the binary operators, such as "+" and "*" into binary functions taking arguments as ?vadd(x,y). Of course this would be very cumbersome and unsightly for large expressions. Is there a way to overload the binary arithmetic operators such as "+","-" to preferentially use MKL/VML versions in fortran. An example would be nice! Thanks!

You can redefine these binary operators for custom datatypes. This means, in your case, I would define a datatype which contains only a real (or what is appropriate to you) and define the operations for this. I would advise against redefining the default operators, as this could cause some problems. — Stefan, Oct 23 '13 at 07:46
Thanks for the help, Stefan. Since I'm new to fortran I don't quite know how to realize this. Could you provide an example for the addition operation, say? — marital_weeping, Oct 23 '13 at 08:06
I believe that the standard excludes redefining an intrinsic operator, such as `+`, in such a way that the compiler would have problems in parsing an expression such as `A+B` in which the dummy arguments are compatible (in type, kind and rank) with those of the corresponding intrinsic function. Even if this is possible I agree with @Stefan that it is probably not a design decision to make lightly. — High Performance Mark, Oct 23 '13 at 08:46

kangshiyin · Answer 1 · 2013-10-23T11:56:46.350

1

I know this answer is a little bit off-topic.

Since all the operations are element-wise and your operations are simple, the func() could be a memory bandwidth bounded task. In this case, using VML may not be a good choice to maximum the performance.

Suppose each of your arrays is of 10GB in size, uisng VML as follows will need at least 9 x 10GB reading and 5 x 10GB writing.

func(...) {
    tmp1=x*z
    tmp1=tmp1-x;
    tmp1=cos(tmp1);
    tmp2=x*y;
    return tmp1+tmp2;
}

where all the operations all overloaded for 2d array.

Instead you may find the following approach has much less memory access (3 x 10GB reading and 1 x 10GB writing) thus could be quicker (pseudo code).

$omp parallel for
for i in 1 to m
    for j in 1 to n
        result(i,j)= x(i,j)*y(i,j)+cos(z(i,j)*x(i,j)-x(i,j));
    end
end

edited Oct 23 '13 at 11:56

answered Oct 23 '13 at 11:51

kangshiyin

9,681
1
17
29

That's very insightful and good to know since your approach is what I had to begin with. – marital_weeping Oct 23 '13 at 11:58
Reflecting on what you wrote, I was wondering if all the temp variables -that lead to additional reads and writes, can be gotten rid of by inlining (See @Stefan sample code) the MKL functions as operators? Something like this: x.mul.y .plus. cos(z .mul. x .minus. x). Or do the overheads just get concealed? – marital_weeping Oct 24 '13 at 14:17
Overloaded operators are only wrappers to call MKL functions, so they still need buffer to store temp results. C++ template expression can overload operators and eliminate temp buffer, but I think fortran cannot do that. http://en.wikipedia.org/wiki/Expression_templates – kangshiyin Oct 24 '13 at 14:45

score 0 · Answer 2 · answered Oct 23 '13 at 13:26

I developped a small example to show the addition of two vectors. As I don't have MKL installed anymore, I used the SAXPY command from BLAS. The principle should be the same.

At first you define a module with the appropriate definitions. In my case this would be an assignment to save a real array in my datatype (this is only a convenience function as you could also directly access the array variable) and the definition of the addition. Both are a new overload to the + operator and = assignment.

In the program, I define three fields. Two of them get assigned with random numbers and then added to get the third field. Then the first two fields get stored in my special variables, and the result of this addition is stored in a third variable of this type.

Finally, the result is compared by accessing the array directly. Please note, that the assignment from custom datatype to the same datatype is already defined (e.g. ffield3 = ffield1 is already defined.)

My module:

MODULE fasttype

    IMPLICIT NONE

    PRIVATE

    PUBLIC :: OPERATOR(+), ASSIGNMENT(=)

    TYPE,PUBLIC :: fastreal
        REAL,DIMENSION(:),ALLOCATABLE :: array
    END TYPE

    INTERFACE OPERATOR(+)
        MODULE PROCEDURE fast_add
    END INTERFACE

    INTERFACE ASSIGNMENT(=)
        MODULE PROCEDURE fast_assign
    END INTERFACE

CONTAINS

    FUNCTION fast_add(fr1, fr2) RESULT(fr3)
        TYPE(FASTREAL), INTENT(IN) :: fr1, fr2
        TYPE(FASTREAL) :: fr3
        INTEGER :: L

        L = SIZE(fr2%array)
        fr3 = fr2       

        CALL SAXPY(L, 1., fr1%array, 1, fr3%array, 1)
    END FUNCTION

    SUBROUTINE fast_assign(fr1, r2)
        TYPE(FASTREAL), INTENT(OUT) :: fr1
        REAL, DIMENSION(:), INTENT(IN) :: r2
        INTEGER :: L

        IF (.NOT. ALLOCATED(fr1%array)) THEN
            L = SIZE(r2)
            ALLOCATE(fr1%array(L))
        END IF

        fr1%array = r2
    END SUBROUTINE
END MODULE

My program:

PROGRAM main
    USE fasttype
    IMPLICIT NONE

    REAL, DIMENSION(:), ALLOCATABLE :: field1, field2, field3
    TYPE(fastreal) :: ffield1, ffield2, ffield3

    ALLOCATE(field1(10),field2(10),field3(10))
    CALL RANDOM_NUMBER(field1)
    CALL RANDOM_NUMBER(field2)

    field3 = field1 + field2

    ffield1 = field1
    ffield2 = field2

    ffield3 = ffield1 + ffield2

    WRITE(*,*) field3 == ffield3%array
END PROGRAM

Great! I really appreciate it. – marital_weeping Oct 23 '13 at 16:23 — marital_weeping, Oct 23 '13 at 16:23

Parsing a complicated function using MKL/VML

2 Answers2