
I am writing some relatively computationally intensive procedures in Fortran (2008) that require recursion and I came across some sources claiming that 'pass by value' may be several times faster than the standard 'pass by reference' (at least with the GNU compiler) for many recursive procedures: http://fortranwiki.org/fortran/show/recursion

My background is not computer science/engineering and I am having difficulty finding explanations for this. I'd like to make intelligent choices when the time comes to optimize my code, though this runs counter to my basic understanding of the speed of 'pass by value' vs 'pass by reference.' Is anyone able to offer some insight into this topic?

Thank you

  • This *really* calls for the actual claims. What are the actual claims? What kind of recursive codes are actually supposed to be faster? Did you make any performance tests? Where did you read those claims? – Vladimir F Героям слава Dec 15 '20 at 19:05
  • Be aware, before making any code changes, that Fortran procedures that are not `bind(C)` pass even their `value` arguments by reference. They just pass a reference to a copy. – Vladimir F Героям слава Dec 15 '20 at 19:06
  • See also https://stackoverflow.com/questions/26552481/which-is-faster-pass-by-reference-vs-pass-by-value-c We really need the *actual* claim and, ideally, the actual code. – Vladimir F Героям слава Dec 15 '20 at 19:09
  • Looks like a case of premature optimization. Write the code to be clear to reader and let the compiler worry about the details. Then, after it works, use a profiler to find hotspots. – evets Dec 15 '20 at 20:04
  • @VladimirF "Fortran procedures, that are not bind(C) pass even their value arguments by reference" - as I understand it that's not required by the standard. Further with gfortran-10 on my machine including an interface in the hyper-linked code and then adding the value attribute reduces the run time from 0.99s to 0.55. Now this may be due to an optimisation enabled by the value attribute rather than the attribute itself (I don't know, hence no answer yet) but it does show something different is happening in the two cases. – Ian Bush Dec 15 '20 at 20:12
  • @IanBush I do not know what exactly the standard requires, but it was Steven Lionel who said here at SO or elsewhere that Intel had it wrong and they changed it to passing a pointer to a copy to comply. I hope I am not misrepresenting it. But it is true that it is strange for Fortran to specify the details of the passing mechanism. I can only imagine some requirement requires that indirectly (sequence association?). – Vladimir F Героям слава Dec 15 '20 at 20:46
  • @francescalus I think it was about all variables. – Vladimir F Героям слава Dec 15 '20 at 21:58
  • Well playing with gfortran-10 and -fopt-info it appears that if you give the arguments the value attribute the compiler decides it can do an extra level of inlining. This sounds a more reasonable explanation of the increase in performance. But quite why value is required for this, and why intent( in ) is not enough, and what flag can force the extra inlining with intent( in ) is beyond me. – Ian Bush Dec 15 '20 at 22:05
  • @VladimirF Perhaps this is the comment by Lionel you mentioned: https://community.intel.com/t5/Intel-Fortran-Compiler/Does-VALUE-Fortran-2003-introduction-attribute-for-a-type-make/td-p/945754 – Jonatan Öström Dec 16 '20 at 16:00
  • @JonatanÖström Not sure if it was directly this one but the content is what I had in mind. – Vladimir F Героям слава Dec 16 '20 at 16:25
  • And the possibility of `optional` is a good point, that is implemented by the possibility of a null pointer. I am still not sure if the same implementation is required for non-optional arguments. I really did not study the details and I do not have much time to do that. – Vladimir F Героям слава Dec 16 '20 at 16:28
  • I asked about this on comp.lang.fortran - Thomas Koenig, a gfortran developer says "It is indeed a possible choice for a compiler to pass an argument via the C passing conventions ... Gfortran does indeed use a C-like argument passing convention for VALUE arguments (including the hidden arguments for optional arguments). One advantage is that this saves one pointer dereference if the value is indeed passed in a register, which can lead to speed advantages." – Ian Bush Dec 16 '20 at 18:41
  • I should also add that Steve Lionel reiterates what Vladimir says above. But to be honest, quite how the argument passing occurs is not really what I think is the point of interest; it is why the value attribute enables extra levels of optimisation. – Ian Bush Dec 16 '20 at 18:49
  • @IanBush Quite the opposite, it is of utmost interest because that is why the optimisation is possible. – Vladimir F Героям слава Dec 17 '20 at 10:55

1 Answer


An attempt at an answer - I can't claim to understand all that is going on here but thought I would report what I have found.

First we have to make some assumptions about how arguments are passed in Fortran with and without the value attribute. This will be implementation dependent, but as the question mentions gfortran I'll concentrate on that. In comp.lang.fortran Thomas Koenig, a gfortran developer, says:

"Since the example is for gfortran, maybe I can add a little here.

It is indeed a possible choice for a compiler to pass an argument via the C passing conventions, which effectively means that the temporary copy in question is made in a register or on the stack. For a sufficiently small number of arguments, most ABIs will use registers.

This method does not work as such with OPTIONAL VALUE arguments, but it is possible to get around that with hidden arguments which indicate the presence or absence of the optional argument.

Gfortran does indeed use a C-like argument passing convention for VALUE arguments (including the hidden arguments for optional arguments). One advantage is that this saves one pointer dereference if the value is indeed passed in a register, which can lead to speed advantages."

So I'm going to assume that gfortran's default argument passing method is by reference, and that it uses the C-like pass by value described above when the value attribute is used.
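
To make that distinction concrete at the language level, here is a minimal sketch. The names `passing_demo`, `bump_default` and `bump_value` are made up for illustration and are not part of the code being timed. It only shows the visible semantics of the value attribute, not the mechanism a particular compiler uses to implement it.

```fortran
Module passing_demo
  Implicit None
Contains

  ! Default passing: the dummy argument is associated with the caller's
  ! variable, so the update is visible to the caller.
  Subroutine bump_default( i )
    Integer, Intent( InOut ) :: i
    i = i + 1
  End Subroutine bump_default

  ! Value passing: the dummy argument is a private copy, so the caller's
  ! variable is left untouched.
  Subroutine bump_value( i )
    Integer, Value :: i
    i = i + 1
  End Subroutine bump_value

End Module passing_demo

Program demo
  Use passing_demo
  Implicit None
  Integer :: k

  k = 1
  Call bump_default( k )
  Write( *, * ) 'after bump_default: ', k

  k = 1
  Call bump_value( k )
  Write( *, * ) 'after bump_value:   ', k
End Program demo
```

The first line of output should show 2 and the second 1. Whether the copy in `bump_value` lives in a register, on the stack, or behind a hidden pointer is the implementation detail under discussion here.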

For compilation I shall use

ian@eris:~/work/stack$ gfortran-10 --version
GNU Fortran (GCC) 10.0.1 20200225 (experimental)
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

The codes I will look at are as follows. First the one that uses default argument passing:

ian@eris:~/work/stack$ cat ack_default.f90
Program ackermann
  Interface
     Recursive Function ack( m, n ) Result( a )
       Integer, Intent(in) :: m
       Integer, Intent(in) :: n
       Integer :: a
     End Function ack
  End Interface
  Integer :: start, finish, rate
  Call system_Clock( start, rate )
  Write(*,*) ack(3, 12)
  Call system_Clock( finish, rate )
  Write( *, * ) 'Time: ', Real( finish - start ) / rate

End Program ackermann

Recursive Function ack(m, n) Result(a)
  Integer, Intent(in) :: m
  Integer, Intent(in) :: n
  Integer :: a

  If (m == 0) Then
     a=n+1
  Else If (n == 0) Then
     a=ack(m-1,1)
  Else
     a=ack(m-1, ack(m, n-1))
  End If
End Function ack

And next the value version:

Program ackermann
  Interface
     Recursive Function ack( m, n ) Result( a )
       Integer, Intent(in), Value :: m
       Integer, Intent(in), Value :: n
       Integer :: a
     End Function ack
  End Interface
  Integer :: start, finish, rate
  Call system_Clock( start, rate )
  Write(*,*) ack(3, 12)
  Call system_Clock( finish, rate )
  Write( *, * ) 'Time: ', Real( finish - start ) / rate

End Program ackermann

Recursive Function ack(m, n) Result(a)
  Integer, Intent(in), Value :: m
  Integer, Intent(in), Value :: n
  Integer :: a

  If (m == 0) Then
     a=n+1
  Else If (n == 0) Then
     a=ack(m-1,1)
  Else
     a=ack(m-1, ack(m, n-1))
  End If
End Function ack

It can be seen that the only difference is the value attribute on the arguments. Compiling both and comparing I get:

ian@eris:~/work/stack$ gfortran-10 -O3 -Wall -Wextra -std=f2008 ack_default.f90
ian@eris:~/work/stack$ ./a.out
       32765
 Time:    1.01900005    
ian@eris:~/work/stack$ gfortran-10 -O3 -Wall -Wextra -std=f2008 ack_value.f90
ian@eris:~/work/stack$ ./a.out
       32765
 Time:   0.602999985    

So the value version is appreciably quicker than that in which the arguments are passed by the default mechanism.

Asking for an optimisation report from gfortran gives the following:

ian@eris:~/work/stack$ gfortran-10 -O3 -Wall -Wextra -std=f2008 -fopt-info ack_default.f90
ack_default.f90:27:0: optimized:  Inlined ack/13 into ack/0 which now has time 18.062500 and size 95, net change of +65.
ack_default.f90:11:0: optimized: basic block part vectorized using 16 byte vectors
ack_default.f90:13:0: optimized: basic block part vectorized using 16 byte vectors
ian@eris:~/work/stack$ gfortran-10 -O3 -Wall -Wextra -std=f2008 -fopt-info ack_value.f90
ack_value.f90:11:0: optimized:  Inlined ack.constprop/12 into ackermann/1 which now has time 174.107273 and size 60, net change of -7.
ack_value.f90:27:0: optimized:  Inlined ack/14 into ack/0 which now has time 455.794475 and size 79, net change of +64.
ack_value.f90:11:0: optimized: basic block part vectorized using 16 byte vectors
ack_value.f90:13:0: optimized: basic block part vectorized using 16 byte vectors

Thus it appears that the value code has an extra level of inlining applied, and this was my first thought at an answer. However turning off inlining gives

ian@eris:~/work/stack$ gfortran-10 -O3 -Wall -Wextra -std=f2008 -fno-inline ack_default.f90
ian@eris:~/work/stack$ ./a.out
       32765
 Time:    1.46000004    
ian@eris:~/work/stack$ gfortran-10 -O3 -Wall -Wextra -std=f2008 -fno-inline ack_value.f90
ian@eris:~/work/stack$ ./a.out
       32765
 Time:   0.958999991    

so the value version is still much quicker than the default version - something else is going on.

Thomas Koenig also said:

With gfortran, it can also be instructive to inspect the output of -fdump-tree-original. 

So I took a look at that. First with default passing (and keeping only the relevant parts)

ian@eris:~/work/stack$ gfortran-10 -O3 -Wall -Wextra -std=f2008 -fdump-tree-original ack_default.f90
ian@eris:~/work/stack$ cat ack_default.f90.004t.original 
ack (integer(kind=4) & restrict m, integer(kind=4) & restrict n)
{
  integer(kind=4) a;

  if (*m == 0)
    {
      a = *n + 1;
    }
  else
    {
      if (*n == 0)
        {
          {
            integer(kind=4) D.3903;
            static integer(kind=4) C.3904 = 1;

            D.3903 = *m + -1;
            a = ack (&D.3903, &C.3904);
          }
        }
      else
        {
          {
            integer(kind=4) D.3905;
            integer(kind=4) D.3906;
            integer(kind=4) D.3907;

            D.3905 = *m + -1;
            D.3906 = *n + -1;
            D.3907 = ack ((integer(kind=4) *) m, &D.3906);
            a = ack (&D.3905, &D.3907);
          }
        }
      L.2:;
    }
  L.1:;
  return a;
}

And now for the value version

ian@eris:~/work/stack$ cat ack_value.f90.004t.original 
ack (integer(kind=4) m, integer(kind=4) n)
{
  integer(kind=4) a;

  if (m == 0)
    {
      a = n + 1;
    }
  else
    {
      if (n == 0)
        {
          a = ack (m + -1, 1);
        }
      else
        {
          a = ack (m + -1, ack (m, n + -1));
        }
      L.2:;
    }
  L.1:;
  return a;
}

It can be seen the value version is a lot simpler and is pretty much a transliteration of the code. However the default code has a lot more going on, in particular

      {
        integer(kind=4) D.3905;
        integer(kind=4) D.3906;
        integer(kind=4) D.3907;

        D.3905 = *m + -1;
        D.3906 = *n + -1;
        D.3907 = ack ((integer(kind=4) *) m, &D.3906);
        a = ack (&D.3905, &D.3907);
      }

Now I am not an expert here, but that looks to me very much like the compiler setting up temporaries on the stack to hold the values of intermediate results, since it can't overwrite the originals. In fact it looks quite similar to what I would expect the compiler to have to do to implement passing by value. Thus it looks to me that

  • Because the compiler has to create new "variables" on the stack to hold the intermediate results when passing by reference, in this case there is no advantage to be gained by using that method.
  • The compiler is better at optimising the standard "pass by value" method than the more generic "pass by reference plus intermediate values". I really am beginning to guess now, but I suspect that how the compiler uses registers underlies the improved performance.
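
As a rough illustration of the first point, the default-passing dump can be written back out by hand as Fortran with the temporaries made explicit. This is only a sketch: the names `ack_temps_mod`, `ack_temps`, `m_less_1`, `n_less_1` and `inner` are my own, standing in for the compiler's D.3905, D.3906 and D.3907, and I make no claim that this is exactly what gfortran generates.

```fortran
Module ack_temps_mod
  Implicit None
Contains

  Recursive Function ack_temps( m, n ) Result( a )
    Integer, Intent( in ) :: m
    Integer, Intent( in ) :: n
    Integer :: a
    ! Explicit locals playing the role of the compiler's temporaries.
    ! Each level of recursion gets its own set.
    Integer :: m_less_1, n_less_1, inner

    If ( m == 0 ) Then
       a = n + 1
    Else If ( n == 0 ) Then
       m_less_1 = m - 1
       a = ack_temps( m_less_1, 1 )
    Else
       m_less_1 = m - 1
       n_less_1 = n - 1
       ! m itself can be handed on unchanged; the other actual
       ! arguments are expressions and so need somewhere to live.
       inner = ack_temps( m, n_less_1 )
       a = ack_temps( m_less_1, inner )
    End If
  End Function ack_temps

End Module ack_temps_mod

Program demo_temps
  Use ack_temps_mod
  Implicit None
  ! A small case that finishes instantly: ack(2, 3) is 9
  Write( *, * ) ack_temps( 2, 3 )
End Program demo_temps
```

Every recursive call here needs storage for the expression results so that their addresses can be passed on, which is the same copying work that pass by value would do, with a pointer dereference on top.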

To go further we need somebody who reads x86 assembler. That's not me.

Ian Bush
  • Yes, it is exactly what I assumed. It is the very basic distinction between the C-like pass by value and the C-like pass by reference (pointer) as discussed at https://stackoverflow.com/questions/26552481/which-is-faster-pass-by-reference-vs-pass-by-value-c Because the value is given directly in a register one does not have to dereference the pointer to get the value of the argument. – Vladimir F Героям слава Dec 17 '20 at 10:48
  • I did observe that the values are passed by C-like value in gfortran using `-fdump-tree-original`, as Thomas Koenig suggested. It seems that the developers of two major Fortran compilers disagree about what under-the-hood mechanism the Fortran standard does or does not require. I somehow assumed that what Steven Lionel said would be true for other compilers as well. – Vladimir F Героям слава Dec 17 '20 at 10:51
  • Interesting answer. FWIW, out of curiosity, I've experimented a bit with different compiler/OS combinations and get these results. `ifort 19.1.0.166, Linux`: 1.264 (default) 1.266 (value), `gfortran 9.1.0`: 1.133 (default) 0.533 (value) and `nagfor 6.1, Windows`: 1.347 (default) 0.365 (value). So `gfortran` and `nagfor` seem to optimize the value version, but not `ifort`. – jbdv Dec 17 '20 at 10:51
  • Do not forget that addresses are 64-bit on modern computers and default precision numerical values are normally 32-bit. That can also play a role, although the cost of dereferencing is probably more important. – Vladimir F Героям слава Dec 17 '20 at 10:52
  • @jbdv The difference for Intel Fortran is exactly in the difference in the passing mechanism. Intel Fortran passes a copy of the value *by reference*. – Vladimir F Героям слава Dec 17 '20 at 10:53
  • @VladimirF, thanks for the comment. I made a 3rd variant where the function has the bind(c) attribute, which, according to my understanding of your comment here and [Steve Lionel's comment on comp.lang.fortran](https://groups.google.com/g/comp.lang.fortran/c/1wUlHfcMR5k/m/adG_WA8bAQAJ), avoids the copy for true _pass by value_. I can confirm that the same speed-up is observed for `ifort`: 0.521 (bind(c)). – jbdv Dec 17 '20 at 11:45
  • Since this thread seems to have more or less reached its end (for now), I wanted to express my gratitude for the in-depth discussion. @IanBush, thank you for sharing your investigation process. This kind of answer helps users like myself develop more strategies for tackling performance and compiler related questions. – StillUsesFORTRAN Dec 18 '20 at 17:17