
I am interested in the difference between alloc_array and automatic_array in the following extract:

subroutine mysub(n)
integer, intent(in)  :: n
integer              :: automatic_array(n)
integer, allocatable :: alloc_array(:)

allocate(alloc_array(n))
...[code]...

I am familiar enough with the basics of allocation (not so much with advanced techniques) to know that allocation allows you to change the size of the array in the middle of the code (as pointed out in this question), but I'm interested in the case where you don't need to change the size of the array; the arrays might be passed on to other subroutines for operation, but the only purpose of both variables in the code and any subroutine is to hold the data of an array of dimension n (and maybe change the data, but not the size).

(1) Is there any difference in memory usage? I am not an expert in low-level procedures, but I have a slight knowledge of how they matter and how they can impact higher-level programming (the kind of experience I'm talking about: once, trying to run a big code in Fortran, I was getting an error I didn't understand; the sysadmin told me "oh, yeah, you are probably saturating the stack; try adding this line in your running script"; anything that gives me insight into how to consider these things when actually coding, instead of having to patch them later, is welcome). I've been told that it might depend on many other things like compiler or architecture, but I interpreted from those responses that the people saying so were not completely sure exactly how. Is it so absolutely dependent on a multitude of factors, or is there a default/intended behavior in the coding that may then be overridden by optional compiler flags or system preferences?

(2) Would the subroutines have different interface needs? Again, not an expert, but it has happened to me before that, because of the way I declared a subroutine's variables, I ended up having to put the subroutine in a module. I've been given to understand this may vary depending on whether I use things that are special to allocatable variables. I am thinking of the case in which everything I do with the variables can be done both by allocatables and automatics, not intentionally using anything specific to allocatables (other than allocation before usage, that is).

Finally, in case this is of use: the reason I am asking is that we are developing in a group, and we have recently noticed different people use those two declarations in different ways. We needed to determine whether this is something that can be left to personal preference or whether there might be reasons to set a clear criterion (and how to set it). I don't need extremely detailed answers; I am trying to determine whether this is something I should be researching in order to be careful about how we use it, and toward which aspects that research should be directed.

Though I would be interested to know of "interesting tricks" that can be done with allocation but are not directly related to the need for size variability, I am leaving those for a possible future follow-up question and focusing here on the strictly functional differences (meaning: what I am explicitly telling compilers to do with my code). The two items I mentioned are the ones I could come up with from previous experience, but if there is any other important one I am missing and should consider, please do mention it.

Nordico
    I partly disagree with the duplicate-closing, but not enough to vote to re-open. The reason for the first is the explicit query about interfaces which isn't cleanly covered in the linked question. However, there are lots of other aspects which are perhaps too broad for a good answer. – francescalus Jul 15 '15 at 14:35
  • @francescalus I don't understand. I am asking about the practical differences in usage between apparently identical options. Is this not the place to ask for best practice questions? They are usually broad, but I have seen them before. What is wrong with my question? Can I appeal in some way to the closing? – Nordico Jul 15 '15 at 14:40
  • I agree that my answer to the other question isn't a valid/complete answer to this question. "Best practice" is likely to be closed as being opinion; the practical differences will often depend far too much on compilers/options/hardware (recall that Fortran is totally agnostic to all of that) for a good answer to be written; "faster" is a "it really does depend..." answer; "how the variable is used" can be much too broad. If you can edit the question to really focus on one specific area with examples of use then I think it should be re-opened. – francescalus Jul 15 '15 at 14:50
  • Perhaps if I reformulated the question as "what does the compiler do differently in each case and what are the immediate practical consequences of that?". If more specificity is needed because compilers vary too much on this, I can ask specifically about gfortran and ifort, but I don't understand why not knowing that it was highly compiler-dependent couldn't be part of the original doubt... – Nordico Jul 15 '15 at 15:45
  • A question like that can be valuable. In some aspects/cases the behaviour of the compiler will be defined by the Fortran language. In others it won't, and if the question is closed as too broad (by those who understand such things more than I) you'll just be prompted to refine the scope of the question without being massively penalized. [Indeed, discussing the extent of the language specification may well be useful - thinking in terms of portability, etc.] – francescalus Jul 15 '15 at 15:56
  • I guess @Nordico is asking about internal memory usage by the compiler, so the question seems very different (though it may be too broad). As for memory usage, I guess recent compilers will allocate memory on the stack in the first example, while in the second example compilers will probably allocate memory on the heap (though some compilers may allocate it on the stack depending on the array size...). Also, ifort can change the behavior by the -heap-arrays option, so the speed may vary by options. Anyway, for relatively small arrays, is the first option generally faster...?? – roygvib Jul 15 '15 at 16:48
  • In my case I usually prefer option 1 for relatively small arrays, while using option 2 for large arrays (because they cannot be on the stack...). RE interface requirements, as long as the passed argument is only n, there should be no difference (e.g., both can be defined even with an implicit interface). But if there are more arguments, I guess the situation may be different; for example, if one wants to allocate a local polymorphic object by referring to the (dynamic) type of a dummy argument, we may have only option 2 (<-- I'm not sure at all about this...) – roygvib Jul 15 '15 at 17:12
  • @roygvib Yes and no: local variables don't affect the requirements around interfaces (that's the yes). The interface requirement is on the _caller_ not the procedure, so what the procedure does has no impact (that's the no). [For this latter, note that the declared/dynamic type of the dummy argument will be used, not that of the actual argument.] – francescalus Jul 15 '15 at 17:12
  • @francescalus RE the "no" part of Yes and no, I was thinking about whether there is a situation where only allocate() can do the job (or alternatively, a simple declaration statement with explicit n cannot do the job). But because I have little experience with dynamic types, I cannot imagine further ... >< And sure, the caller should be able to use the same interface. – roygvib Jul 15 '15 at 17:24
  • @roygvib Ah, misread the "no" part as continuing about interfaces. You are correct, then: for the object to be automatic, type parameters/array bounds must come from specification expressions. If the dependence cannot be framed, based on the dummy arguments (etc.), as a specification expression, one is left with allocate. And, of course, if the type itself is dynamic based on an argument then allocation is natural. – francescalus Jul 15 '15 at 17:32
  • This question, and its closing as a duplicate, are discussed [on Meta](http://meta.stackoverflow.com/questions/299424/re-open-request). – francescalus Jul 15 '15 at 22:32
  • In the case that you can identically use either approach (i.e. you don't need any of the extra capability that the allocatable offers) there is nothing that requires any difference in the underlying implementation (after all - the required externally observable result is the same). Typical implementation is for the storage for allocatables to be "on the heap", while storage for automatic variables may be heap or stack, but the implications of that typical implementation depend on many things. – IanH Jul 16 '15 at 02:03
  • @IanH The thing is that the only extra capability offered by allocatables that I can think of is the possibility to change its size; or is there something else I'm missing? – Nordico Jul 16 '15 at 02:08
  • You can test for allocation success. The object can have/be given a state of "deallocated". The object can be associated with allocatable dummy arguments. The allocation can be moved to another object. Size and length parameters can be general expressions. A local allocatable object can be polymorphic. You can precisely control when the object is finalized. – IanH Jul 16 '15 at 02:20
  • Re-opened after the lengthy meta discussion. Please provide answers which are sufficiently different from the answers to http://stackoverflow.com/questions/24337658/fortran-90-differences-in-declaring-allocatable-array – Vladimir F Героям слава Jul 16 '15 at 16:34
  • @Nordico I strongly recommend that you change your language and use the standard terminology. That means what you called "explicit size" is an automatic array. "Explicit size" has a very specific meaning in Fortran; it only refers to dummy arguments (e.g. http://stackoverflow.com/questions/30620604/can-the-shape-of-an-array-in-an-interface-match-multiple-fixed-array-size). I changed this in the title myself, because it can potentially confuse people researching the same problem. – Vladimir F Героям слава Jul 16 '15 at 17:20
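
To make a few of the allocatable-only capabilities IanH lists above concrete, here is a minimal sketch (the program and variable names are purely illustrative):

program alloc_tricks
    implicit none
    integer, allocatable :: a(:), b(:)

    allocate( a(1000) )
    print *, allocated(a)     ! query the allocation state: T
    call move_alloc( a, b )   ! move the allocation to another object; b now owns the storage
    print *, allocated(a)     ! F: a was deallocated by the move
    deallocate( b )
end program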

2 Answers


For the sake of clarity, I'll briefly mention terminology. The two arrays are both local variables and arrays of rank 1.

  • alloc_array is an allocatable array;
  • automatic_array is an explicit-shape automatic object.

Being local variables their scope is that of the procedure. Automatic arrays and unsaved allocatable arrays come to an end when execution of the procedure completes (with the allocatable array being deallocated); automatic objects cannot be saved and saved allocatable objects are not deallocated on completion of execution.
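
A minimal sketch of that save behaviour (the names are illustrative): an unsaved local allocatable is deallocated automatically on return, a saved one persists between calls, and an automatic array cannot be given the save attribute at all.

subroutine demo()
    implicit none
    integer, allocatable       :: work(:)    ! unsaved: deallocated automatically on return
    integer, allocatable, save :: cache(:)   ! saved: persists between calls, never auto-deallocated

    if (.not. allocated(cache)) allocate( cache(100) )   ! allocated only on the first call
    allocate( work(10) )                                 ! re-allocated on every call
end subroutine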

Again, as in the linked question, after the allocation statement both arrays are of size n. They are still two very different things. Of course, the allocatable array can have its allocation status changed and its allocation moved; I'll leave both of those (mostly) outside the scope of this answer. An allocatable array doesn't have to have these things changed once it's been allocated.

Memory usage

What was partly contentious about a previous revision of the question is how ill-defined the concept of memory usage is. Fortran, as a language definition, tells us that both arrays come to be the same size, that they'll have the same storage layout, and that both are contiguous. Beyond that, much falls under terms you'll hear a lot: implementation specific and processor dependent.

In a comment you expressed interest in ifort. So that I don't wander too far, I'll stick to that one compiler. Other compilers have similar concepts, albeit with different names and options.

Often, ifort will place automatic objects and array temporaries on the stack. There is a (default) compiler option -no-heap-arrays, described as having the effect:

The compiler puts automatic arrays and temporary arrays in the stack storage area.

Using the alternative option -heap-arrays allows one to control that slightly:

This option puts automatic arrays and arrays created for temporary computations on the heap instead of the stack.

There is a possibility to control size thresholds for which heap/stack would be chosen (when that is known at compile-time):

If the compiler cannot determine the size at compile time, it always puts the automatic array on the heap.

As n isn't a constant, one would expect automatic_array to be on the heap with this option, regardless of the size specified: to determine the size, n, of the array at compile time, the compiler would potentially need to do quite a bit of code analysis, if it is possible at all.
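
For illustration, such a threshold could be given like this (the option name is ifort's; the value 10, in kilobytes, is an arbitrary choice for this sketch):

ifort -heap-arrays 10 mysub.f90

Arrays known at compile time to be at least 10 KB then go on the heap; by the quoted rule above, automatic_array, whose size is unknown at compile time, goes on the heap regardless.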

There's probably more to be said, but this answer would be far too long if I tried. One thing to note, however, is that automatic local objects and (post-Fortran 90) allocatable local objects can be expected not to leak memory.

Interface needs

There is nothing special about the interface requirements of the subroutine mysub: local variables have no impact on that. Any program unit calling that would be happy with an implicit interface. What you are asking about is how the two local arrays can be used.

This largely comes down to what uses the two arrays can be put to.

If the dummy argument of a second procedure has the allocatable attribute then only the allocatable array here can be passed to that procedure. It will also need to have an explicit interface. This is true whether or not the procedure changes the allocation.

Of course, both arrays could be passed as arguments to a dummy argument without the allocatable attribute and then we don't have different interface requirements.
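A minimal sketch of that distinction (the module and procedure names are hypothetical):

module arg_demo
    implicit none
contains
    subroutine wants_alloc( a )
        integer, allocatable, intent(inout) :: a(:)   ! allocatable dummy argument
        if (.not. allocated(a)) allocate( a(5) )
    end subroutine

    subroutine wants_any( a )
        integer, intent(inout) :: a(:)                ! assumed-shape dummy argument
        a = 0
    end subroutine
end module

From mysub, call wants_alloc(alloc_array) is allowed but call wants_alloc(automatic_array) is rejected by the compiler; both arrays may be passed to wants_any. Placing the procedures in a module conveniently provides the explicit interfaces both of them require here.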

Anyway, why would one want to pass an argument to an allocatable dummy when there will be no change in allocation status, etc.? There are good reasons:

  • there may be a code path in the procedure which does have an allocation change (controlled by a switch, say);
  • allocatable dummy arguments also pass bounds;
  • etc.

This second one is more obvious if the subroutine had the specification

subroutine mysub(n)
integer, intent(in)  :: n
integer              :: automatic_array(2:n+1)
integer, allocatable :: alloc_array(:)

allocate(alloc_array(2:n+1))
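
With that specification the bounds matter: an allocatable dummy argument sees the actual argument's bounds, while an assumed-shape dummy is (by default) rebased at 1. A small sketch, with a hypothetical subroutine:

subroutine show_bounds( a, b )
    implicit none
    integer, allocatable, intent(in) :: a(:)
    integer, intent(in)              :: b(:)

    print *, lbound(a,1), ubound(a,1)   ! 2 and n+1: allocatable dummy keeps the actual bounds
    print *, lbound(b,1), ubound(b,1)   ! 1 and n:   assumed-shape dummy is rebased at 1
end subroutine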

Finally, an automatic object has quite strict conditions on its size: it must be given by a specification expression. n here is clearly allowed, but things don't have to get much more complicated before allocation is the only plausible way, depending on how much one wants to play with block constructs.
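
As an example of such play, a block construct allows the size to come from a value computed after execution has started (a minimal sketch; names are illustrative):

subroutine process( n )
    implicit none
    integer, intent(in) :: n
    integer :: m

    m = n*n                      ! computed at run time, before the block is entered
    block
        integer :: scratch(m)    ! automatic array sized on entry to the block (Fortran 2008)
        scratch = 0
        ! ... work with scratch ...
    end block                    ! scratch ceases to exist here
end subroutine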

Taking also a comment from IanH: if we have a very large n the automatic object is likely to lead to crash-and-burn. With the allocatable, one could use the stat= option to come to some amicable agreement with the compiler run-time.
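
For instance (a sketch; what to do on failure is up to you):

integer :: ierr

allocate( alloc_array(n), stat=ierr )
if ( ierr /= 0 ) then
    print *, "allocation of ", n, " elements failed"
    ! recover gracefully: use a smaller work array, or exit cleanly
end if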

francescalus
  • Thanks, I know much of this you already said in other comments, but this is so much clearer this way. I didn't understand that last part though; `stat` option is a compiler option? – Nordico Jul 20 '15 at 14:41
  • Sorry, but I have a second question that may be a bit tangential. Is stack size for the program decided at compile time? Do recursive functions/subroutines also get their variables allocated on the heap even if the size is known beforehand, or can you have problems with them overflowing the stack? – Nordico Jul 20 '15 at 14:53
  • `stat=` is to the `allocate` statement. Without it, a problem leads to error termination; with it, it's up to you to check the status (the allocation still fails), but it allows a graceful exit from the program, or an alternative route through. – francescalus Jul 20 '15 at 16:55
  • The finer details of stack and recursion may be best asked at the vendor's forum. – francescalus Jul 20 '15 at 16:56

Because gfortran or ifort + Linux (x86_64) are among the most popular combinations used for HPC, I made a performance comparison between local allocatable and automatic arrays for these combinations. The CPU used is a Xeon E5-2650 v2 @ 2.60 GHz, and the compilers are gfortran 4.8.2 and ifort 14.0. The test program is the following.

In test.f90:

!------------------------------------------------------------------------           
subroutine use_automatic( n )
    integer :: n

    integer :: a( n )   !! local automatic array (with unknown size at compile-time)
    integer :: i

    do i = 1, n
        a( i ) = i
    enddo

    call sub( a )
end

!------------------------------------------------------------------------           
subroutine use_alloc( n )
    integer :: n

    integer, allocatable :: a( : )  !! local allocatable array                      
    integer :: i

    allocate( a( n ) )

    do i = 1, n
        a( i ) = i
    enddo

    call sub( a )

    deallocate( a )  !! not necessary for modern Fortran but for clarity                  
end

!------------------------------------------------------------------------           
program main
    implicit none
    integer :: i, nsizemax, nsize, nloop, foo
    common /dummy/ foo

    nloop = 10**7
    nsizemax = 10

    do i = 1, nloop
        nsize = mod( i, nsizemax ) + 1

        call use_automatic( nsize )
        ! call use_alloc( nsize )                                                   
    enddo

    print *, "foo = ", foo   !! to check if sub() is really called
end

In sub.f90:

!------------------------------------------------------------------------
subroutine sub( a )
    integer a( * )
    integer foo
    common /dummy/ foo

    foo = a( 1 )
end

In the above program, I tried to avoid compiler optimizations that would eliminate a(:) itself (i.e., treat the code as a no-op) by placing sub() in a different file and keeping the interface implicit. First, I compiled the program using gfortran as

gfortran -O3 test.f90 sub.f90

and tested different values of nsizemax while keeping nloop = 10^7. The result is in the following table (time is in seconds, measured several times with the time command).

nsizemax    use_automatic()    use_alloc()
10          0.30               0.31               # average result
50          0.48               0.47
500         1.0                0.90
5000        4.3                4.2
100000      75.6               75.7

So the overall timing seems almost the same for the two calls when -O3 is used (but see the Edit for different options). Next, I compiled with ifort as

[O3]  ifort -O3 test.f90 sub.f90
or
[O3h] ifort -O3 -heap-arrays test.f90 sub.f90

In the former case the automatic array is stored on the stack, while with -heap-arrays attached it is stored on the heap. The obtained result is

nsizemax use_automatic()    use_alloc()
         [O3]    [O3h]      [O3]    [O3h]
10       0.064   0.39       0.48    0.48
50       0.094   0.56       0.65    0.66
500      0.45    1.03       1.12    1.12
5000     3.8     4.4        4.4     4.4
100000   74.5    75.3       76.5    75.5

So for ifort, the use of automatic arrays seems beneficial when relatively small arrays are mainly used. On the other hand, gfortran -O3 shows no difference because both arrays are treated the same way (see Edit for more details).

Additional comparison:

Below is the result for the Oracle Fortran compiler 12.4 for Linux (used with f90 -O3). The overall trend seems similar; automatic arrays are faster for small n, indicating the internal use of the stack.

nsizemax    use_automatic()    use_alloc()
10          0.16               0.45
50          0.17               0.62
500         0.37               0.97
5000        2.04               2.67
100000      65.6               65.7

Edit

Thanks to Vladimir's comment, it has turned out that gfortran -O3 puts automatic arrays (with unknown size at compile time) on the heap. This explains why use_automatic() and use_alloc() made no difference above. So I made another comparison between different options below:

[O3]  gfortran -O3
[O5]  gfortran -O5
[O3s] gfortran -O3 -fstack-arrays
[Of]  gfortran -Ofast                   # this includes -fstack-arrays

Here, -fstack-arrays means that the compiler puts all local arrays with unknown size on the stack. Note that this flag is enabled by default with -Ofast. The obtained result is

nsizemax    use_automatic()               use_alloc()
            [Of]   [O3s]  [O5]  [O3]     [Of]  [O3s]  [O5]  [O3]
10          0.087  0.087  0.29  0.29     0.29  0.29   0.29  0.29
50          0.15   0.15   0.43  0.43     0.45  0.44   0.44  0.45
500         0.57   0.56   0.84  0.84     0.92  0.92   0.92  0.92
5000        3.9    3.9    4.1   4.1      4.2   4.2    4.2   4.2
100000      75.1   75.0   75.6  75.6     75.6  75.3   75.7  76.0

where the average of ten measurements is shown. This table demonstrates that if -fstack-arrays is included, the execution time for small n becomes shorter. This trend is consistent with the results obtained for ifort above.

It should be mentioned, however, that the above comparison probably corresponds to the "best-case" scenario that highlights the difference between them, so the timing difference can be much smaller in practice. For example, I have compared the timing for the above options by using some other program (involving both small and large arrays), and the results were not much affected by the stack options. Also the result should depend on machine architecture as well as compilers, of course. So your mileage may vary.

roygvib
  • For gfortran you do not see any difference, because there is not any. Both are placed on the heap if n is not known beforehand. Use `-fstack-arrays` (which is included in `-Ofast`) to test it. The default settings of `-O3` for gfortran and ifort differ: for ifort -O3 is the highest, but not for gfortran; gfortran even has -O5 and -Ofast. – Vladimir F Героям слава Jul 17 '15 at 09:05
  • Thanks, this is very useful because there are some fast subroutines that we call many times, and going from 0.16 to 0.45 may be significant. We have recently gone from ifort to gfortran, so I too will try to find out how those options work. – Nordico Jul 20 '15 at 14:54
  • This question and our answers have seen interest because of a related new question. Maybe you can update the results of the tests here with some newer versions? – francescalus Mar 24 '21 at 21:00