Writing optimal code in FORTRAN using array expressions

Question

I am looking for a way to write fast code and be able to use builtin vector operations (for the sake of readability).

FORTRAN seems to be the good candidate. However, almost all resources I find on the web are about writing code without array expressions, and have only trivial examples of vector operations.

I feel strong need in some good resource which can cover caveats and give some insight into optimizations of code with vector expressions.

Example: currently I am not even able to predict the behavior of such code:

! a = [0], indices = [1, 1]
a(indices) = a(indices) + 1

After compiling I get a = [2], but it this correct? If I use openmp, will it behave like this?

Personally, I would be very happy to have something like following examples on numpy:

score 3 · Answer 1 · answered Oct 19 '15 at 18:04

3

Your code is not standard conforming:

Fortran 2008 6.5.3.3.2.3:

If a vector subscript has two or more elements with the same value, an array section with that vector subscript shall not appear in a variable definition context (16.6.7). NOTE 6.15

Therefore the result of your operation is not defined by the standard.

Other parts of your question appear to be too broad to treat them here. There are many books about scientific programming in Fortran 90 and later.

Also be aware that by vectorization most people in Fortran and C or C++ mean the usage of SIMD instructions simd and not the vectorized expressions from NumPy. These are just array expressions in Fortran.

answered Oct 19 '15 at 18:04

Vladimir F Героям слава

57,977
4
76
119

hi, this is not what I asked about, but anyway thanks for correct tips. I've edited question not to mess up fortran users. Language standard is indeed good starting point :) – Alleo Oct 19 '15 at 21:05
@Alleo Your question still says that you cannot predict the result of a certain code fragment. This answer says that, just with knowledge of the definition of Fortran, it isn't possible to predict what happens there. – francescalus Oct 19 '15 at 21:19
@francescalus exactly! (and this is my point, actually). I am looking for a deep tutorial where such array-specific things and caveats are explained. Maybe you know one? I feel that otherwise I will have to post lots of stupid questions like: `how I shall write numpy one-liner in fortran without 3 loops?` – Alleo Oct 19 '15 at 21:28
2

@Alleo This site is not for asking for deep tutorials, such questions are off-topic here. That is why I answered you your direct question. If this wasn't there I would vote for your question to be closed because it is too broad and asking for an off-site resource such as a library, book or tutorial is off-topic here. – Vladimir F Героям слава Oct 19 '15 at 21:35
For good tips for a learning resource go to the question which is recommended as Related in the column on the right or to http://fortranwiki.org/fortran/show/Tutorials – Vladimir F Героям слава Oct 19 '15 at 21:38
'question which is recommended as Related' already was there. Several links broken, others are not informative in the sense of arrays operations. I will check out your second link, though I have less and less hope. – Alleo Oct 19 '15 at 21:43

score -1 · Accepted Answer · edited May 23 '17 at 11:59

-1

I have scanned many sources (~20 books and dozens of web pages). Hard luck I missed something really important. The question I posted is indeed incorrect and comes from my initial high expectation about array operations in fortran.

The answer I would expect is: there are no tools to write short, readable code in fortran with automatic parallelization (to be more precise: there are, but those are proprietary libraries).

The list of intrinsic functions available in fortran is quite short (link), and consists only of functions easily mapped to SIMD ops.

There are lots of functions that one will be missing.

while this could be resolved by separate library with separate implementation for each platform, fortran doesn't provide such. There are commercial options (see this thread)

Brief examples of missing functions:

no built-in array sort or unique. The proposed way is to use this library, which provides single-threaded code (forget threads and CUDA)
cumulative sum / running sum. One trivially can implement it, but the resulting code will never work fine on threads/CUDA/Xeon Phi/whatever comes next.
bincount, numpy.ufunc.at, numpy.ufunc.reduceat (which is very useful in many applications)

In most cases fortran provides 2x speed up even with simple implementations, but the code written will always be one-threaded, while matlab/numpy functions can be reimplemented for GPU or other parallel platform without any effort from user side (which occasionally happened to MATLAB, also see gnumpy, theano and parakeet)

To conclude, this is bad news for me. Fortran developers really care about having fast programs today, not in the future. I also can't lock my code on proprietary software. And I'm still looking for appropriate tool. (Julia is current candidate)

See also:

STL analogue in fortran where ready-to-use algorithms are asked.
Numerical recipes: the art of parallel programming author implements basic MATLAB-like operations to have more expressive code
I also find useful these notes to see recommended ways of code optimizations (to see there is no place for vector operations)
numpy, fortran, blitz++: a case study
dicussion about implementing unique in fortran, where proprietary tools are recommended.

edited May 23 '17 at 11:59

Community

1
1

answered Oct 20 '15 at 14:09

Alleo

7,891
2
40
30

Common, there was not a single word in your question suggesting you care about accelerators. There are other means we use for parallelization in Fortran. OpenMP, OpenACC, MPI, Coarrays. They definitly are extensible to XeonPhi and OpenACC targets also CUDA. But this has nothing in common with the array expressions, they are used for short source code, not for performance. Do loops are often better for performance. (I did not down-vote BTW). – Vladimir F Героям слава Oct 20 '15 at 16:08
@VladimirF for me vectorization of code == automatic acceleration + expressiveness. Of course I already had look at options you say a long time ago - and they are not expressive. I need expressive tool, it is obvious from links provided in question. – Alleo Oct 20 '15 at 16:36
2

You are using strange terminology (really, vectorization is not used in this sense in Fortran and similar). I would argue Coarrays are very expressive and OpenMP is simple too. If you are looking for automatic tools only, try Chapel http://chapel.cray.com/ or you may be satisfied with Julia. However this is way out of scope of a question about Fortran. (BTW, IMNSHO, a triple nested DO loop can easily be more readable than a complicated array expression) – Vladimir F Героям слава Oct 20 '15 at 16:44
Huh, I am not sure I am ready to use language I never heard before, but 'language from CRAY' looks interesting, thank you – Alleo Oct 20 '15 at 17:11
Python/Numpy and Matlab provide many different methods for array reduction. This is not only for convenience, but also because (AFAIK) explicit loops are rather slow. So they need to "vectorize" as many expressions as possible, which sometimes leads to unnecessarily too condensed (thus less readable) codes (IMO, though). On the other hand, while Fortran does not provide so many reduction methods, but it is more than compensated by fast loops, plus equally fast colon-based array expressions (it is also parallelizable via OpenMP etc). So I think it is a matter of trade-off thing. – roygvib Oct 20 '15 at 17:23
If you need more "opinions" or inputs, other forums like [comp.lang.fortran](https://groups.google.com/forum/#!forum/comp.lang.fortran) might be good (because this thread may be closed anyway). – roygvib Oct 20 '15 at 18:32
1

If you are a heavy user of Numpy, Cython may be nice to get expressiveness + {C-performance for hot routines}. Another option may be along this way (ParallelAccelerator.jl from Intel Labs) https://groups.google.com/forum/#!topic/julia-users/oiElSE9cMls – roygvib Oct 21 '15 at 05:47
@roygvib much thanks for link to ParallelAccelerator, I will check it. – Alleo Oct 21 '15 at 11:44
It may also be interesting to check [Arrayfire](https://github.com/arrayfire/arrayfire) with https://groups.google.com/forum/#!topic/julia-users/ziLV294YeEk and [Yeppp!](http://www.yeppp.info/) with https://github.com/JuliaLang/Yeppp.jl But please note that SO does not welcome this kind of recommendation of other software/resources (including more docs about Fortran arrays that you asked in the Question), so I will stop here... For more info, please check Google Group etc directly (or possibly softwarerecs.stackexchange.com?, but the latter may not be effective for HPC...). Cheers – roygvib Oct 21 '15 at 14:45
1

@Alleo regarding automatic acceleration, to fully understand this you need to look at the object code produced by the compiler. Fortran can aggressively optimize and when you write a triple do loop you'll find the compiler (if you enable sse4/avx/etc) will unroll those loops and put everything into simd instructions to do e.g. 4 multiplies at a time, or whatever you are doing in the triple loop. This is what a Fortran or C programmer thinks of when "vectorization" is mentioned. If you further use OpenMP, MPI or coarrays, then this can happen (massively) parallel and do sums and reductions. – casey Oct 21 '15 at 15:08

Writing optimal code in FORTRAN using array expressions

2 Answers2