Translating fortran to python: equivalence command used to generate a matrix from text file

Question

I am trying to replicate an old fortran code so I can use it in python. The hardest part is importing the data which is stored in a rather convoluted shape.

I have a text file with 14500 data points stored in 5 columns separated by commas:

 9.832e-06, 2.121e-16, 3.862e-21, 2.677e-23, 1.099e-22,
 5.314e-23, 1.366e-23, 6.504e-23, 3.936e-23, 1.175e-22,
 1.033e-23, 5.262e-23, 3.457e-23, 9.673e-23, 1.903e-22,
 3.153e-10, 2.572e-21, 5.350e-23, 4.522e-22, 1.468e-22,
 2.195e-23, 2.493e-22, 1.075e-22, 3.525e-22, 1.541e-23,
 1.935e-22, 9.220e-23, ...,       ...,       ...,

The fortran code declares two variables to store these data: pg and ping. The first one is a 4D matrix and the second is a 1D array with the data points array length (14500)

  parameter(NSIZ=29)
  parameter(NTg=5)
  parameter(NDg=5)
  parameter(NTAUg=20)
  real pg(NSIZ,NDg,NTg,NTAUg)
  real ping(NSIZ*NDg*NTg*NTAUg)

So far so good, but now I have an equivalence command:

  equivalence(pg,ping)

which as far as I understand it points the pg matrix to the ping array. The final part is a loop which reads the lines from the data file and allocates them to the ping array:

  NCOLs=5
  NROWS=NTg*NDg*NTAUg*NSIZ / NCOLs

  irow=1
  do i=1,NROWS
    read(nat,*) (ping((irow-1)*NCOLs+k),k=1,NCOLs)
    print *, ping(irow)
    irow=irow+1
  enddo

I would like to replicate this code in python but I do not understand how the read command is assigning the data to the 4D matrix. Could anyone offer any advice?

maybe just use the `csv` module? Nevermind, I think you are asking about more then parsing the file. — Tadhg McDonald-Jensen, Mar 16 '16 at 18:27
Thank you for the comment. In python there are more... natural ways... to read the file I just want to understand what is the fortran code doing to make sure the matrix data allocation is the same — Delosari, Mar 16 '16 at 18:33
yeah, I don't know fortran so it was a bit of a shot in the dark, wish you luck with your question though! — Tadhg McDonald-Jensen, Mar 16 '16 at 18:37

mgilson · Accepted Answer · 2016-03-17T19:12:19.957

First, Fortran equivalence and memory layout. We need to look at memory layout first. I'll describe the 2D case for simplicity. The generalization shouldn't be too hard to figure out for arbitrary dimensionality.

A fortran array is always a contiguous block of memory¹. The furthest left index varies the most rapidly (known as Column-major order). So:

a(i, j)
a(i+1, j)

^{¹Nearly always. Array slices and F90+ pointers might make that statement untrue in some circumstances. Even then, those "arrays" are backed by a contiguous block of memory allocated to the program...}

are adjacent in memory. Now, assume we have an array declared with length a(N, M). From a memory perspective, this is no different than an array a(N*M) where the mapping of indices is:

memory_index = i*(N-1) + j
a(memory_index)

(the N-1 is because fortran defaults to being 1-based in the indexing).

Ultimately, when you have a 2D array, the compiler translates your statements a(i, j) to a(i*(N-1) + j). It might also do some bounds checking ^{(e.g. to make sure that i is between 1 and N, but you usually have to turn that on with a compiler flag because it impacts performance slightly and the Fortran crowd really doesn't like it when their programs slow down ;-)}.

Ok, where are we? Pretty much everything I've said is that to the compiler, an N dimensional array is really just a 1 dimensional array with some index remapping.

Now we can start to deal with equivalence. The analogous construct in C would be pointers -- sort of. Your Equivalence statement says that the first element in ping is the same memory location as the first element in pg. This allows the user to read the data into ping which is a flat array, but have it populate the n-dimensional array (since ultimately, they share the same memory locations). It's worth noting that modern Fortran has pointers. They're not exactly like C pointers, but I think that most Fortran enthusiasts would advise you to use them in new code instead of equivalence.

Now, how would I read this data in python? I'd probably do something really simple:

def yield_values(filename):
    with open(filename) as f_input:
        for line in f_input:
            for val in line.split(','):
                yield float(val)

Then you can pack it into a numpy array:

import numpy as np
arr = np.array(list(yield_values('data.dat')))

Finally, you can reshape the array:

# Account for python's Row-major ordering.
arr = arr.reshape((NTAUg, NTg, NDg, NSIZ))

If you prefer fortran's column major ordering:

arr = arr.reshape((NSIZ, NDg, NTg, NTAUg), order='F')

"A fortran array is always a contiguous block of memory" is not true. It is _widely_ true, though. — francescalus, Mar 16 '16 at 19:11
@francescalus -- are referencing array slices and f90 pointers to portions of an array? Or are there other cases I'm not thinking about at the moment... — mgilson, Mar 16 '16 at 19:20
An assumed-shape array an obvious difficulty. As with pointers, not worth the detail in terms of the answer of course. Just my pedantry kicking in, alas. — francescalus, Mar 16 '16 at 19:24
@mgilson thank you very much for your reply: I am still not very clear on how/why the physical data was stored in the text file and your explanation helps a lot. Just one technical doubt: Do you know why in the read command we are adding "k=1, NCOLS)" at the end? Since these values are constant. Why are they needed with the read statement? — Delosari, Mar 16 '16 at 21:04
@Delosari -- Each row holds `NCOLS` values. With `NCOLS .eq. 5`, the first element in the 2nd row should be at index 6. The first element in the third row should be at index 11, etc. The second elements are at indices `[2, 7, 12, ...]`. Generalizing, the `n`th row holds element starting at `(n-1) * NCOLS + 1` — mgilson, Mar 16 '16 at 21:07
@mgilson Sorry again for bothering you I have a small issue with your reply: I get the following error in the command arr.reshape: **ValueError: total size of new array must be unchanged**. I think it is because when you "account for python 0 base index" you generate a smaller array (28*4*4*9) instead of (29*5*5*10)... although maybe I am mistaken... I still have to compare both programs to confirm I get the same array but what do you think? Should/can I edit your answer? — Delosari, Mar 17 '16 at 18:59
@Delosari -- Nope, it's me who should be sorry. You're definitely correct. I've edited. In the future, you should feel free to edit this (or any of the rest of my answers) to remove minor bugs/inaccuracies like that. In this case, the community might have voted down your edit because it would have the chance of making my code inaccurate (but you never know and I would have been notified of your edit so I could apply it retroactively if I wanted). Anyway, it should be fixed now. Thanks for pointing it out! — mgilson, Mar 17 '16 at 19:16

score 1 · Answer 2 · answered Mar 16 '16 at 18:52

1

Because of the equivalence statement, any data stored in ping is also stored in pg because they point to the same location in memory. So when the read command reads in a value and stores it in ping, it is available by reading pg using the 4D notation. It's a trick to "flatten" the array for the purpose of reading data into it.

In Python, you could do something similar but it really depends on how you plan to use the array.

answered Mar 16 '16 at 18:52

Brent Washburne

12,904
4
60
82

Thank you very much @Brent Wasburne for your reply. Actually, I want to use the array as in the original code. However, I do not know why/how this matrix is shaped like this from the published paper. This is why I am bothering you, trying to get some insight from the code itself :) – Delosari Mar 16 '16 at 21:07

Translating fortran to python: equivalence command used to generate a matrix from text file

2 Answers2