2

I am trying to replicate an old fortran code so I can use it in python. The hardest part is importing the data which is stored in a rather convoluted shape.

I have a text file with 14500 data points stored in 5 columns separated by commas:

 9.832e-06, 2.121e-16, 3.862e-21, 2.677e-23, 1.099e-22,
 5.314e-23, 1.366e-23, 6.504e-23, 3.936e-23, 1.175e-22,
 1.033e-23, 5.262e-23, 3.457e-23, 9.673e-23, 1.903e-22,
 3.153e-10, 2.572e-21, 5.350e-23, 4.522e-22, 1.468e-22,
 2.195e-23, 2.493e-22, 1.075e-22, 3.525e-22, 1.541e-23,
 1.935e-22, 9.220e-23, ...,       ...,       ...,

The fortran code declares two variables to store these data: pg and ping. The first one is a 4D matrix and the second is a 1D array with the data points array length (14500)

  parameter(NSIZ=29)
  parameter(NTg=5)
  parameter(NDg=5)
  parameter(NTAUg=20)
  real pg(NSIZ,NDg,NTg,NTAUg)
  real ping(NSIZ*NDg*NTg*NTAUg)

So far so good, but now I have an equivalence command:

  equivalence(pg,ping)

which as far as I understand it points the pg matrix to the ping array. The final part is a loop which reads the lines from the data file and allocates them to the ping array:

  NCOLs=5
  NROWS=NTg*NDg*NTAUg*NSIZ / NCOLs

  irow=1
  do i=1,NROWS
    read(nat,*) (ping((irow-1)*NCOLs+k),k=1,NCOLs)
    print *, ping(irow)
    irow=irow+1
  enddo

I would like to replicate this code in python but I do not understand how the read command is assigning the data to the 4D matrix. Could anyone offer any advice?

Delosari
  • 677
  • 2
  • 17
  • 29

2 Answers2

4

First, Fortran equivalence and memory layout. We need to look at memory layout first. I'll describe the 2D case for simplicity. The generalization shouldn't be too hard to figure out for arbitrary dimensionality.

A fortran array is always a contiguous block of memory1. The furthest left index varies the most rapidly (known as Column-major order). So:

a(i, j)
a(i+1, j)

1Nearly always. Array slices and F90+ pointers might make that statement untrue in some circumstances. Even then, those "arrays" are backed by a contiguous block of memory allocated to the program...

are adjacent in memory. Now, assume we have an array declared with length a(N, M). From a memory perspective, this is no different than an array a(N*M) where the mapping of indices is:

memory_index = i*(N-1) + j
a(memory_index)

(the N-1 is because fortran defaults to being 1-based in the indexing).

Ultimately, when you have a 2D array, the compiler translates your statements a(i, j) to a(i*(N-1) + j). It might also do some bounds checking (e.g. to make sure that i is between 1 and N, but you usually have to turn that on with a compiler flag because it impacts performance slightly and the Fortran crowd really doesn't like it when their programs slow down ;-).

Ok, where are we? Pretty much everything I've said is that to the compiler, an N dimensional array is really just a 1 dimensional array with some index remapping.

Now we can start to deal with equivalence. The analogous construct in C would be pointers -- sort of. Your Equivalence statement says that the first element in ping is the same memory location as the first element in pg. This allows the user to read the data into ping which is a flat array, but have it populate the n-dimensional array (since ultimately, they share the same memory locations). It's worth noting that modern Fortran has pointers. They're not exactly like C pointers, but I think that most Fortran enthusiasts would advise you to use them in new code instead of equivalence.


Now, how would I read this data in python? I'd probably do something really simple:

def yield_values(filename):
    with open(filename) as f_input:
        for line in f_input:
            for val in line.split(','):
                yield float(val)

Then you can pack it into a numpy array:

import numpy as np
arr = np.array(list(yield_values('data.dat')))

Finally, you can reshape the array:

# Account for python's Row-major ordering.
arr = arr.reshape((NTAUg, NTg, NDg, NSIZ))

If you prefer fortran's column major ordering:

arr = arr.reshape((NSIZ, NDg, NTg, NTAUg), order='F')
mgilson
  • 300,191
  • 65
  • 633
  • 696
  • "A fortran array is always a contiguous block of memory" is not true. It is _widely_ true, though. – francescalus Mar 16 '16 at 19:11
  • @francescalus -- are referencing array slices and f90 pointers to portions of an array? Or are there other cases I'm not thinking about at the moment... – mgilson Mar 16 '16 at 19:20
  • An assumed-shape array an obvious difficulty. As with pointers, not worth the detail in terms of the answer of course. Just my pedantry kicking in, alas. – francescalus Mar 16 '16 at 19:24
  • @mgilson thank you very much for your reply: I am still not very clear on how/why the physical data was stored in the text file and your explanation helps a lot. Just one technical doubt: Do you know why in the read command we are adding "k=1, NCOLS)" at the end? Since these values are constant. Why are they needed with the read statement? – Delosari Mar 16 '16 at 21:04
  • @Delosari -- Each row holds `NCOLS` values. With `NCOLS .eq. 5`, the first element in the 2nd row should be at index 6. The first element in the third row should be at index 11, etc. The second elements are at indices `[2, 7, 12, ...]`. Generalizing, the `n`th row holds element starting at `(n-1) * NCOLS + 1` – mgilson Mar 16 '16 at 21:07
  • @mgilson Sorry again for bothering you I have a small issue with your reply: I get the following error in the command arr.reshape: **ValueError: total size of new array must be unchanged**. I think it is because when you "account for python 0 base index" you generate a smaller array (28*4*4*9) instead of (29*5*5*10)... although maybe I am mistaken... I still have to compare both programs to confirm I get the same array but what do you think? Should/can I edit your answer? – Delosari Mar 17 '16 at 18:59
  • @Delosari -- Nope, it's me who should be sorry. You're definitely correct. I've edited. In the future, you should feel free to edit this (or any of the rest of my answers) to remove minor bugs/inaccuracies like that. In this case, the community might have voted down your edit because it would have the chance of making my code inaccurate (but you never know and I would have been notified of your edit so I could apply it retroactively if I wanted). Anyway, it should be fixed now. Thanks for pointing it out! – mgilson Mar 17 '16 at 19:16
1

Because of the equivalence statement, any data stored in ping is also stored in pg because they point to the same location in memory. So when the read command reads in a value and stores it in ping, it is available by reading pg using the 4D notation. It's a trick to "flatten" the array for the purpose of reading data into it.

In Python, you could do something similar but it really depends on how you plan to use the array.

Brent Washburne
  • 12,904
  • 4
  • 60
  • 82
  • Thank you very much @Brent Wasburne for your reply. Actually, I want to use the array as in the original code. However, I do not know why/how this matrix is shaped like this from the published paper. This is why I am bothering you, trying to get some insight from the code itself :) – Delosari Mar 16 '16 at 21:07