How to read data from external .txt file, store them in arrays, filter and write them in a new file in Fortran 90?

Question

I've read similar solved questions on this website but they do to help me! So, I'm sorry to make a similar question.

I've the following .txt file named "Asteroids_Numbered.txt" (the file has lots of rows, i.e. 607013, but I put a lot less for simplicity):

 Num   Name              Epoch      a          e        i         w        Node        M         H     G   Ref
------ ----------------- ----- ---------- ---------- --------- --------- --------- ----------- ----- ----- ----------
     1 Ceres             59600  2.7660431 0.07850100  10.58769  73.63704  80.26860 291.3755993  3.54  0.12 JPL 48
     2 Pallas            59600  2.7711069 0.22999297  34.92530 310.69725 172.91657 272.4799259  4.22  0.11 JPL 49
     3 Juno              59600  2.6687911 0.25688702  12.99186 247.94173 169.84780 261.2986327  5.28  0.32 JPL 123
     4 Vesta             59600  2.3612665 0.08823418   7.14172 151.09094 103.80392   7.0315225  3.40  0.32 JPL 36
     5 Astraea           59600  2.5751766 0.19009936   5.36762 358.74039 141.57036 160.9820880  6.99  0.15 JPL 125
     6 Hebe              59600  2.4256657 0.20306151  14.73873 239.50547 138.64097 347.4991368  5.65  0.24 JPL 100
     7 Iris              59600  2.3866161 0.22949924   5.51768 145.34355 259.52553  47.6423152  5.61  0.15 JPL 119
     8 Flora             59600  2.2017319 0.15606719   5.88872 285.55022 110.87251 136.2585358  6.54  0.28 JPL 127
     9 Metis             59600  2.3852921 0.12356142   5.57695   6.16423  68.89958 184.5626181  6.37  0.17 JPL 128
    10 Hygiea            59600  3.1418676 0.11162598   3.83093 312.49331 283.18419 328.8968591  5.55  0.15 JPL 105
    11 Parthenope        59600  2.4532814 0.09954681   4.63165 195.59824 125.52829 175.4211548  6.60  0.15 JPL 118
    12 Victoria          59600  2.3337809 0.22074254   8.37333  69.66955 235.36878  49.7506630  7.31  0.22 JPL 131
    13 Egeria            59600  2.5765835 0.08544364  16.53450  80.14708  43.20673  66.2442983  6.84  0.15 JPL 103
    14 Irene             59600  2.5859176 0.16587880   9.12082  97.71349  86.11601  42.0351479  6.53  0.15 JPL 96
    15 Eunomia           59600  2.6440754 0.18662534  11.75200  98.63169 292.92610 152.5002319  5.41  0.23 JPL 85
    16 Psyche            59600  2.9244847 0.13392662   3.09684 229.21980 150.03218 125.1275316  6.06  0.20 JPL 90
    17 Thetis            59600  2.4706187 0.13286003   5.59276 135.80674 125.54282 197.5734224  7.76  0.15 JPL 125
    18 Melpomene         59600  2.2957889 0.21790920  10.13249 228.11923 150.36173 190.3739342  6.53  0.25 JPL 116
    19 Fortuna           59600  2.4429040 0.15701789   1.57276 182.47214 211.04422  95.0887535  7.38  0.10 JPL 142
    20 Massalia          59600  2.4088126 0.14306413   0.70880 257.55922 205.97388  20.5136762  6.56  0.25 JPL 118
    21 Lutetia           59600  2.4351916 0.16354177   3.06364 250.15544  80.85386 243.3813245  7.52  0.11 JPL 118
    22 Kalliope          59600  2.9102024 0.09838131  13.70049 357.60063  65.99349  33.4836574  6.51  0.21 JPL 111

How can I create a program that reads this file, stores data in 1-D arrays (one for every type of data, so I want to get 12 arrays) and then filter them according some criteria, for example for a value of inclination (i) less then 2deg? At the end, how can I store the filtered data in a new file with the same formatting of original file?

Here my code (it contains only the reading part):

program Read_write_ephemerides_Main

    implicit none 
    
    !Declarations
    character*100 :: input_path,input_filename, output_path, output_filename
    double precision, dimension(:,:), allocatable :: Epoch_TDB, a_AU, e, i_deg, w_deg, Node_deg, M_deg, H_mag, G
    character*30, dimension(4) :: str_output
    character, dimension (:,:), allocatable :: Name, Ref
    integer :: i,iu, i_counter
    integer, dimension (:,:), allocatable :: Number
    logical :: bContinue

    ! Definition of constants, paths names and file names
    iu = 10
    input_path = 'D:\MSc_Aerospace_Eng\Thesis\Fortran_projects\Read_write_ephemerides\InputFiles\'
    input_filename = 'Asteroids_Numbered.txt'
    !output_path = 'D:\MSc_Aerospace_Eng\Thesis\Fortran_projects\Read_write_ephemerides\OutputFiles\'
    !output_filename = 'prova_ast_num.txt'
    
    ! Reading of Asteroids_numbered file
    open(unit = iu, file = trim(input_path) // trim(input_filename), status='old', & 
        access = 'sequential',form = 'formatted', action='read')
    read(iu,'(//)')     ! skip first 2 lines
    read(iu,'(i10,a25,f10.0,6(f12.8),2(f5.4),f5.4)')  Number, Name, Epoch_TDB, a_AU, e, i_deg, w_deg, Node_deg, M_deg, H_mag, G, Ref
    close(unit = iu,status='keep')
     
    ! Creation of output file
    !open(unit = iu, file = trim(output_path) // trim(output_filename1), status = 'unknown', action = 'write')
    !write(iu,'(i10,a25,f10.0,6(f12.8),2(f5.4),f5.4)')  Number, Name, Epoch_TDB, a_AU, e, i_deg, w_deg, Node_deg, M_deg, H_mag, G, Ref
    !close(unit = iu,status='keep')
    !
    
    stop
    
end program Read_write_ephemerides_Main

EDIT: Code updated USEFUL NOTE: I use Intel Fortran compiler within Microsoft Visual Studio 2022

This is a gratuitous comment, not very related to my previous comment ... *I want to get 12 arrays* No, you really don't want to do that, you want to define a *derived type* for asteroids and read your data into an array of asteroids. Later stages in your programming adventure will be immensely facilitated by getting this right at the start. Perhaps study https://stackoverflow.com/questions/26812870/read-data-from-file-into-array/26816279#26816279 a bit. — High Performance Mark, Jan 12 '22 at 17:36
@HighPerformanceMark you're right, I'm sorry. I corrected the question. — g_don, Jan 12 '22 at 17:51
Please consider breaking this question down into several parts. While they are all related, they are still multiple. In one reading, you have three independent questions: "what is a suitable datatype?"; "how do I do I/O with that data type?"; "how do I filter on that data type?". You have an answer for some of that, but it doesn't answer all parts and is needlessly complicated in isolation. — francescalus, Jan 12 '22 at 23:52

francescalus · Answer 1 · 2022-01-13T15:24:43.027

Let's get one thing out of the way before moving on to the next part: if this is simply a "filtering" task, treat it as a filtering task.

In Fortran 2018 this could be as simple as

  implicit none
  character(1234) line
  integer iostat, nchars

  do
    read (*,'(A)',iostat=iostat,size=nchars) line
    if (iostat.lt.0) exit
    if (KEEP_LINE) print *, line(:nchars)    ! Implement conditional
  end do
end program

(If your compiler isn't a Fortran 2018 compiler you will need to complicate this.) This program acts as a filter in the Unix-sense:

./program < input_file > output_file

For this question the filter would be something like "pass first two lines; pass later lines where the sixth column as numeric is less than 2". I'll leave the exact specification as an exercise, noting that we can do the job with

awk 'NR<3||$6<2' < input_file > output_file

Note that you can simply extract the sixth column without creating variables for each column - or you can note that it's the first column of line(52:).

That's the filtering out of the way. Let's look at how we can create a data structure and do something with it in the Fortran program.

As High Performance Mark commented, and veryreverie expanded on we can create a derived type for this "data table" (if all columns are the same data type we can possibly get away with just a rank-2 intrinsic type, although even in such cases a derived type helps):

  type asteroids_t
     integer :: num
     character(18) :: name
     integer :: epoch
     real :: a, e, i, w, node, m, h, g
     character(10) :: ref
  end type asteroids_t

(set the kind parameter of each component as desired, but probably double precision for the reals)

We have a format for the input and output:

  character(*), parameter :: FMT='(i6,a,i7,f10.7,f11.8,3f10.5,f12.7,2f6.2,a)'

(Note that we can't use list-directed formatting for the input, because the final column has a space in the character. Again, working around this is an exercise.)

Assuming we have an array appropriately sized (see general questions about reading files with unknown number of rows, or veryreverie's answer here for detail) we're good to go. For clarity here I'm going to use an explicit size.

  type(asteroids_t) asteroids(NUMBER_OF_ASTEROIDS)
  integer, allocatable :: pass(:)
  read FMT, asteroids
  ... ! Work, including setting pass for filter
  print FMT, asteroids(pass)

Putting that all together for a quick-and-dirty program:

  implicit none

  type asteroids_t
     integer :: num
     character(18) :: name
     integer :: epoch
     real(KIND(0d0)) :: a, e, i, w, node, m, h, g
     character(10) :: ref
  end type asteroids_t

  type(asteroids_t) :: asteroids(22)
  character(118) :: header(2)
  character(*), parameter :: FMT='(i6,a,i7,f10.7,f11.8,3f10.5,f12.7,2f6.2,a)'
  integer :: i
  
  read '(A)', header
  print '(A)', header

  read FMT, asteroids
  print FMT, asteroids(PACK([(i,i=1,SIZE(asteroids))], asteroids%i<2))
end program

The key point to note is that we can process our derived type with "normal" input/output: the item asteroids is expanded as array elements and each array element is then expanded as components. Because our derived type has no private, pointer or allocatable component we can use this simple form of processing.

As more advanced material, a note on the "magic numbers" in the example here. We already know how to remove the magic asteroid count (22) and the magic number and length of the header lines (2 and 118). But maybe we're worried about the lengths of those character components (18 and 10).

Our data structure is tightly coupled to the form of the input data file, but what if we have two datasets where the names differ in length? It will indeed be a pain to rewrite or duplicate our derived type to handle this. Let's solve that problem:

  type asteroids_t(len_name, len_ref)
     integer, len :: len_name=18, len_ref=10
     integer :: num
     character(len_name) :: name
     integer :: epoch
     real(KIND(0d0)) :: a, e, i, w, node, m, h, g
     character(len_ref) :: ref
  end type asteroids_t

  type(asteroids_t) asteroids_set_1(22)
  type(asteroids(25,8)) asteroids_set_2(22)

! There's no magic character length in FMT
  read FMT, asteroids_set_1
  read FMT, asteroids_set_2

The column widths can even be deferred to be resolved at run-time (not shown). You can read about these parameterized derived types in more detail elsewhere.

*if all columns are the same data type we wouldn't bother with a derived type, just a rank-2 intrinsic type*. Well, some of us would ... — High Performance Mark, Jan 13 '22 at 13:23
For all but the most simple tasks, I'd hope to be let in @HighPerformanceMark's club of derived type users even where intrinsic types would be usable. — francescalus, Jan 13 '22 at 13:28

veryreverie · Accepted Answer · 2022-01-13T15:54:32.603

To expand on @HighPerformanceMark's comments, the best thing to do is to define an Asteroid type which holds all of the information about an asteroid, and then to create an array of Asteroids.

The `Asteroid` type

The Asteroid type should initially just contain the data about an asteroid,

type :: Asteroid
  integer :: num
  character(:), allocatable :: name
  integer :: epoch
  real(dp) :: a
  real(dp) :: e
  real(dp) :: i
  real(dp) :: w
  real(dp) :: node
  real(dp) :: m
  real(dp) :: h
  real(dp) :: g
  character(:), allocatable :: ref_name
  integer :: ref_number
end type

where dp defines double precision.

This allows you to have an array of Asteroids, e.g.

type(Asteroid) :: asteroids(22)

asteroids(1) = Asteroid(1, "Ceres", ..., "JPL", 48)
...
asteroids(22) = Asteroid(22, "Kalliope", ..., "JPL", 111)

write(*,*) asteroids(1)%name ! Writes "Ceres".

Reading and Writing `Asteroid`s

You want to be able to read and write asteroids to and from file, and you can do this using user defined input/output. For this you need a subroutine to read an Asteroid, e.g.

subroutine read_Asteroid(this, unit, iotype, v_list, iostat, iomsg)
  class(Asteroid), intent(inout) :: this
  integer, intent(in) :: unit
  character(*), intent(in) :: iotype
  integer, intent(in) :: v_list(:)
  integer, intent(out) :: iostat
  character(*), intent(inout) :: iomsg
  
  character(100) :: name
  character(100) :: ref_name
  
  read(unit, *, iostat=iostat, iomsg=iomsg) &
     & this%num, &
     & name, &
     & this%epoch, &
     & this%a, &
     & this%e, &
     & this%i, &
     & this%w, &
     & this%node, &
     & this%m, &
     & this%h, &
     & this%g, &
     & ref_name, &
     & this%ref_number
  
  this%name = trim(name)
  this%ref_name = trim(ref_name)
end subroutine

and another to write an Asteroid, e.g.

subroutine write_Asteroid(this, unit, iotype, v_list, iostat, iomsg)
  class(Asteroid), intent(in) :: this
  integer, intent(in) :: unit
  character(*), intent(in) :: iotype
  integer, intent(in) :: v_list(:)
  integer, intent(out) :: iostat
  character(*), intent(inout) :: iomsg
  
  write(unit, *, iostat=iostat, iomsg=iomsg) &
     & this%num, &
     & this%name, &
     & this%epoch, &
     & this%a, &
     & this%e, &
     & this%i, &
     & this%w, &
     & this%node, &
     & this%m, &
     & this%h, &
     & this%g, &
     & this%ref_name, &
     & this%ref_number
end subroutine

You also need to add bindings to the Asteroid type so that it knows to use read_Asteroid and write_Asteroid for reading and writing. This looks like

type :: Asteroid
  integer :: num
  ...
  integer :: ref_number
contains
  ! `read` binding.
  generic :: read(formatted) => read_Asteroid
  procedure :: read_Asteroid
  
  ! `write` binding.
  generic :: write(formatted) => write_Asteroid
  procedure :: write_Asteroid
end type

N.B. because the Asteroid type has allocatable components (name and ref_name), which are not allocated by read statements, care must be taken when writing read_Asteroid. This method can be used to read allocatables; first reading to an over-large buffer, and then copying the data to the allocatable variable. (Thanks @francescalus for pointing out previous problems with my code here).

It's now possible to read and write asteroids directly, e.g.

character(1000) :: line
type(Asteroid) :: Ceres

line = "1 Ceres 59600 2.766 0.07850 10.58 73.63 80.26 291.3 3.54 0.12 JPL 48"
read(line, *) Ceres
write(*, *) Ceres

An example code

Putting this all together, here is an example code which reads a file full of asteroids and then writes those with i < 2:

module asteroid_module
  implicit none
  
  ! Define `dp`, which defines double precision.
  integer, parameter :: dp = selected_real_kind(15, 307)
  
  ! Define the `Asteroid` type.
  type :: Asteroid
    integer :: num
    character(:), allocatable :: name
    integer :: epoch
    real(dp) :: a
    real(dp) :: e
    real(dp) :: i
    real(dp) :: w
    real(dp) :: node
    real(dp) :: m
    real(dp) :: h
    real(dp) :: g
    character(:), allocatable :: ref_name
    integer :: ref_number
  contains
    ! `read` binding.
    generic :: read(formatted) => read_Asteroid
    procedure :: read_Asteroid
    
    ! `write` binding.
    generic :: write(formatted) => write_Asteroid
    procedure :: write_Asteroid
  end type
contains

! Define how to `read` an `Asteroid`.
subroutine read_Asteroid(this, unit, iotype, v_list, iostat, iomsg)
  class(Asteroid), intent(inout) :: this
  integer, intent(in) :: unit
  character(*), intent(in) :: iotype
  integer, intent(in) :: v_list(:)
  integer, intent(out) :: iostat
  character(*), intent(inout) :: iomsg
  
  character(100) :: name
  character(100) :: ref_name
  
  read(unit, *, iostat=iostat, iomsg=iomsg) &
     & this%num, &
     & name, &
     & this%epoch, &
     & this%a, &
     & this%e, &
     & this%i, &
     & this%w, &
     & this%node, &
     & this%m, &
     & this%h, &
     & this%g, &
     & ref_name, &
     & this%ref_number
  
  this%name = trim(name)
  this%ref_name = trim(ref_name)
end subroutine

! Define how to `write` an `Asteroid`.
subroutine write_Asteroid(this, unit, iotype, v_list, iostat, iomsg)
  class(Asteroid), intent(in) :: this
  integer, intent(in) :: unit
  character(*), intent(in) :: iotype
  integer, intent(in) :: v_list(:)
  integer, intent(out) :: iostat
  character(*), intent(inout) :: iomsg
  
  write(unit, *, iostat=iostat, iomsg=iomsg) &
     & this%num, &
     & this%name, &
     & this%epoch, &
     & this%a, &
     & this%e, &
     & this%i, &
     & this%w, &
     & this%node, &
     & this%m, &
     & this%h, &
     & this%g, &
     & this%ref_name, &
     & this%ref_number
end subroutine
end module

program example
  use asteroid_module
  implicit none
  
  character(1000) :: line
  integer :: iostat
  integer :: file_length
  type(Asteroid), allocatable :: asteroids(:)
  integer :: i
  
  ! Count the number of lines in the file.
  file_length = 0
  open(10, file="input.txt")
  do
    read(10, '(A)',iostat=iostat) line
    if (iostat/=0) then
      exit
    endif
    file_length = file_length + 1
  enddo
  close(10)
  
  ! Allocate the array to hold the asteroids.
  allocate(asteroids(file_length-2))
  
  ! Read the asteroids into the array.
  open(10, file="input.txt")
  read(10, '(A)') line
  read(10, '(A)') line
  do i=1,size(asteroids)
    read(10, '(A)') line
    read(line, *) asteroids(i)
  enddo
  close(10)
  
  ! Write the asteroids with `i` < 2 to a file.
  open(10, file="output.txt")
  do i=1,size(asteroids)
    if (asteroids(i)%i < 2.0_dp) then
      write(10,*) asteroids(i)
    endif
  enddo
  close(10)
end program

Wow @veryreverie! I'm very grateful to you for your line of reasoning and your code, I will study your answer and give my feedback. — g_don, Jan 12 '22 at 21:52
Although if you're going to make the components length 100 regardless of input, there's not much point making them allocatable? As explicit length, the whole need of defined input/output goes away. — francescalus, Jan 12 '22 at 22:55
Hi @francescalus, do you mean that I can replace the line `character(1000) :: line` with `character, dimension (:,:), allocatable :: line`? — g_don, Jan 12 '22 at 23:47
@Giuseppe, making `line` an allocatable rank-2 array is about the worst thing you can do! I'll happily provide my own answe (for what that is worth), but please consider my comment on your question before then. — francescalus, Jan 13 '22 at 00:02
I'm sorry @francescalu, but I'm new in fortran programming. However, I tried to run the code posted by @veryreverie but I get the error ` forrtl: severe (127): User Defined I/O procedure returned error 59, message: 'list-directed I/O syntax error, unit -5, file Internal List-Directed Read ` on the code line `read(line, *) asteroids(i)`. Some ideas about this? — g_don, Jan 13 '22 at 00:16
I'm afraid I have to -1 this (appreciated, high effort) answer: the complete example still doesn't handle allocation but more importantly, it's really not a good use of defined input/output. I'm a fan of modern Fortran, derived types and defined I/O but this is not a good use case even if bug-free. There's no compelling reason to make the components allocatable in the example (they're just set to a constant length defined by the module) so the bulk of the example code, in terms of lines and concept, is rather superfluous. (Also doesn't "filter them according some criteria".) — francescalus, Jan 13 '22 at 00:31
I very much appreciate your help as well as this code example. Tomorrow I will continue to study it and learn more about the programming concepts behind it. @francescalis if you have any ideas to improve/modify the code, they are very welcome. — g_don, Jan 13 '22 at 00:56
@francescalus thanks for the constructive criticism :) I think I've fixed the problems you pointed out. Am I still missing things? — veryreverie, Jan 13 '22 at 09:43
@Giuseppe sorry, as pointed out by francescalus my initial code was buggy. It works for me now (using `gfortran 7.5`). Let me know if you still have problems. — veryreverie, Jan 13 '22 at 09:48
Thanks for updating. I'm still missing why allocatable components are recommended for this case. Consider where someone wants to read just the second column. It would be unsound to say: create a derived type with deferred length character component, implement defined I/O for it, etc. Instead of `character(20) names(30); read *, names; print * names`. The overcomplexity of that first way is essentially the suggestion here. — francescalus, Jan 13 '22 at 10:10
@francescalus in my own code I always use a `type(String)` which has a single `character(:), allocatable` member, and all of the useful methods already defined (I believe this is broadly similar to the one [recently added to stdlib.fortran](https://stdlib.fortran-lang.org/type/string_type.html)). I find fixed-length `character(n)` variables quite unwieldy to work with; the need to `trim` adds complexity and forgetting it is an easy source of error. I guess it's mostly just personal preference. — veryreverie, Jan 13 '22 at 10:22
I agree that "advanced" string manipulation has it uses, but the (massive) additional complexity of allocatable components just isn't justified here. That includes the cognitive overhead of understanding (writing, reviewing, etc.) and keeping bug-free (non-trivial, as seen). For a Fortran naif it surely can't be wise to implement this (copy--paste an SO answer without understanding is a serious problem). Also: want an array of just the names, not `asteroids%name` (a variable) but `[character(124) :: (asteroids(i)%name,i=LBOUND(asteroids),UBOUND(asteroids))]` - not nice and not variable. Etc. — francescalus, Jan 13 '22 at 10:50
@francescalus perhaps you're right, although I still think the benefits of allocatable strings outweigh the downsides. I would be intrigued to see your own answer to this question, and then the community can weigh in on the two methods if they want to. — veryreverie, Jan 13 '22 at 11:02
Hi guys and one more time thank you very much for your support, I'm going to study and try your codes and I will give you my feedback. See you later... — g_don, Jan 13 '22 at 15:05
@veryreverie is there a particular reason for which you write twice the line read(10, '(A)') line in your program example? I also have the same error as yesterday (15 hours ago in the messages above) while debugging. — g_don, Jan 13 '22 at 15:39
The two `read(10, "(A)") line` are just to skip the first two lines of the input file. What compiler are you using to get that error message? — veryreverie, Jan 13 '22 at 15:53
Do you have any blank lines or other non-asteroid lines in your file? I haven't included any error handling in the example, so it will fall over if you try to read an `Asteroid` from a string which isn't formatted as your examples are. — veryreverie, Jan 13 '22 at 15:57
Hi @veryreverie, my fortran program has two files: main.f90 and module.f90. I use Intel fortran compiler — g_don, Jan 13 '22 at 20:29
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/241049/discussion-between-veryreverie-and-giuseppe). — veryreverie, Jan 13 '22 at 21:03

How to read data from external .txt file, store them in arrays, filter and write them in a new file in Fortran 90?

2 Answers2

The Asteroid type

Reading and Writing Asteroids

An example code

The `Asteroid` type

Reading and Writing `Asteroid`s