
When I program, I often use external software to do the heavy computations and then analyse the results in Python. This external software is often written in Fortran, C or C++ and works by reading input file(s). An input can be either a small file telling the program which mode to use for certain calculations, or a large data file it has to process. These files often use a fixed format (so and so many spaces between data columns). An example is given below for a data file I currently use.

This is a header. The first line is always a header...
  7352.103      26.0      2.61    -8.397                         11.2
  7353.510      26.0      4.73    -1.570                          3.5
  7356.643      26.0      5.75    -2.964                          9.0
  7356.648      26.0      5.35    -3.187                          9.0
  7364.034      26.0      5.67    -5.508                          1.7
  7382.523      26.0      5.61    -3.935                          1.9

My question is whether there exists a Python library that creates such input files by reading a template (given by a coworker or taken from the documentation of the external software).

Usually I have all the columns as NumPy arrays and want to pass them to a function that creates an input file, using the template as an example. I'm not looking for a brute-force method, which can get ugly very quickly.

I am not sure what to search for here, and any help is appreciated.

  • Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it. – jonrsharpe Feb 18 '14 at 22:15
  • Did you already have a look at NumPy's `savetxt` function? – Dietrich Feb 18 '14 at 22:28
  • @jonrsharpe: So far I have written programs that build the lines of the above data file and save them as a text file. The problem is that I see a lot of software that needs data files (or whatever kind of file) in a certain format, and it is tedious to write a small Python script every time so that the file can be read by the external software. I understand your worries, but any help to find something useful would be nice. @Dietrich: Yes, but I don't think `savetxt` can be used since it doesn't conserve the spaces. – Daniel Thaagaard Andreasen Feb 18 '14 at 22:32

3 Answers


I can basically replicate your sample with `savetxt`. Its `fmt` argument gives the same sort of formatting control that Fortran code uses for reading and writing files, and it preserves spaces the same way Fortran and C print statements do.

import numpy as np

example = """
This is a header. The first line is always a header...
  7352.103      26.0      2.61    -8.397                         11.2
...
"""

lines = example.split('\n')[1:]
header = lines[0]
data = []
for line in lines[1:]:
  if len(line):
    data.append([float(x) for x in line.split()])
data = np.array(data)

fmt = '%10.3f %9.1f %9.2f %9.3f %20.1f'  # similar to a FORTRAN format statement
filename = 'stack21865757.txt'

with open(filename,'w') as f:
  np.savetxt(f, data, fmt, header=header)

with open(filename) as f:
  print(f.read())

producing:

# This is a header. The first line is always a header...
  7352.103      26.0      2.61    -8.397                 11.2
  7353.510      26.0      4.73    -1.570                  3.5
...

EDIT

Here's a crude script that converts an example line into a format:

import re
tmplt = '  7352.103      26.0      2.61    -8.397                         11.2'
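def fmt_from_template(tmplt):
    # Only decimal numbers are recognized; integers and text are not handled.
    pat = r'( *-?\d+\.(\d+))'  # one number with its leading spaces and decimals
    fmt = []
    while tmplt:
        match = re.search(pat, tmplt)
        if not match:
            break  # nothing numeric left; without this the loop never ends
        x = len(match.group(1))  # width of the whole field, padding included
        d = len(match.group(2))  # number of decimals
        fmt.append('%%%d.%df' % (x, d))
        tmplt = tmplt[match.end():]  # continue after the matched field
    return ''.join(fmt)

print(fmt_from_template(tmplt))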
# %10.3f%10.1f%10.2f%10.3f%29.1f
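A quick way to check a format string like the one printed above is to format one row with it and read the result back. This is only a minimal sketch: it uses plain `%` formatting instead of `savetxt`, and the `row` values are copied by hand from the first data line in the question.

```python
# Format string as printed by the script above.
fmt = '%10.3f%10.1f%10.2f%10.3f%29.1f'

# First data row of the sample file, typed in by hand for this check.
row = (7352.103, 26.0, 2.61, -8.397, 11.2)

line = fmt % row
print(repr(line))   # repr makes the padding visible
print(len(line))    # 10 + 10 + 10 + 10 + 29 = 69

# Reading the line back should give the same numbers.
parsed = [float(x) for x in line.split()]
print(parsed == list(row))
```

If the line length or the parsed values differ from the template, the widths in `fmt` are the first thing to look at.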
hpaulj
  • Maybe I will try to play with this, but for me the "problem" (although this is better than my attempts with NumPy) is again that I have to know the format. It would be nice if the library I'm looking for could work this out by looking at a template. – Daniel Thaagaard Andreasen Feb 22 '14 at 15:10
  • What do you mean by 'template'? In one way or other you have to know the fields, kind of number (int, float, etc), decimal places (if that matters), and spacing (if that matters), delimiting characters (,). – hpaulj Feb 22 '14 at 17:07
  • +1: Using the Python format string as a "template" seems like a good general solution. – tom10 Feb 22 '14 at 20:09
  • By template I mean an example. This however seems like a good idea, and I will probably adopt it for a program I'm trying to make. – Daniel Thaagaard Andreasen Feb 23 '14 at 12:19
  • I find this to be a very good solution, and it does indeed work with my template. Now I'm expanding and trying other templates. I have a tricky one with not only numbers, but also a column with letters: `tmplt = "5253.534 26.0 26479 45509 -1.523 1.000 10 109.8 NIST"`. In this case your script stops while trying to match at the end of the string provided here. – Daniel Thaagaard Andreasen Mar 14 '14 at 15:58
  • That `re` pattern matching could be expanded to look for text, integers, and even exponentials. At some point it might be better to switch to some other tokenizing or parsing tool. – hpaulj Mar 14 '14 at 20:43
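One way the pattern could be extended to integers and text, as suggested in the last comment, is sketched below. It is an illustration rather than a tested tool: it assumes every column is right-aligned and separated by whitespace, it ignores exponent notation, and the `row` tuple is made up to match the mixed template from the comments.

```python
import re

def fmt_from_template(template):
    """Derive a printf-style format string from one example line (sketch)."""
    parts = []
    # Each field is its leading whitespace plus a run of non-space characters.
    for match in re.finditer(r'\s*(\S+)', template):
        width = len(match.group(0))   # field width includes the padding
        token = match.group(1)
        if re.fullmatch(r'[-+]?\d+\.\d+', token):
            decimals = len(token.split('.')[1])
            parts.append('%%%d.%df' % (width, decimals))   # float column
        elif re.fullmatch(r'[-+]?\d+', token):
            parts.append('%%%dd' % width)                  # integer column
        else:
            parts.append('%%%ds' % width)                  # anything else: text
    return ''.join(parts)

tmplt = "5253.534 26.0 26479 45509 -1.523 1.000 10 109.8 NIST"
fmt = fmt_from_template(tmplt)
print(fmt)

# Hypothetical row with the same values as the template line.
row = (5253.534, 26.0, 26479, 45509, -1.523, 1.0, 10, 109.8, 'NIST')
print(fmt % row)
```

Because the columns mix numbers and text, the row is formatted here with the `%` operator; passing such data to `savetxt` would need a structured or object array.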

Adapting hpaulj's answer to magically extract the `fmt` for `savetxt`:

from __future__ import print_function
import numpy as np
import re
example = """
This is a header. The first line is always a header...
  7352.103      26.0      2.61    -8.397                         11.2
  7353.510      26.0      4.73    -1.570                          3.5
  7356.643      26.0      5.75    -2.964                          9.0
  7356.648      26.0      5.35    -3.187                          9.0
  7364.034      26.0      5.67    -5.508                          1.7
  7382.523      26.0      5.61    -3.935                          1.9
"""
def extract_format(line):
  # Build one '%W.Df' field per number; W includes the leading whitespace.
  def fields():
    for match in re.finditer(r"\s+-?\d+\.(\d+)", line):
      yield "%{}.{}f".format(len(match.group(0)), len(match.group(1)))
  return "".join(fields())

lines = example.split('\n')[1:]
header = lines[0]
data = []
for line in lines[1:]:
  if len(line):
    data.append([float(x) for x in line.split()])
data = np.array(data)

fmt = extract_format(lines[1])  # similar to a FORTRAN format statement

filename = 'stack21865757.txt'

with open(filename,'w') as f:
  print(header,file=f)
  np.savetxt(f, data, fmt)

with open(filename) as f:
  print(f.read())

producing

This is a header. The first line is always a header...
  7352.103      26.0      2.61    -8.397                         11.2
  7353.510      26.0      4.73    -1.570                          3.5
  7356.643      26.0      5.75    -2.964                          9.0
  7356.648      26.0      5.35    -3.187                          9.0
  7364.034      26.0      5.67    -5.508                          1.7
  7382.523      26.0      5.61    -3.935                          1.9
Xavier Combelle

If your header is always the same, then you could look into pandas. This would allow you to move columns around really easily just by knowing the name of each column from the header. Even if the header isn't always the same, if you could get the column names from the template, then pandas could still rearrange the columns.

If I have misunderstood the question, then I am sorry, but more concrete data or a longer example might be nice for more help.

Wesley Bowman
  • I will try to look into pandas, thanks for the advice. The example is concrete data (a snippet of some data I'm using). The external software used here is a Fortran program. Do you know if pandas preserves the spaces between the columns? – Daniel Thaagaard Andreasen Feb 21 '14 at 17:00
  • Pandas can do whatever you tell it, it is quite powerful and verbose. And if you use something like f2py to turn Fortran subroutines into Python modules, then whatever returns from the Fortran subroutine can just be immediately turned into a pandas DataFrame or whatnot. I would definitely recommend using f2py as well if Fortran is what you got running. – Wesley Bowman Feb 24 '14 at 14:35
  • I am not really sure pandas is the ideal solution here. Can you please provide example code? – Daniel Feb 27 '14 at 00:38
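For what the pandas suggestion might look like in practice, here is a minimal sketch. The column names `col1` to `col5` are hypothetical (the sample file does not name its columns), and the format string is the one derived in the first answer. pandas is used only to reorder columns by name; the fixed-width spacing still comes from the format string.

```python
import pandas as pd

# Two rows of the sample data, deliberately stored in a scrambled column order.
df = pd.DataFrame({
    'col5': [11.2, 3.5],
    'col1': [7352.103, 7353.510],
    'col2': [26.0, 26.0],
    'col3': [2.61, 4.73],
    'col4': [-8.397, -1.570],
})

order = ['col1', 'col2', 'col3', 'col4', 'col5']   # order expected by the external program
fmt = '%10.3f%10.1f%10.2f%10.3f%29.1f'              # widths taken from the template
header = 'This is a header. The first line is always a header...'

# itertuples(index=False, name=None) yields one plain tuple per row.
lines = [fmt % row for row in df[order].itertuples(index=False, name=None)]
text = '\n'.join([header] + lines) + '\n'
print(text)
```

Writing `text` to a file with `open(filename, 'w')` would then give an input file in the same layout as the sample.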