1

Suppose I have the following txt file:

0.0163934
6
7.52438e+09
2147483648
6.3002e-06 6.31527e-08 0 0 6 0 0 4.68498e-06 0.00638412 12.6688
6.33438e-06 0 5.99588e-09 0 0 0 0 4.70195e-06 0 12.876
6.36874e-06 0 6.09398e-09 0 0 0 0 4.71894e-06 0 13.0867
6.40329e-06 0 6.19369e-09 0 0 0 0 4.73593e-06 0 13.3009
6.43802e-06 0 6.29503e-09 0 0 0 0 4.75294e-06 0 13.5185
6.47295e-06 0 6.39803e-09 0 0 0 0 4.76996e-06 0 13.7397
0.0163934
3
7.52438e+09
2147483648
6.3002e-06 0 5.89935e-09 0 0 0 0 4.68498e-06 0 12.6688
6.33438e-06 0 5.99588e-09 0 0 0 0 4.70195e-06 0 12.876
6.36874e-06 0 6.09398e-09 0 0 0 0 4.71894e-06 0 13.0867

I want to read each of the first four lines as a float or an integer and then, depending on the value in the second line, read that many of the following lines into a list of lists or an array.

In IDL language I just have to do:

openr, 1, fname
readf, 1, Time
readf, 1, Bins
readf, 1, dummy
readf, 1, dummyLong
da1= fltarr(10, Bins)
readf, 1, da1

So the entire block of numbers is stored in the float array da1, which has size 10*Bins (rows and columns are swapped relative to Python).

And then I can read the following lines in the same way.

In python I am doing:

Time=float(filen.readline())
Bins=int(filen.readline())
dummy=float(filen.readline())
dummyLong=long(filen.readline())

lines=[filen.readline() for i in range(Bins)]

arra=[[float(x) for x in lines[i].split()] for i in range(len(lines))]

So I need two lines of code and complicated iterations that are not understandable to a beginner.

Is there a Pythonic way to do this in a single statement, as in IDL?

Thanks!

Santiago
  • Oh and I want to do it in a memory friendly way, since I actually have thousands of lines. That's why I don't use file.readlines() – Santiago Feb 01 '13 at 16:34
  • `filen.readlines()` isn't memory friendly, but `for line in filen` is. – cha0site Feb 01 '13 at 16:51

4 Answers

1

One-liners are not necessarily better than two-liners.

But you can do it:

arra = [[float(x) for x in filen.readline().split()] for _ in range(Bins)]

I like it better in two lines:

lines = (filen.readline() for _ in range(Bins))
arra = [[float(x) for x in line.split()] for line in lines]
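
Since the file holds several records and memory use is a concern, the same comprehension can be wrapped in a generator that yields one record at a time. This is only a sketch in Python 3: the name `iter_records` and the shortened three-column sample data are made up for illustration and are not part of the question.

```python
import io

def iter_records(filen):
    """Yield (time, bins, rows) for each record, one record at a time."""
    while True:
        first = filen.readline()
        if not first.strip():        # end of file (or a blank trailing line)
            return
        time = float(first)
        bins = int(filen.readline())
        filen.readline()             # dummy (ignored here)
        filen.readline()             # dummyLong (ignored here)
        # read exactly `bins` data rows, as in the one-liner above
        rows = [[float(x) for x in filen.readline().split()] for _ in range(bins)]
        yield time, bins, rows

# Hypothetical, shortened sample standing in for the real file
sample = io.StringIO(
    "0.0163934\n2\n7.52438e+09\n2147483648\n"
    "6.3002e-06 0 12.6688\n"
    "6.33438e-06 0 12.876\n"
    "0.0163934\n1\n7.52438e+09\n2147483648\n"
    "6.36874e-06 0 13.0867\n"
)
records = list(iter_records(sample))
```

With a real file you would loop with `for time, bins, rows in iter_records(filen):` instead of building a list, so only one record is held in memory at a time.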
Tim Pietzcker
1
Time=float(filen.readline())
Bins=int(filen.readline())
dummy=float(filen.readline())
dummyLong=long(filen.readline())
arra = [ [ float(num) for num in line.split() ] for line in filen ]

That's just slightly more Pythonic, but it does not stop after the required number of lines; it reads every remaining line in the file. You could use `islice` from `itertools` to stop the iteration after `Bins` lines, or you could simply truncate the list afterwards.

Here's an example, and since I'm already using islice I took the liberty of getting all fancy with functional programming...

from itertools import islice

CONVERTORS = (float, int, float, long, )
with open(...) as filen:
    Time, Bins, dummy, dummyLong = [ func(value) for func, value in zip(CONVERTORS, islice(filen, 4)) ]
    arra = [ map(float, line.split()) for line in islice(filen, Bins) ]
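
One thing to watch with `islice` is that it stops silently when the file ends early, so a truncated record gives a short list rather than an error. The sketch below is a Python 3 variant of the snippet above (in Python 3 `long` is folded into `int` and `map` returns an iterator, so a list comprehension is used instead) with a length check added. The function name `read_record` and the tiny in-memory samples are assumptions made for illustration.

```python
import io
from itertools import islice

CONVERTERS = (float, int, float, int)  # Time, Bins, dummy, dummyLong

def read_record(filen):
    """Read one record; raise ValueError if the file ends early."""
    header = [func(value) for func, value in zip(CONVERTERS, islice(filen, 4))]
    if len(header) < 4:
        raise ValueError("incomplete header")
    time, bins, dummy, dummy_long = header
    rows = [[float(x) for x in line.split()] for line in islice(filen, bins)]
    if len(rows) < bins:             # islice stops silently at end of file
        raise ValueError("expected %d rows, got %d" % (bins, len(rows)))
    return time, bins, rows

# Hypothetical complete record with two data rows
good = io.StringIO("0.5\n2\n7.5e+09\n2147483648\n1 2 3\n4 5 6\n")
time, bins, rows = read_record(good)

# Hypothetical truncated record: header promises 3 rows, only 1 is present
truncated = io.StringIO("0.5\n3\n7.5e+09\n2147483648\n1 2 3\n")
try:
    read_record(truncated)
    short_read_detected = False
except ValueError:
    short_read_detected = True
```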
cha0site
1

Here's a more object-oriented way to do it, using a simply coded FSM (Finite State Machine) to control the process of reading in complete data records. It's more verbose than the other answers currently posted, but it's a fairly flexible and extensible way to handle such tasks, and it does so with error checking.

class Record(object):
    def __init__(self, time=None, bins=None, fltarr=None):
        self.time = time
        self.bins = bins
        self.fltarr = fltarr

    def read(self, file):
        """ Read complete record from file into self and return True,
            otherwise return False if EOF encountered """
        START, STOP, EOF = 0, -1, -99

        state = START
        while state not in (EOF, STOP):
            line = file.readline()
            if not line: state = EOF; break
            # process line depending on read state
            if state == 0:
                self.time = float(line)
                state = 1
            elif state == 1:
                self.bins = int(line)
                state = 2
            elif state in (2, 3):
                # ignore line
                state += 1
            elif state == 4:
                self.fltarr = []
                last_bin = self.bins-1
                for bin in xrange(self.bins):
                    self.fltarr.append([float(x) for x in line.split()])
                    if bin == last_bin: break
                    line = file.readline()
                    if not line: state = EOF; break
                if state != EOF:
                    state = STOP

        return state == STOP

    def __str__(self):
        result = 'Record(time={}, bins={}, fltarr=[\n'.format(self.time, self.bins)
        for floats in self.fltarr:
            result += '  {}\n'.format(floats)
        return result + '])'

fname = 'sample_data.txt'
with open(fname, 'r') as input:
    data = []
    while True:
        record = Record()
        if not record.read(input):
            break
        else:
            data.append(record)

for record in data:
    print record

Output:

Record(time=0.0163934, bins=6, fltarr=[
  [6.3002e-06, 6.31527e-08, 0.0, 0.0, 6.0, 0.0, 0.0, 4.68498e-06, 0.00638412, 12.6688]
  [6.33438e-06, 0.0, 5.99588e-09, 0.0, 0.0, 0.0, 0.0, 4.70195e-06, 0.0, 12.876]
  [6.36874e-06, 0.0, 6.09398e-09, 0.0, 0.0, 0.0, 0.0, 4.71894e-06, 0.0, 13.0867]
  [6.40329e-06, 0.0, 6.19369e-09, 0.0, 0.0, 0.0, 0.0, 4.73593e-06, 0.0, 13.3009]
  [6.43802e-06, 0.0, 6.29503e-09, 0.0, 0.0, 0.0, 0.0, 4.75294e-06, 0.0, 13.5185]
  [6.47295e-06, 0.0, 6.39803e-09, 0.0, 0.0, 0.0, 0.0, 4.76996e-06, 0.0, 13.7397]
])
Record(time=0.0163934, bins=3, fltarr=[
  [6.3002e-06, 0.0, 5.89935e-09, 0.0, 0.0, 0.0, 0.0, 4.68498e-06, 0.0, 12.6688]
  [6.33438e-06, 0.0, 5.99588e-09, 0.0, 0.0, 0.0, 0.0, 4.70195e-06, 0.0, 12.876]
  [6.36874e-06, 0.0, 6.09398e-09, 0.0, 0.0, 0.0, 0.0, 4.71894e-06, 0.0, 13.0867]
])
martineau
  • It is a really interesting solution. It could indeed be very useful for doing some checks and handling my data, and I will keep it in mind, although in this case I need the code to be as compact and simple as possible, because other people have to read it too. – Santiago Feb 02 '13 at 01:07
0

You can also use NumPy's loadtxt:

from numpy import loadtxt
data = loadtxt("input.txt", unpack=False)

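Then convert the data types as needed. Note that loadtxt expects every row to have the same number of values, so it will not read a file like the one in the question, which mixes single-value header lines with ten-value data rows, in one call.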

Alternatively, readlines can also be used:

from numpy import fromstring

with open("filename.dat") as fin:
    data = fin.readlines()

i = 0  # index of the first line of the current record
while i < len(data):
    Time = float(data[i])
    Bins = int(data[i+1])
    dummy, dummyLong = float(data[i+2]), float(data[i+3])
    # parse the Bins data rows that follow the four header lines
    arra = [fromstring(data[i+4+j], dtype=float, sep=" ") for j in range(Bins)]
    i += 4 + Bins  # jump to the start of the next record
Thiru