
I am attempting to build a database from a numeric model output text file. The text file has four (4) rows of title-block data followed by many rows (41,149) of data blocks, each separated by the word 'INTERNAL' followed by some numeric data, as shown below:

Line1: Title block
Line2: Title block
Line3: Title block
Line4: Title block
Line5: INTERNAL       1.0 (10E16.9)  -1
Line6: data data    data    data 
Line7: data data    data    data 
Line8 to Line25: data   data    data    data 
Line26: data    data    data    data 
Line27: INTERNAL       1.0 (10E16.9)  -1
Line28: data    data    data    data 
...etc., all the way down to line 41,149

The data blocks are not of consistent size (i.e., some have more rows of data than others). Thanks to a lot of help from this site, I have been able to take the 41,149 rows of data and organize each data block into separate lists that I can parse through and build the database from. My problem is that this operation takes a very long time to run. I was hoping someone could look at the code I have below and give me suggestions on how I might be able to run it more efficiently. I can attach the model output file if needed. Thanks!

inFile = 'CONFINED_AQIFER.DIS'

strings = ['INTERNAL']
rowList = []
#Create a list of each row number where a data block begins
with open(inFile) as myFile:
    for num, line in enumerate(myFile, 1):
        if any(s in line for s in strings):
            rowList.append(num)
#Function to get line data from row number
def getlineno(filename, lineno):
    if lineno < 1:
        raise ValueError("First line is line 1")
    with open(filename) as f:
        lines_read = 0
        while True:
            #Read the file in ~100 KB chunks of whole lines
            lines = f.readlines(100000)
            if not lines:
                return None
            if lines_read + len(lines) >= lineno:
                return lines[lineno - lines_read - 1]
            lines_read += len(lines)
#Organize each data block into a unique list and append to a final list (fList)
fList = []
for row in range(len(rowList[1:])):
    combinedList = []
    i = rowList[row]
    data = []
    while i < rowList[row+1]:
        line = getlineno(inFile, i)
        data.append(line.split())
        i+=1
    for d in range(len(data))[1:]:
        for x in data[d]:
            combinedList.append(x)
    fList.append(combinedList)
user2793356

1 Answer


Some comments:

In Python 2, prefer xrange over range: range builds the entire list in memory, while xrange just returns a lazy iterator.
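As a small illustration (with a compatibility shim so the snippet runs under either Python 2 or Python 3, where range is already lazy):

```python
try:
    xrange            # Python 2: xrange yields values lazily
except NameError:
    xrange = range    # Python 3: range is already an iterator-like object

# Iterating does not materialize a million-element list in memory
total = 0
for i in xrange(1000000):
    total += i
```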

Use list methods instead of manual appends: change

for x in data[d]:
            combinedList.append(x)

to

combinedList.extend(data[d])

(extend already accepts any iterable, so there is no need to wrap data[d] in a list comprehension first.)

See if you can apply these techniques to more of your code.
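For instance, the two nested loops that flatten data into combinedList collapse to a single extend per row, or one comprehension (the variable names below mirror the question's code; the sample data is made up):

```python
# Two rows of tokens, as data would hold after line.split();
# the first row is the 'INTERNAL' header and must be skipped
data = [['INTERNAL', '1.0'], ['1', '2', '3'], ['4', '5']]

combinedList = []
for d in range(1, len(data)):     # skip the header row
    combinedList.extend(data[d])  # extend takes the list directly

# Equivalent one-liner with a nested comprehension:
flat = [x for row in data[1:] for x in row]
```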

In general, you don't want to allocate memory (build new lists) inside of for loops, and you especially don't want to re-read the file from the top for every line, which is what calling getlineno once per row does.
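Putting these ideas together, here is a sketch of a one-pass version. The function name parse_blocks is my own, it assumes the file format shown in the question (four title rows, then blocks delimited by 'INTERNAL' rows), and unlike the original loop it also keeps the final block:

```python
def parse_blocks(filename, marker='INTERNAL', skip=4):
    """Read the file once, collecting each block's tokens into one flat list."""
    fList = []
    current = None
    with open(filename) as f:
        for i, line in enumerate(f):
            if i < skip:                 # skip the title-block rows
                continue
            if marker in line:           # a new data block starts here
                if current is not None:
                    fList.append(current)
                current = []             # the marker row itself is not data
            elif current is not None:
                current.extend(line.split())
    if current is not None:              # don't drop the last block
        fList.append(current)
    return fList
```

This reads the 41,149 lines exactly once instead of once per line, so it should run in linear rather than quadratic time.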

Erotemic