I am attempting to build a database from a numeric model output text file. The text file has four (4) rows of title block data, followed by many rows (41,149) of data blocks. Each block begins with a header line that starts with the word 'INTERNAL' followed by some numeric data, as shown below:
Line1: Title block
Line2: Title block
Line3: Title block
Line4: Title block
Line5: INTERNAL 1.0 (10E16.9) -1
Line6: data data data data
Line7: data data data data
Line8 to Line25: data data data data
Line26: data data data data
Line27: INTERNAL 1.0 (10E16.9) -1
Line28: data data data data
...and so on, all the way down to line 41,149
The data blocks are not of consistent size (i.e., some have more rows of data than others). Thanks to a lot of help from this site, I have been able to take the 41,149 rows of data and organize each data block into a separate list that I can parse to build the database. My problem is that this operation takes a very long time to run. Could someone look at the code below and suggest how I might make it run more efficiently? I can attach the model output file if needed. Thanks!
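Because every block starts at an 'INTERNAL' line and ends just before the next one, variable-sized blocks can be grouped in a single pass over the file, with no need to look lines up by number. A minimal sketch, assuming exactly four title lines and whitespace-separated values; the function name `iter_blocks` and its parameters are hypothetical:

```python
import io
from typing import Iterator, List, TextIO


def iter_blocks(f: TextIO, header_lines: int = 4,
                marker: str = "INTERNAL") -> Iterator[List[str]]:
    """Yield one flat list of tokens per data block, reading the file once."""
    # Skip the title block (assumed to be exactly `header_lines` lines).
    for _ in range(header_lines):
        next(f, None)
    block = None  # tokens of the block currently being collected
    for line in f:
        if marker in line:
            # A header line starts a new block; emit the previous one, if any.
            if block is not None:
                yield block
            block = []
        elif block is not None:
            # Data line: add its tokens to the current block.
            block.extend(line.split())
    # The last block has no following header, so emit it explicitly.
    if block is not None:
        yield block


# Small in-memory example standing in for the real file.
sample = io.StringIO(
    "t1\nt2\nt3\nt4\n"
    "INTERNAL 1.0 (10E16.9) -1\n1 2\n3\n"
    "INTERNAL 1.0 (10E16.9) -1\n4 5 6\n"
)
print(list(iter_blocks(sample)))  # [['1', '2', '3'], ['4', '5', '6']]
```

With the real file this would be used as `with open(inFile) as myFile: fList = list(iter_blocks(myFile))`. Unlike the loop in the code below, it also returns the block that follows the last 'INTERNAL' line.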
inFile = 'CONFINED_AQIFER.DIS'
strings = ['INTERNAL']
rowList = []
#Create a list of each row number where a data block begins
with open(inFile) as myFile:
    for num, line in enumerate(myFile, 1):
        if any(s in line for s in strings):
            rowList.append(num)
#Function to get line data from row number
def getlineno(filename, lineno):
    if lineno < 1:
        raise ValueError("First line is line 1")
    #Reopens the file and reads it from the top on every call
    with open(filename) as f:
        lines_read = 0
        while True:
            lines = f.readlines(100000)
            if not lines:
                return None
            if lines_read + len(lines) >= lineno:
                return lines[lineno-lines_read-1]
            lines_read += len(lines)
#Organize each data block into a unique list and append to a final list (fList)
#(note: the block after the last 'INTERNAL' line is not collected by this loop)
fList = []
for row in range(len(rowList[1:])):
    combinedList = []
    i = rowList[row]
    data = []
    while i < rowList[row+1]:
        line = getlineno(inFile, i)
        data.append(line.split())
        i += 1
    #Skip data[0], which is the 'INTERNAL' header line
    for d in range(len(data))[1:]:
        for x in data[d]:
            combinedList.append(x)
    fList.append(combinedList)
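Most of the run time is probably spent in `getlineno`: every call reopens the file and rereads it from the beginning up to the requested line, so the total amount of reading grows roughly with the square of the number of lines. A sketch that keeps the same `rowList` approach but reads the file into memory once and slices it, assuming the file fits comfortably in memory (41,149 lines should). The helper name `blocks_from_rows` is hypothetical:

```python
from typing import List


def blocks_from_rows(lines: List[str], rowList: List[int]) -> List[List[str]]:
    """Group lines into flat token lists, one per data block.

    `rowList` holds the 1-based line numbers of the 'INTERNAL' header lines.
    """
    fList = []
    # Pair each block start with the next start; the last block runs to end of file.
    ends = rowList[1:] + [len(lines) + 1]
    for start, end in zip(rowList, ends):
        combinedList = []
        # lines[start:end - 1] holds 1-based lines start+1 .. end-1,
        # i.e. the block's data lines without its 'INTERNAL' header.
        for line in lines[start:end - 1]:
            combinedList.extend(line.split())
        fList.append(combinedList)
    return fList


# Small in-memory example standing in for myFile.readlines().
lines = ["t1\n", "t2\n", "t3\n", "t4\n",
         "INTERNAL 1.0 (10E16.9) -1\n", "1 2\n", "3\n",
         "INTERNAL 1.0 (10E16.9) -1\n", "4 5 6\n"]
rowList = [num for num, line in enumerate(lines, 1) if "INTERNAL" in line]
print(blocks_from_rows(lines, rowList))  # [['1', '2', '3'], ['4', '5', '6']]
```

With the real file, `lines` would come from `myFile.readlines()` inside the existing `with open(inFile)` block, and `rowList` can then be built from that list instead of from a second pass over the file.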