I'm working on a project where I need to parse a text file using Python. The file consists of data entries organized in blocks of varying format. A new entry begins after a blank line. This is what I would like to accomplish:
- Skip the first few lines (first 16 lines)
- After the 16th line, there is a blank line that marks the start of a new data entry
- Read the following lines until the next blank line is hit. Each individual line is appended to a list called data.
- The list will be passed to a function that handles further processing.
- Repeat steps 3 and 4 until there is no more data in the file
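The steps above can be sketched roughly as follows. This is only a minimal sketch, assuming that blocks are separated by blank lines and that the header is exactly 16 lines long. The names `iter_blocks`, `process_file`, `skip`, and `handler` are hypothetical and stand in for whatever the real processing function is. Unlike the code further down, it strips the trailing newline from each line.

```python
from itertools import islice


def iter_blocks(lines, skip=16):
    """Yield each blank-line-separated block as a list of lines.

    `lines` is any iterable of strings (for example an open file).
    `skip` is the number of leading header lines to ignore (assumed 16).
    """
    data = []
    for line in islice(lines, skip, None):
        stripped = line.rstrip("\n")
        if stripped.strip():
            # Non-blank line: it belongs to the current block.
            data.append(stripped)
        elif data:
            # Blank line after some content: the block is complete.
            yield data
            data = []
    if data:
        # Last block, in case the file does not end with a blank line.
        yield data


def process_file(path, handler, skip=16):
    """Open `path` and pass every block to `handler` (hypothetical callback)."""
    with open(path, "r") as f:
        for block in iter_blocks(f, skip):
            handler(block)
```

Iterating over the file object stops by itself at end of file, so no explicit end-of-file check is needed in this version.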
Here is an example of the file:
Header Info
More Header Info
Line1
Line2
Line3
Line4
Line5
Line6
Line7
Line8
Line9
Line10
Line11
Line12
Line13
MoreInfo MoreInfo MoreInfo MoreInfo MoreInfo
MoreInfo2 MoreInfo2 MoreInfo2 MoreInfo2 MoreInfo2 MoreInfo2
MoreInfo3 MoreInfo3 MoreInfo3 MoreInfo3 MoreInfo3
MoreInfo4 MoreInfo4
FieldName1 0001 0001
FieldName1 0002 0002
FieldName1 0003 0003
FieldName1 0004 0004
FieldName1 0005 0005
FieldName2 0001 0001
FieldName3 0001 0001
FieldName4 0001 0001
FieldName5 0001 0001
FieldName6 0001 0001
MoreInfo MoreInfo MoreInfo MoreInfo MoreInfo
MoreInfo2 MoreInfo2 MoreInfo2 MoreInfo2 MoreInfo2 MoreInfo2
MoreInfo3 MoreInfo3 MoreInfo3 MoreInfo3 MoreInfo3
MoreInfo4 MoreInfo4
FieldName1 0001 0001
FieldName1 0002 0002
FieldName1 0003 0003
FieldName1 0004 0004
FieldName1 0005 0005
FieldName2 0001 0001
FieldName3 0001 0001
FieldName4 0001 0001
FieldName5 0001 0001
FieldName6 0001 0001
Here is some code I've worked on. It is able to read the first block and append it to a list:
with open(loc, 'r') as f:
    for i in range(16):
        f.readline()
    data = []
    line = f.readline()
    if line == "\n":
        dataLine = f.readline()
        while dataLine != "\n":
            data.append(dataLine)
            dataLine = f.readline()
    # pass data list to function
    function_call(data)
    # reset data list here?
    data = []
How do I make this work for the whole file? I assumed that "with open" acted like a "while not end of file" loop. I also tried adding a "while True" after skipping the first 16 lines. I have little knowledge of Python's parsing capabilities.
Thank you in advance for any help.
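On the "with open" assumption: as far as I can tell, `with` only ensures the file is closed when the block exits and does not loop on its own. Also, `readline()` returns an empty string `''` at end of file, while a blank line comes back as `'\n'`, so a loop that only tests for `"\n"` may never stop on the last block. A small sketch of that behavior, using `io.StringIO` as a stand-in for a real file. `read_block` is a hypothetical helper, not part of any library.

```python
import io


def read_block(f):
    """Read lines up to the next blank line or the end of the file.

    Returns (block, at_eof). readline() returns '' only at end of file,
    so testing for '' prevents an endless loop on the final block.
    """
    block = []
    while True:
        line = f.readline()
        if line == "":      # end of file reached
            return block, True
        if line == "\n":    # blank line ends the current block
            return block, False
        block.append(line)


# Two blocks separated by one blank line, no trailing blank line.
f = io.StringIO("a\nb\n\nc\n")
first, eof_after_first = read_block(f)
second, eof_after_second = read_block(f)
```

Here `first` holds the two lines before the blank line, and the second call stops because end of file was reached rather than because a blank line was found.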