Suppose I have a text file like this:

```text
asdfsa
fasdf
asdf
-1
2412
asdf
fddsfw
efe
st
-1
ghhgg
```
I need an efficient way to export the entire chunk between 2412 and the next -1, perhaps as a DataFrame that I can run transformations on later. Note that there is also a -1 before 2412 that I need to account for: the chunk always starts with a -1 line immediately followed by a 2412 line, and it ends at the next -1. I was building the DataFrame like this:
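The trigger rule above (a -1 line, then 2412, then everything up to the next -1) can be expressed as a small state machine that reads the file once. This is only a sketch under my own assumptions: `iter_chunk` and `extract_chunk` are hypothetical helper names, markers are compared after stripping surrounding whitespace, and only the first chunk is returned, matching the `break` in my loop.

```python
from typing import Iterable, Iterator, List


# Hypothetical helpers; markers are compared after stripping whitespace.
def iter_chunk(lines: Iterable[str], start: str = "2412", sentinel: str = "-1") -> Iterator[str]:
    """Yield the lines between a `sentinel` line that is immediately followed by a
    `start` line and the next `sentinel` line (markers excluded). Stops after the
    first chunk."""
    prev = None      # stripped content of the previous line
    inside = False   # becomes True once sentinel followed by start has been seen
    for raw in lines:
        text = raw.strip()
        if inside:
            if text == sentinel:
                return  # closing -1 reached: stop reading
            yield raw.rstrip("\r\n")
        elif prev == sentinel and text == start:
            inside = True
        prev = text


def extract_chunk(path: str) -> List[str]:
    # Iterating over the file object reads lazily instead of loading everything with readlines()
    with open(path, "rt") as f:
        return list(iter_chunk(f))
```

Because `iter_chunk` accepts any iterable of strings, it can be tried on a plain list before pointing it at the real file with `extract_chunk(PATH)`.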
```python
import pandas as pd

# Build df while looping through text file (PATH is the path to the text file)
df = pd.DataFrame(columns=['ID', 'string'])
with open(PATH, 'rt') as file:
    lines = file.readlines()
for index, line in enumerate(lines):
    if line.strip('\r\n').strip(' ') == '-1':
        if lines[index + 1].strip('\r\n').strip(' ') == '2412':
            i = 1  # offset from the 2412 line, reset for each chunk
            while lines[index + i + 1].strip('\r\n').strip(' ') != '-1':
                # placeholder for my transformation function on lines (assumed to return a DataFrame)
                transformed_strings = do_transforms_with_multiple_lines(lines[index + i + 1])
                # append transforms to df (DataFrame.append was removed in pandas 2.0)
                df = pd.concat([df, transformed_strings], ignore_index=True)
                i = i + 1  # go to next line
            break  # break out of the for loop once the closing -1 is reached
```
As you can see, once I see -1 followed by 2412, I build the DataFrame line by line until I reach the next -1. This is quick for small files, but for larger ones it is much too slow. I'm hoping I can export the whole chunk between 2412 and -1 at once, and only then apply pd.DataFrame() and my transformations, to speed things up. I found this post, but it doesn't seem to get me what I want. Simply exporting the chunk as a .txt file would also be fine: I could load it later with pandas and do my transforms there, so appending to a DataFrame is not necessary.
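A sketch of the "export the whole chunk at once" idea: locate the -1/2412 pair and the next -1 by index, take a single list slice, and write it to a text file in one call. `slice_chunk` and `export_chunk` are hypothetical names, and treating an unterminated chunk as running to the end of the file is my own assumption.

```python
from typing import List


def slice_chunk(lines: List[str], start: str = "2412", sentinel: str = "-1") -> List[str]:
    """Return the lines between the first sentinel/start pair and the next sentinel,
    as one list slice rather than one append per line (hypothetical helper)."""
    stripped = [ln.strip() for ln in lines]
    for i in range(len(stripped) - 1):
        if stripped[i] == sentinel and stripped[i + 1] == start:
            begin = i + 2  # first line after the 2412 marker
            try:
                end = stripped.index(sentinel, begin)  # next -1 after the chunk starts
            except ValueError:
                end = len(stripped)  # assumption: an unterminated chunk runs to end of file
            return lines[begin:end]
    return []  # no -1 / 2412 pair found


def export_chunk(src_path: str, dst_path: str) -> int:
    """Write the chunk to dst_path and return how many lines were written."""
    with open(src_path, "rt") as f:
        lines = f.readlines()
    chunk = slice_chunk(lines)
    with open(dst_path, "wt") as out:
        out.writelines(chunk)  # lines keep their original newline characters
    return len(chunk)
```

For example, `export_chunk(PATH, 'chunk.txt')` writes the chunk out, and building the DataFrame once from the slice, e.g. `pd.DataFrame({'string': [ln.rstrip('\n') for ln in chunk]})`, should avoid the repeated copying that happens when a DataFrame is appended to inside a loop.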
Something like
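```python
import pandas as pd

df = pd.DataFrame(columns=['ID', 'string'])
with open(PATH, 'rt') as file:
    lines = file.readlines()
for index, line in enumerate(lines):
    if line.strip('\r\n').strip(' ') == '-1':
        if lines[index + 1].strip('\r\n').strip(' ') == '2412':
            i = 1  # offset from the 2412 line, reset for each chunk
            while lines[index + i + 1].strip('\r\n').strip(' ') != '-1':
                current = lines[index + i + 1]  # the line inside the chunk, not the -1 line
                write_line_to_txt_file(current)  # placeholder for writing to a .txt file, OR
                df = pd.concat([df, pd.DataFrame({'string': [current]})], ignore_index=True)
                i = i + 1  # go to next line
            break  # break out of the for loop once the closing -1 is reached
```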
would also be a solution. Thanks for the help!