I'm working with dump files from simulations run with the software LAMMPS. For each timestep, the files contain nine header lines that hold only metadata, not data. I want to delete these lines, which are repeated for every timestep, so that only the data remains in a separate file. Below is the start of two timesteps from such a file, showing the header lines I want deleted.
ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
4200
ITEM: BOX BOUNDS pp pp pp
-2.0000000000000000e+01 2.0000000000000000e+01
-2.0000000000000000e+01 2.0000000000000000e+01
-2.0000000000000000e+01 2.0000000000000000e+01
ITEM: ATOMS id mol xu yu zu
533 26 -17.891 -16.7503 -18.8102
534 26 -17.7164 -17.5276 -18.7004
535 26 -17.3612 -17.7508 -19.2693
536 26 -17.0213 -17.8009 -18.5118
537 26 -17.8409 -18.5307 -18.8511
538 26 -17.7968 -19.5713 -18.6246
ITEM: TIMESTEP
1
ITEM: NUMBER OF ATOMS
4200
ITEM: BOX BOUNDS pp pp pp
-2.0000000000000000e+01 2.0000000000000000e+01
-2.0000000000000000e+01 2.0000000000000000e+01
-2.0000000000000000e+01 2.0000000000000000e+01
ITEM: ATOMS id mol xu yu zu
536 26 -17.0213 -17.8009 -18.5118
537 26 -17.8409 -18.5307 -18.8511
538 26 -17.7968 -19.5713 -18.6246
This pattern continues for every timestep I have run in the simulation, and each timestep contains more data rows than shown here.
I currently have code that does what I want, shown below. However, I would like to ask if anybody has ideas or suggestions for making it faster, since I am still a rather new Python user.
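Because each block states its own atom count, the layout above can also be read block by block without knowing the number of atoms in advance. The following is only a minimal sketch, assuming the nine-line header always has exactly the shape shown; the helper name `iter_timesteps` is made up for illustration and is not part of LAMMPS or pandas.

```python
def iter_timesteps(lines):
    """Yield (timestep, atom_rows) for each block of a dump with the layout above.

    `lines` is any iterable of text lines, e.g. an open file object.
    Assumes a 9-line header per block; a truncated file will raise an error.
    """
    it = iter(lines)
    for line in it:
        if not line.startswith("ITEM: TIMESTEP"):
            continue  # skip anything that is not the start of a block
        timestep = int(next(it))   # value under "ITEM: TIMESTEP"
        next(it)                   # "ITEM: NUMBER OF ATOMS"
        n_atoms = int(next(it))    # value under "ITEM: NUMBER OF ATOMS"
        for _ in range(4):         # "ITEM: BOX BOUNDS ..." plus 3 bounds lines
            next(it)
        next(it)                   # "ITEM: ATOMS id mol xu yu zu"
        # The next n_atoms lines are the actual data rows
        rows = [next(it).split() for _ in range(n_atoms)]
        yield timestep, rows
```

Each yielded `rows` is a list of string fields per atom, so the values still need converting to numbers before analysis.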
import pandas as pd

def data_process_func(filename, n_atoms, k):
    with open(filename, 'r') as f:
        lines = f.readlines()
    # Each timestep block is 9 header lines followed by n_atoms data lines.
    # (Assumption: the original loop used a name `timestep` defined elsewhere.)
    n_timesteps = len(lines) // (n_atoms + 9)
    # The following loop deletes all the header text, leaving only the data
    for i in range(n_timesteps):
        del lines[n_atoms*i:(n_atoms*i)+9]
    # Saves the data without the text to a txt file
    with open('data_{}.txt'.format(k), 'w') as f:
        f.writelines(lines)
    # Loads the data from the file into a dataframe
    data = pd.read_csv('data_{}.txt'.format(k), sep=" ", header=None, names=['id', 'mol', 'xu', 'yu', 'zu'])
    return data
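
One likely cost in the function above is the repeated `del` on a large list: each deletion shifts every line that follows it, so the work grows quickly with the number of timesteps. A possible alternative is a single pass that keeps a line only if its position within its block falls after the header. This is a minimal sketch, assuming every timestep has exactly 9 header lines followed by exactly `n_atoms` data lines; the names `iter_data_lines` and `strip_headers` are hypothetical helpers, not library functions.

```python
HEADER_LINES = 9  # metadata lines before each block's atom rows (assumed fixed)

def iter_data_lines(lines, n_atoms, header_lines=HEADER_LINES):
    """Yield only the data lines, skipping the header of every block."""
    block = n_atoms + header_lines  # total lines per timestep
    for i, line in enumerate(lines):
        # Position inside the current block; the first `header_lines` are metadata
        if i % block >= header_lines:
            yield line

def strip_headers(src_path, dst_path, n_atoms):
    """Copy src_path to dst_path without the per-timestep header lines."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        # The file is streamed line by line, so it is never fully held in memory
        dst.writelines(iter_data_lines(src, n_atoms))
```

The file written by `strip_headers` can then be loaded with the same `pd.read_csv` call as before. If the number of atoms changes between timesteps, the fixed-block assumption no longer holds and the headers would have to be detected by their content instead.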