I have a .txt containing a time series data set, formatted in the following manner, as rows separated with \n:
N>New Section
A>1, 2, 3
L>Label_1
G>1, 2, 3
A>3, 2, 1
G>3, 1, 1
A>2, 2, 1
...many rows of G> and A> pairs, of varying number then a new L>Label section...
L>Label_1
G>1, 2, 3
A>3, 2, 1
G>3, 1, 1
A>2, 2, 1
...and sometimes a new N>New section...
N>New Section
A>1, 2, 3
L>Label_2
G>1, 2, 3
A>3, 2, 1
G>3, 1, 1
A>2, 2, 1
...repeated many times.
Each pair of G> and A> rows are a set of x, y, z measurements at a timestamp in a timeseries composed of hundreds of measurements, associated with the L>Label which came before them. There are 6 possible labels.
The desired output format, is a pandas dataframe with the following columns, preserving all the information in the data set.
Time, G_x, G_y, G_z, A_x, A_y, A_z, Label, Label_Measurement,Section
0 1 2 3 3 2 1 Label_1 Label_1_1 1
1 3 1 1 2 2 1 Label_1 Label_1_1 1
0 1 2 3 3 2 1 Label_1 Label_1_2 1
1 3 1 1 2 2 1 Label_1 Label_1_2 1
0 1 2 3 3 2 1 Label_2 Label_2_1 2
1 3 1 1 2 2 1 Label_2 Label_2_1 2
I have a few ideas of elements I think I'm going to need for the solution.
I can make some counters, to count each type of line in the file as they appear:
counter_New_section = 0
counter_Label = 0
counter_A_row = 0
counter_G_row = 0
and I can iterate over each line of the file and use some conditionals to identify what type of line each line is and do the appropriate action with it:
with open(dataset.txt) as dataset:
for line in dataset:
if line.startswith('N>') ... do the appropriate
if line.startswith('L>') ... " "
if line.startswith('G>') ... " "
if line.startswith('A>') ... " "
...I'm struggling. What is the best approach here to achieve my desired dataframe output?
Thanks