How to use Pandas to split timestamped CSV data into multiple CSVs based on values and continuous time periods

Question

I am trying to analyse a ships AIS data. I have a CSV with ~20,000 rows, with columns for lat / long / speed / time stamp.

I have loaded the data in a pandas data frame, in a Jupyter notebook.

What I want to do is split the CSV into smaller CSVs based on the time stamp and the speed, so I want an individual CSV for each period of time the vessel speed was less than say 2 knots, eg if the vessel transited at 10 knots for 6hrs, then slowed down to 1 knot for a period of 3 hrs, sped back up 10 knots, then slowed down again to 1 knot for a period of 4 hrs, I would want to the output to be two CSVs, one for the 3hr period and one for the 4hr period. This is so I can review these periods individually in my mapping software.

I can filter the data easily to show all the periods where it is <1 knot but I can't break it down to output the continuous periods as separate CSVs / data frames. EDIT

Here is an example of the data

I've tried to show more clearly what I want to achieve here

Please, include some sample data and also code you already have. — ex4, Apr 30 '20 at 10:33
Would help is you can give an example of the CSV. Probably you need to convert the timestamps to a Python `datetime` object after that if would be straightforward to sort and select based on the time and speed. — Bruno Vermeulen, Apr 30 '20 at 10:33

jon · Answer 1 · 2020-04-30T13:04:38.750

0

Here is something to maybe get you started.

First filter out all values that meets the criteria (for example below 2):

df = pd.DataFrame({'speed':[2,1,4,5,4,1,1,1,3,4,5,6], 'time':[4,5,6,7,8,9,10,11,12,13,14,15]})
df_below2 = df[df['speed']<=2].reset_index(drop=True)

Now we need to split the frame if there is too long gap btw values in time. For example:

threshold = 2
df_below2['not_continuous']  = df_below2['time'].diff() > threshold

Distinguish between the groups using cums:

df_below2['group_id'] = df_below2['not_continuous'].cumsum()

From here it should be easy to split the frame based on the group id.

edited Apr 30 '20 at 13:04

answered Apr 30 '20 at 10:34

jon

35
9

When I follow the example in the link it shows me how to filter based on values and out put as a new CSV, I can't see how it lets me make the multiple CSVs based on the continuous data though? – Mhargr Apr 30 '20 at 11:06
For continuous data use for example, df2 = df.loc[(df['column_name'] >= A)]. df2 contains all rows in df where 'column_name' weakly exceeds A. – jon Apr 30 '20 at 11:15
I think this only lets me create a single df where say the speed is < 2 for the whole data set. What I want is each continuous period that the speed is below <2 as a separate CSV. My data set samples the speed every few minute and the data set covers nearly six months of time. – Mhargr Apr 30 '20 at 11:27
Thank you, I am playing around with this idea, not cracked it yet though :) – Mhargr Apr 30 '20 at 12:41

How to use Pandas to split timestamped CSV data into multiple CSVs based on values and continuous time periods

1 Answers1