I have an input dataset (see the input sample below) and I want to downsample it. To do so, I am using:
resample_time=25
init_len = len(df.index)
df = df.set_index('time', drop=False).resample('{}S'.format(resample_time)).last().dropna()
df.index = range(0, len(df.index))
A sample of the output can be found below. However, the output I am getting is not what I expect (see the expected output below), which is to keep one row every 25 seconds. Could someone please explain why this is happening and how to fix it?
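To make the behaviour of the snippet above easier to inspect, here is a minimal sketch that rebuilds only the `time` column of the input sample and keeps the resample index instead of overwriting it. The `row` column and the variable names are illustrative additions, not part of the original code. As far as I understand pandas, `resample` with a 25-second rule cuts the day into fixed 25-second bins counted from midnight, not from the first timestamp, and `.last()` keeps the last row that falls in each bin.

```python
import pandas as pd

# Timestamps copied from the input sample; `row` (hypothetical helper column)
# records the original row number so we can see which rows survive.
times = pd.to_datetime([
    "2007-05-06 04:21:12", "2007-05-06 04:21:33", "2007-05-06 04:21:53",
    "2007-05-06 04:22:04", "2007-05-06 04:22:23", "2007-05-06 04:46:48",
    "2007-05-06 04:46:54", "2007-05-06 04:47:00", "2007-05-06 04:47:36",
    "2007-05-06 04:48:44", "2007-05-06 04:49:15", "2007-05-06 04:49:22",
    "2007-05-06 04:49:45", "2007-05-06 04:49:52",
])
sample = pd.DataFrame({"row": range(len(times)), "time": times})

# Same operation as in the question, but the index is kept: it holds the
# left edge of each 25-second bin, while `time` holds the last real
# timestamp that fell inside that bin.
resampled = (
    sample.set_index("time", drop=False)
    .resample(pd.Timedelta(seconds=25))
    .last()
    .dropna()
)
surviving_rows = resampled["row"].astype(int).tolist()
print(resampled)
print(surviving_rows)
```

On this sample the bins start at 04:20:50, 04:21:15, 04:21:40 and so on, so rows 2 and 3 (04:21:53 and 04:22:04) share the 04:21:40 bin and only row 3 survives, which matches the output shown below.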
Input:
lon lat time
0 116.317117 40.075417 2007-05-06 04:21:12
1 116.317067 40.075217 2007-05-06 04:21:33
2 116.317233 40.075250 2007-05-06 04:21:53
3 116.317217 40.075417 2007-05-06 04:22:04
4 116.317133 40.075567 2007-05-06 04:22:23
5 116.317167 40.075400 2007-05-06 04:46:48
6 116.317233 40.075183 2007-05-06 04:46:54
7 116.317050 40.074933 2007-05-06 04:47:00
8 116.313567 40.073983 2007-05-06 04:47:36
9 116.311133 40.073167 2007-05-06 04:48:44
10 116.308017 40.072300 2007-05-06 04:49:15
11 116.307467 40.072483 2007-05-06 04:49:22
12 116.306250 40.074017 2007-05-06 04:49:45
13 116.306450 40.074283 2007-05-06 04:49:52
Output:
lon lat time
0 116.317117 40.075417 2007-05-06 04:21:12
1 116.317067 40.075217 2007-05-06 04:21:33
2 116.317217 40.075417 2007-05-06 04:22:04
3 116.317133 40.075567 2007-05-06 04:22:23
4 116.317050 40.074933 2007-05-06 04:47:00
5 116.313567 40.073983 2007-05-06 04:47:36
6 116.311133 40.073167 2007-05-06 04:48:44
7 116.307467 40.072483 2007-05-06 04:49:22
8 116.306450 40.074283 2007-05-06 04:49:52
9 116.308567 40.071850 2007-05-06 04:50:30
10 116.308667 40.071650 2007-05-06 04:50:57
11 116.310450 40.068850 2007-05-06 04:51:38
12 116.311800 40.067717 2007-05-06 04:52:02
13 116.312300 40.067067 2007-05-06 04:52:21
14 116.312667 40.066617 2007-05-06 04:52:32
15 116.312800 40.066450 2007-05-06 04:53:05
16 116.314067 40.064867 2007-05-06 04:53:38
17 116.314783 40.063667 2007-05-06 04:54:14
18 116.315867 40.062167 2007-05-06 04:54:41
19 116.318550 40.058583 2007-05-06 04:55:20
Expected output:
lon lat time
0 116.317117 40.075417 2007-05-06 04:21:12 -> Include
1 116.317067 40.075217 2007-05-06 04:21:33 -> Exclude
2 116.317233 40.075250 2007-05-06 04:21:53 -> Include
3 116.317217 40.075417 2007-05-06 04:22:04 -> Exclude
4 116.317133 40.075567 2007-05-06 04:22:23 -> Include
5 116.317167 40.075400 2007-05-06 04:46:48 -> Include
6 116.317233 40.075183 2007-05-06 04:46:54 -> Exclude
7 116.317050 40.074933 2007-05-06 04:47:00 -> Exclude
8 116.313567 40.073983 2007-05-06 04:47:36 -> Include
9 116.311133 40.073167 2007-05-06 04:48:44 -> Exclude
10 116.308017 40.072300 2007-05-06 04:49:15 -> Include
11 116.307467 40.072483 2007-05-06 04:49:22
12 116.306250 40.074017 2007-05-06 04:49:45
13 116.306450 40.074283 2007-05-06 04:49:52
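One possible reading of "keep a row every 25 seconds" is: keep the first row, then keep each later row only if at least 25 seconds have passed since the last kept row. The sketch below implements that reading only; the function name `keep_every` and the `row` column are hypothetical, and it assumes the frame is already sorted by `time`. Note that under this rule row 9 (04:48:44, 68 seconds after row 8) is kept, whereas the list above marks it "Exclude", so this may not be exactly the intended rule.

```python
import pandas as pd

def keep_every(df, seconds, time_col="time"):
    """Keep the first row, then every row that is at least `seconds`
    later than the most recently kept row. Assumes df is sorted by time."""
    min_gap = pd.Timedelta(seconds=seconds)
    mask = []
    last_kept = None
    for t in df[time_col]:
        is_kept = last_kept is None or (t - last_kept) >= min_gap
        mask.append(is_kept)
        if is_kept:
            last_kept = t
    # Build a boolean Series aligned with df's index before filtering.
    return df[pd.Series(mask, index=df.index)].reset_index(drop=True)

# Timestamps copied from the input sample; `row` tracks the original position.
times = pd.to_datetime([
    "2007-05-06 04:21:12", "2007-05-06 04:21:33", "2007-05-06 04:21:53",
    "2007-05-06 04:22:04", "2007-05-06 04:22:23", "2007-05-06 04:46:48",
    "2007-05-06 04:46:54", "2007-05-06 04:47:00", "2007-05-06 04:47:36",
    "2007-05-06 04:48:44", "2007-05-06 04:49:15", "2007-05-06 04:49:22",
    "2007-05-06 04:49:45", "2007-05-06 04:49:52",
])
sample = pd.DataFrame({"row": range(len(times)), "time": times})

kept = keep_every(sample, 25)
kept_rows = kept["row"].tolist()
print(kept_rows)
```

Unlike `resample`, this measures the gap from the last kept row instead of from fixed clock boundaries, at the cost of a Python-level loop over the rows.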
PS: You can read the .csv file in the link using pd.read_csv('20070506033305.csv', parse_dates=['time'])