This problem is very similar to Consecutive group number in R, but I think this problem is not the same problem, but a much harder one.
I am currently dealing with a car data. We recorded the speed of the car every 5 minutes, and it contains a lot of zero values. I want to add a new column where a consecutive number of k or more than k zero speeds are numbered as 0, while other sections are numbered (starting from 1). Let's take a sample data as example:
sample <- data.frame(
id = 1:15,
speed = c(50, 0, 0, 0, 50, 40, 0, 0, 25, 30, 50, 0, 30, 50, 40))
Specifically for this sample data, let's say k equals 2, then my desired result should be like this:
id speed number
1 1 50 1
2 2 0 0
3 3 0 0
4 4 0 0
5 5 50 2
6 6 40 2
7 7 0 0
8 8 0 0
9 9 25 3
10 10 30 3
11 11 50 3
12 12 0 3** <- here is the difference
13 13 30 3
14 14 50 3
15 15 40 3
There are more than 1 million rows in my data, so I hope that the solution could be acceptable in speed.
The reason for setting a threshold "k" is that some drivers just leave their GPS open even if they lock the car and go to sleep. But in other occasions, where the interval is less than k, they just stopped because of the crossroad light. I want to focus on the longtime stops and just ignore the short time stops.
Hope my question makes sense to you. Thank you.