I have a dataframe that looks like this:
     id    epoch                    value   duration
958  1819  2018-01-01 00:00:00.000  1       20
959  1820  2018-01-01 00:20:00.000  2       20
960  1821  2018-01-01 00:40:00.000  3       20
961  1822  2018-01-01 01:00:00.000  4       20
962  1823  2018-01-01 01:20:00.000  5       20
963  1824  2018-01-01 01:20:01.000  5.05    0.01
964  1825  2018-01-01 01:40:01.000  6       20
965  1826  2018-01-01 02:00:01.000  7       20
966  1827  2018-01-01 02:00:02.000  7.0012  0.01
967  1828  2018-01-01 02:20:02.000  8       20
As you can see, the values are 3-periodic. I want to number the periods in a new column while ignoring the 'outliers' that have a very short duration, without removing those rows: an outlier should not advance the period counter.
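One vectorized way to express this rule is sketched below. It assumes that "very short" means a `duration` below some cutoff; the `MIN_DURATION = 1.0` used here is a hypothetical value, not something stated in the question. The idea is to flag the real rows, take a running count of them with `cumsum`, and fold that count into 1, 2, 3 with a modulo. An outlier adds nothing to the count, so it inherits the period of the row before it.

```python
import pandas as pd

# Only the columns the rule needs; index and values copied from the question.
df = pd.DataFrame(
    {"value": [1, 2, 3, 4, 5, 5.05, 6, 7, 7.0012, 8],
     "duration": [20, 20, 20, 20, 20, 0.01, 20, 20, 0.01, 20]},
    index=range(958, 968),
)

N_PERIODS = 3        # the values repeat every 3 "real" rows
MIN_DURATION = 1.0   # hypothetical cutoff: anything shorter is an outlier

# True for rows that advance the cycle, False for short-duration outliers.
is_real = df["duration"] >= MIN_DURATION

# Running count of real rows; an outlier repeats the count of the row above.
step = is_real.cumsum()

# Map the count 1, 2, 3, 4, 5, ... onto the labels 1, 2, 3, 1, 2, ...
df["period"] = (step - 1) % N_PERIODS + 1
print(df["period"].tolist())  # [1, 2, 3, 1, 2, 2, 3, 1, 1, 2]
```

This assumes the first row is not an outlier; a leading outlier would get a count of 0, which the modulo turns into label 3.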
Here's what I currently get:
     id    epoch                    value   duration  period
958  1819  2018-01-01 00:00:00.000  1       20        1
959  1820  2018-01-01 00:20:00.000  2       20        2
960  1821  2018-01-01 00:40:00.000  3       20        3
961  1822  2018-01-01 01:00:00.000  4       20        1
962  1823  2018-01-01 01:20:00.000  5       20        2
963  1824  2018-01-01 01:20:01.000  5.05    0.01      3
964  1825  2018-01-01 01:40:01.000  6       20        1
965  1826  2018-01-01 02:00:01.000  7       20        2
966  1827  2018-01-01 02:00:02.000  7.0012  0.01      3
967  1828  2018-01-01 02:20:02.000  8       20        1
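The output above looks like what you get by labelling rows purely by position, which is why every short-duration row shifts all later labels by one. A minimal sketch of that behaviour (a guess at the current logic, not the actual code from the question):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"value": [1, 2, 3, 4, 5, 5.05, 6, 7, 7.0012, 8],
     "duration": [20, 20, 20, 20, 20, 0.01, 20, 20, 0.01, 20]},
    index=range(958, 968),
)

# Label by row position only: 1, 2, 3, 1, 2, 3, ...
# Every row advances the cycle, including the short-duration outliers,
# so the labels drift out of step from row 963 onwards.
df["period"] = np.arange(len(df)) % 3 + 1
print(df["period"].tolist())  # [1, 2, 3, 1, 2, 3, 1, 2, 3, 1]
```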
And here's what I want:
     id    epoch                    value   duration  period
958  1819  2018-01-01 00:00:00.000  1       20        1
959  1820  2018-01-01 00:20:00.000  2       20        2
960  1821  2018-01-01 00:40:00.000  3       20        3
961  1822  2018-01-01 01:00:00.000  4       20        1
962  1823  2018-01-01 01:20:00.000  5       20        2
963  1824  2018-01-01 01:20:01.000  5.05    0.01      2
964  1825  2018-01-01 01:40:01.000  6       20        3
965  1826  2018-01-01 02:00:01.000  7       20        1
966  1827  2018-01-01 02:00:02.000  7.0012  0.01      1
967  1828  2018-01-01 02:20:02.000  8       20        2
I have already done this with two for loops, but since the dataframe is large, I am looking for a faster way to do it.
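The original two-loop code isn't shown, so `period_loop` below is a hypothetical single-loop stand-in that applies the same rule row by row. The sketch checks that a loop-free version built on `cumsum` (same hypothetical 1.0 cutoff on `duration`) gives identical labels on a large random series; replacing per-row Python loops with whole-column operations like this is typically where the speed-up in pandas comes from.

```python
import numpy as np
import pandas as pd


def period_loop(durations, n_periods=3, min_duration=1.0):
    """Hypothetical single-loop reference: slow on large frames."""
    periods = []
    current = 0
    for d in durations:
        if d >= min_duration:              # a real row advances the cycle
            current = current % n_periods + 1
        periods.append(current)            # an outlier keeps the previous label
    return periods


def period_vectorized(durations, n_periods=3, min_duration=1.0):
    """Same labels with no Python-level loop (cumulative sum + modulo)."""
    step = (durations >= min_duration).cumsum()
    return (step - 1) % n_periods + 1


# Random test data. The first row is forced to be a real row because the
# two versions label a leading outlier differently (0 vs 3).
rng = np.random.default_rng(0)
durations = pd.Series(np.where(rng.random(100_000) < 0.1, 0.01, 20.0))
durations.iloc[0] = 20.0
assert period_loop(durations) == period_vectorized(durations).tolist()

sample = pd.Series([20, 20, 20, 20, 20, 0.01, 20, 20, 0.01, 20])
print(period_vectorized(sample).tolist())  # [1, 2, 3, 1, 2, 2, 3, 1, 1, 2]
```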
Thanks in advance.
Edit: I added a few more lines. To be clearer: some points are "duplicated" (they have nearly the same value as the previous one), so I need to put each of them in the same period as its double. Also, I can't remove them (maybe temporarily?); I need to have them in the final dataframe.
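Since the duplicates are defined by their value rather than their duration, here is a sketch of the "temporarily remove" idea using that criterion instead. It treats a row as a double when its value differs from the previous row's by less than a tolerance (`TOL = 0.1` is a hypothetical choice), numbers only the remaining rows, then reindexes to the full frame and forward-fills so each duplicate takes its double's period. No row is dropped from `df`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"value": [1, 2, 3, 4, 5, 5.05, 6, 7, 7.0012, 8],
     "duration": [20, 20, 20, 20, 20, 0.01, 20, 20, 0.01, 20]},
    index=range(958, 968),
)

TOL = 0.1  # hypothetical: a jump smaller than this counts as a "double"

# A row is a duplicate if its value barely differs from the previous row's.
# diff() is NaN on the first row; NaN < TOL is False, so row 958 is kept.
is_dup = df["value"].diff().abs() < TOL

# "Temporarily remove" the duplicates: number only the remaining rows ...
kept = df.loc[~is_dup]
labels = pd.Series(np.arange(len(kept)) % 3 + 1, index=kept.index)

# ... then put every row back; each duplicate inherits the label above it.
df["period"] = labels.reindex(df.index).ffill().astype(int)
print(df["period"].tolist())  # [1, 2, 3, 1, 2, 2, 3, 1, 1, 2]
```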