My test data looks like this (it's graph-like):
elemuid <- c(1, 2, 3, 4, 5, 6, 7)
timestamp <- c("2018-02-10 23:00:00", "2018-02-10 23:01:00", "2018-02-10 22:59:00", "2018-02-10 22:40:00", "2018-02-10 22:39:00", "2018-02-10 22:37:00", "2018-02-10 23:01:00")
from <- c(10, 8, 2, 12, 7, 8, 9)
to <- c(9, 10, 10, 3, 12, 7, 8)
value <- c(56, 26, 60, 50, 90, 80, 50)
df <- data.frame(elemuid, timestamp, from, to, value)
elemuid timestamp           from to value
1       2018-02-10 23:00:00 10   9  56
2       2018-02-10 23:01:00 8    10 26
3       2018-02-10 22:59:00 2    10 60
4       2018-02-10 22:40:00 12   3  50
5       2018-02-10 22:39:00 7    12 90
6       2018-02-10 22:37:00 8    7  80
7       2018-02-10 23:01:00 9    8  50
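library(sparklyr)
# Assumes sc is an open Spark connection, e.g. sc <- spark_connect(master = "local")
df <- copy_to(sc, df)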
As you can see, the Spark data frame is unsorted. I am looking for some functionality in sparklyr to build pairs (chains of connected rows) within a 3-minute window. For example, the rows with elemuid 3, 1 and 7 should be grouped together, because they share a valid from-to relationship (2 -> 10 -> 9 -> 8) AND lie within a time window of 3 minutes. (A perfect example in a messy world.)
The result should look like this:
elemuid1 elemuid2 elemuid3 timestamp1          timestamp2          timestamp3          from to value1 value2 value3
3        1        7        2018-02-10 22:59:00 2018-02-10 23:00:00 2018-02-10 23:01:00 2    8  60     56     50
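To make the rule concrete, here is a rough local sketch in base R (no Spark involved) of what I mean. It is only an illustration under my own assumptions, not a sparklyr solution: it looks only for chains of exactly three rows, requires the timestamps to be strictly increasing along the chain, and treats "within 3 minutes" as strictly less than 3 minutes between the first and the last row. The names `edges`, `with_suffix`, `e1`, `e2`, `e3` and `chains` are made up for this example.

```r
# Local copy of the sample data, with timestamps parsed as POSIXct.
edges <- data.frame(
  elemuid   = c(1, 2, 3, 4, 5, 6, 7),
  timestamp = as.POSIXct(c("2018-02-10 23:00:00", "2018-02-10 23:01:00",
                           "2018-02-10 22:59:00", "2018-02-10 22:40:00",
                           "2018-02-10 22:39:00", "2018-02-10 22:37:00",
                           "2018-02-10 23:01:00"), tz = "UTC"),
  from      = c(10, 8, 2, 12, 7, 8, 9),
  to        = c(9, 10, 10, 3, 12, 7, 8),
  value     = c(56, 26, 60, 50, 90, 80, 50)
)

# Hypothetical helper: append a position suffix to every column name,
# so the three copies of the table do not clash after joining.
with_suffix <- function(d, i) setNames(d, paste0(names(d), i))

e1 <- with_suffix(edges, 1)
e2 <- with_suffix(edges, 2)
e3 <- with_suffix(edges, 3)

# Link row 1 to row 2 where to1 == from2, then row 2 to row 3 where to2 == from3.
step12  <- merge(e1, e2, by.x = "to1", by.y = "from2")
step123 <- merge(step12, e3, by.x = "to2", by.y = "from3")

# Assumption: strictly increasing timestamps and a total span below 3 minutes.
span_mins <- as.numeric(difftime(step123$timestamp3, step123$timestamp1,
                                 units = "mins"))
keep <- step123$timestamp1 < step123$timestamp2 &
        step123$timestamp2 < step123$timestamp3 &
        span_mins < 3

chains <- step123[keep, c("elemuid1", "elemuid2", "elemuid3",
                          "timestamp1", "timestamp2", "timestamp3",
                          "from1", "to3", "value1", "value2", "value3")]
names(chains)[names(chains) == "from1"] <- "from"
names(chains)[names(chains) == "to3"]   <- "to"
rownames(chains) <- NULL
print(chains)
```

On the sample data this keeps only the chain 3 -> 1 -> 7. If the window were inclusive (`span_mins <= 3`), rows 6, 5 and 4 would also qualify (8 -> 7 -> 12 -> 3), so the exact boundary matters. What I am missing is how to express the same idea on the Spark side with sparklyr, and for chains of arbitrary length.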
That's a really difficult task for me as a sparklyr beginner. My problem is that multiple elemuid values can share one and the same timestamp, which makes ordering pretty tough. I hope someone can help me. Thank you!