
My test data looks like this (it is graph-like):

elemuid <- c(1, 2, 3, 4, 5, 6, 7)
timestamp <- c("2018-02-10 23:00:00", "2018-02-10 23:01:00", "2018-02-10 22:59:00", "2018-02-10 22:40:00", "2018-02-10 22:39:00", "2018-02-10 22:37:00", "2018-02-10 23:01:00")
from <- c(10, 8, 2, 12, 7, 8, 9)
to <- c(9, 10, 10, 3, 12, 7, 8)
value <- c(56, 26, 60, 50, 90, 80, 50)

df <- data.frame(elemuid, timestamp, from, to, value)
elemuid  timestamp            from  to  value
1        2018-02-10 23:00:00  10    9   56
2        2018-02-10 23:01:00  8     10  26
3        2018-02-10 22:59:00  2     10  60
4        2018-02-10 22:40:00  12    3   50
5        2018-02-10 22:39:00  7     12  90
6        2018-02-10 22:37:00  8     7   80
7        2018-02-10 23:01:00  9     8   50

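library(sparklyr)
library(dplyr)
# Assumption: a local Spark connection; replace with your own cluster settings.
sc <- spark_connect(master = "local")
df <- copy_to(sc, df)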

As you can see, the Spark DataFrame is unsorted. I am looking for functionality in sparklyr to build chains of rows within a 3-minute window. For example, the rows with elemuid 3, 1 and 7 should form one chain, because they share a valid from-to relationship (2 -> 10 -> 9 -> 8) AND fall within a time window of 3 minutes. (A perfect example in a messy world.)
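To make the rule concrete, here is a rough sketch of the matching on a local copy of the data in base R. It is not sparklyr code and only one possible reading of the rule: it assumes a chain has exactly three rows, that each row's `to` must equal the next row's `from`, that timestamps must not go backwards along the chain, and that first and last row are at most 3 minutes apart. The names `edges`, `step`, `chain` and `span` are made up for illustration.

```r
# Local copy of the test data (plain data.frame, not a Spark table).
edges <- data.frame(
  elemuid   = c(1, 2, 3, 4, 5, 6, 7),
  timestamp = as.POSIXct(c("2018-02-10 23:00:00", "2018-02-10 23:01:00",
                           "2018-02-10 22:59:00", "2018-02-10 22:40:00",
                           "2018-02-10 22:39:00", "2018-02-10 22:37:00",
                           "2018-02-10 23:01:00"), tz = "UTC"),
  from      = c(10, 8, 2, 12, 7, 8, 9),
  to        = c(9, 10, 10, 3, 12, 7, 8),
  value     = c(56, 26, 60, 50, 90, 80, 50)
)

# Hypothetical helper: suffix every column name with a step index
# so the self-joins below stay readable.
step <- function(d, i) setNames(d, paste0(names(d), i))

# Step 1 -> step 2: the "to" of one row must equal the "from" of the next.
chain <- merge(step(edges, 1), step(edges, 2), by.x = "to1", by.y = "from2")
chain <- chain[chain$timestamp2 >= chain$timestamp1, ]

# Step 2 -> step 3: same rule again.
chain <- merge(chain, step(edges, 3), by.x = "to2", by.y = "from3")
chain <- chain[chain$timestamp3 >= chain$timestamp2, ]

# Keep only chains whose first and last row are at most 3 minutes apart.
span  <- as.numeric(difftime(chain$timestamp3, chain$timestamp1, units = "mins"))
chain <- chain[span <= 3, ]

# Show the chains in the column order of the desired result.
print(chain[, c("elemuid1", "elemuid2", "elemuid3",
                "timestamp1", "timestamp2", "timestamp3",
                "from1", "to3", "value1", "value2", "value3")])
```

The chain with elemuid 3, 1, 7 is among the rows this returns. Under this reading, other chains in the test data can satisfy the same two conditions, so the rule may need to be tightened depending on what should count as a match.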

The result should look like this:

elemuid1  elemuid2  elemuid3  timestamp1           timestamp2           timestamp3           from  to  value1  value2  value3
3         1         7         2018-02-10 22:59:00  2018-02-10 23:00:00  2018-02-10 23:01:00  2     8   60      56      50

This is a really difficult task for me as a sparklyr beginner. My problem is that several elemuid values can share the same timestamp, which makes ordering pretty tough. I hope someone can help me. Thank you!

user60856839