I use the function foo() (shown after the data) to create a path overview for the following dataset:
tc <- textConnection('
path touchpoint time
abc A 1
abc A 2
abc B 3
abc C 4
def A 2
def B 3
def D 4
def C 5
def D 6
ghi A 1
ghi A 2
ghi A 3
ghi C 4
jkl A 5
jkl A 6
jkl B 7
jkl C 8
mno B 1
mno A 2
mno A 3
mno C 4
pqr A 1
pqr C 2
')
paths <- read.table(tc, header=TRUE)
--
library(plyr)
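# Collapse consecutive repeats of a touchpoint into runs and build both path strings
foo <- function(x){
  r <- rle(as.character(x))
  # short: run values only, e.g. "A_B_C"
  short <- paste0(r$values, collapse="_")
  # long: run values with their run lengths, e.g. "A(2)_B(1)_C(1)"
  long <- paste0(r$values, "(", r$lengths, ")", collapse="_")
  data.frame(short, long)
}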
ddply(paths, .(path), function(x)foo(x$touchpoint))
path short long
1 abc A_B_C A(2)_B(1)_C(1)
2 def A_B_D_C_D A(1)_B(1)_D(1)_C(1)_D(1)
3 ghi A_C A(3)_C(1)
4 jkl A_B_C A(2)_B(1)_C(1)
5 mno B_A_C B(1)_A(2)_C(1)
6 pqr A_C A(1)_C(1)
Thus this function creates two forms of 'paths':
- short gives the sequence of touchpoints per path, from least recent to most recent, with consecutive repeats collapsed.
- long gives the same sequence and adds, in parentheses, the number of consecutive occurrences of each touchpoint.
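To make the two forms concrete, here is a small sketch of what rle() returns for the touchpoints of path abc and how the two strings are built from it. The vector x is typed in by hand from the dataset above and is only for illustration.

```r
# Touchpoints of path "abc" from the dataset above, in time order
x <- c("A", "A", "B", "C")

# rle() stores each run of identical values once, plus how long the run is
r <- rle(x)
r$values   # "A" "B" "C"
r$lengths  # 2 1 1

# Same string construction as in foo()
short <- paste0(r$values, collapse = "_")
long  <- paste0(r$values, "(", r$lengths, ")", collapse = "_")
short  # "A_B_C"
long   # "A(2)_B(1)_C(1)"
```

Note that values and lengths are parallel vectors of equal length, so position i in one always corresponds to position i in the other.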
Since the number of touchpoints can be quite large for some paths, I would like to add the following constraint: only select the N most recent values from short and long. Since the paths are constructed from an rle() object, my question is:
How can I get the last N values and their corresponding lengths from an rle() object? Since the paths are stored from least recent to most recent touchpoint, the last N values and their corresponding lengths need to be selected. The rle() documentation does not provide a solution for this.
Expected outcome for N=2:
path short long
1 abc B_C B(1)_C(1)
2 def C_D C(1)_D(1)
3 ghi A_C A(3)_C(1)
4 jkl B_C B(1)_C(1)
5 mno A_C A(2)_C(1)
6 pqr A_C A(1)_C(1)
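One possible approach, offered as a sketch and not as the definitive answer: because the values and lengths components of an rle() object are parallel vectors, applying tail() with the same n to both should keep the last N runs aligned. The name foo_last_n and its n argument are hypothetical; the two test vectors are copied by hand from paths def and mno above.

```r
# foo_last_n is a hypothetical variant of foo(); n is the number of runs to keep
foo_last_n <- function(x, n) {
  r <- rle(as.character(x))
  # tail() keeps the last n elements, or all of them when fewer than n exist
  values  <- tail(r$values, n)
  lengths <- tail(r$lengths, n)
  short <- paste0(values, collapse = "_")
  long  <- paste0(values, "(", lengths, ")", collapse = "_")
  data.frame(short, long)
}

# Touchpoints of paths "def" and "mno" from the dataset above
def <- foo_last_n(c("A", "B", "D", "C", "D"), n = 2)
mno <- foo_last_n(c("B", "A", "A", "C"), n = 2)
```

Under these assumptions it should drop into the existing call as ddply(paths, .(path), function(x) foo_last_n(x$touchpoint, n = 2)). Paths with fewer than N runs, such as pqr, are returned unchanged.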