I have a data frame where each row represents interaction data per person.
actions <- read.table('C:/Users/Desktop/actions.csv', header = FALSE, sep = ',', na.strings = '', stringsAsFactors = FALSE)
Each person can have one or more of the following interactions:
eat, sleep, walk, jump, hop, wake, run
The number of actions recorded for each person may differ, for example:
P1: eat, sleep, sleep, sleep
P2: wake, walk, eat, walk, walk, jump, jump, run, run
P3: wake, eat, walk, jump, run, sleep
To make the lengths equal, the shorter rows are padded with NA at the end:
P1: eat, sleep, sleep, sleep, NA, NA, NA, NA, NA
P2: wake, walk, eat, walk, walk, jump, jump, run, run
P3: wake, eat, walk, jump, run, sleep, NA, NA, NA
Now I need to update each person's entries (row-wise) so that no two consecutive entries are duplicates. It is very important to preserve the order of the remaining entries. My required output is:
P1: eat, sleep, NA, NA, NA, NA, NA, NA, NA
P2: wake, walk, eat, walk, jump, run, NA, NA, NA
P3: wake, eat, walk, jump, run, sleep, NA, NA, NA
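The transformation described above could be sketched in base R roughly as follows. This is only an illustration on the toy example, not a tested solution for the real file: it assumes that NA appears only as trailing padding, and the names rows, toy, collapse_runs and result are made up for this sketch. It relies on rle(), which reports one value per run of identical neighbouring entries.

```r
# Toy example from above, built as a list of unequal-length vectors
rows <- list(
  P1 = c("eat", "sleep", "sleep", "sleep"),
  P2 = c("wake", "walk", "eat", "walk", "walk", "jump", "jump", "run", "run"),
  P3 = c("wake", "eat", "walk", "jump", "run", "sleep")
)

# Pad every vector with NA up to the longest length, then bind into a data frame
n <- max(lengths(rows))
padded <- t(sapply(rows, function(x) c(x, rep(NA_character_, n - length(x)))))
toy <- as.data.frame(padded, stringsAsFactors = FALSE)  # columns V1..V9

# Collapse runs of identical neighbours in one row and re-pad with NA.
# Assumption: NA values are padding only, so they can be dropped first.
collapse_runs <- function(x) {
  x <- unname(x)
  len <- length(x)
  kept <- rle(x[!is.na(x)])$values   # one entry per run, order preserved
  c(kept, rep(NA_character_, len - length(kept)))
}

# apply() returns one column per input row, so transpose back
result <- as.data.frame(t(apply(toy, 1, collapse_runs)), stringsAsFactors = FALSE)
names(result) <- names(toy)
print(result)
```

Because only adjacent repeats are collapsed, the second "walk" in P2 (after "eat") is kept, which is what distinguishes this from removing all duplicates with unique().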
The column names default to V1, V2, V3, ..., Vn, where n is the maximum length of any person's interaction sequence. In the example above, P2 has the longest sequence, so n = 9 and the columns run from V1 to V9.
The output of dput(actions) is:
structure(list(V1 = c("S", "C", "R"), V2 = c("C", "C", "R"),
V3 = c("R", "C", "R"), V4 = c("S", NA, "R"), V5 = c("C",
NA, "R"), V6 = c("R", NA, NA), V7 = c("S", NA, NA), V8 = c("C",
NA, NA), V9 = c("R", NA, NA)), class = "data.frame", row.names = c(NA,-3L))
The question "Removing Only Adjacent Duplicates in Data Frame in R" is a bit similar to mine, but there are several differences, and I have been unable to solve my problem even by adapting the code from that question.
Any suggestions on this would be highly appreciated!