
I am trying to explore response-change patterns for particular questions. Here is an example dataset.

id <- c(1,1,1, 2,2,2, 3,3,3,3, 4,4)
item.id <- c(1,1,1, 1,1,1 ,1,1,2,2, 1,1)
sequence <- c(1,2,3, 1,2,3, 1,2,1,2, 1,2)
score <- c(0,0,0, 0,0,1, 0,1,0,0, 1,0)
data <- data.frame("id"=id, "item.id"=item.id, "sequence"=sequence, "score"=score)
data
   id item.id sequence score
1   1       1        1     0
2   1       1        2     0
3   1       1        3     0
4   2       1        1     0
5   2       1        2     0
6   2       1        3     1
7   3       1        1     0
8   3       1        2     1
9   3       2        1     0
10  3       2        2     0
11  4       1        1     1
12  4       1        2     0

id identifies persons and item.id identifies questions. sequence is the attempt number for a response, and score is the item's score on that attempt.

What I want is to subset the person-item groups whose score changed from 0 to 1, and separately those whose score changed from 1 to 0. The desired outputs would be:

data.0.to.1
   id item.id sequence score
   2       1        1     0
   2       1        2     0
   2       1        3     1
   3       1        1     0
   3       1        2     1


data.1.to.0
    id item.id sequence score
    4       1        1     1
    4       1        2     0

Any thoughts? Thanks!

amisos55

2 Answers


Here is one option: take the difference of 'score' grouped by 'id' and 'item.id', then split the data on the sign of the first nonzero difference in each group.

library(dplyr)
data %>% 
    group_by(id, item.id) %>%
    filter(any(score != 0)) %>%
    mutate(ind = c(0, diff(score))) %>% 
    group_by(ind =  ind[ind!=0][1]) %>% 
    group_split(ind, keep = FALSE)
#[[1]]
# A tibble: 2 x 4
#     id item.id sequence score
#  <dbl>   <dbl>    <dbl> <dbl>
#1     4       1        1     1
#2     4       1        2     0

#[[2]]
# A tibble: 5 x 4
#     id item.id sequence score
#  <dbl>   <dbl>    <dbl> <dbl>
#1     2       1        1     0
#2     2       1        2     0
#3     2       1        3     1
#4     3       1        1     0
#5     3       1        2     1
akrun
  • Thanks for your reply. Also, is there a way to observe this difference for only the last 2 score attempts rather than any attempt? – amisos55 Oct 22 '19 at 19:36
  • @amisos55 In that case, maybe you need this after the `filter` step: `mutate(ind = c(0, diff(tail(score, 2)))) %>%` – akrun Oct 23 '19 at 05:11
  • when I add this after filter, it gives this error ``Error in mutate_impl(.data, dots) : Column `ind` must be length 4 (the group size) or one, not 2`` – amisos55 Oct 23 '19 at 13:54
  • @amisos55 Sorry, the `c(0,` should be removed as `diff` returns a length one less than the length of the original vector – akrun Oct 23 '19 at 17:38
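
Putting the comments above together, a minimal sketch of the "last two attempts" variant might look like the following. It assumes dplyr is installed and that attempts should be ordered by sequence. The `filter(n() >= 2)` guard and the names `last_two`, `data.0.to.1.last` and `data.1.to.0.last` are illustrative additions, not part of the original answer.

```r
library(dplyr)

# Example data from the question
data <- data.frame(
  id       = c(1,1,1, 2,2,2, 3,3,3,3, 4,4),
  item.id  = c(1,1,1, 1,1,1, 1,1,2,2, 1,1),
  sequence = c(1,2,3, 1,2,3, 1,2,1,2, 1,2),
  score    = c(0,0,0, 0,0,1, 0,1,0,0, 1,0)
)

last_two <- data %>%
  arrange(id, item.id, sequence) %>%      # assumption: order attempts by sequence
  group_by(id, item.id) %>%
  filter(n() >= 2) %>%                    # a change needs at least two attempts
  mutate(ind = diff(tail(score, 2))) %>%  # single value, recycled over the group
  ungroup()

# ind is 1 for a final 0 -> 1 change, -1 for a final 1 -> 0 change, 0 otherwise
data.0.to.1.last <- last_two %>% filter(ind == 1) %>% select(-ind)
data.1.to.0.last <- last_two %>% filter(ind == -1) %>% select(-ind)
```

Because `diff(tail(score, 2))` has length one, `mutate` recycles it across the group, which is why the `c(0, ...)` wrapper triggered the length error reported above.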

I'd do this:

library(dplyr)
data.0.to.1 = data %>%
  group_by(id, item.id) %>%
  filter(any(diff(score) > 0))

data.1.to.0 = data %>%
  group_by(id, item.id) %>%
  filter(any(diff(score) < 0))
Gregor Thomas
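
Both answers rely on `diff(score)` being positive for a 0 to 1 change and negative for a 1 to 0 change within each id/item.id group, which in turn assumes the rows of each group are already in sequence order. As a rough illustration of the same idea without dplyr, here is a base R sketch; the helper name `has_change` is hypothetical and not taken from either answer.

```r
# Example data from the question (rows already ordered by sequence within group)
data <- data.frame(
  id       = c(1,1,1, 2,2,2, 3,3,3,3, 4,4),
  item.id  = c(1,1,1, 1,1,1, 1,1,2,2, 1,1),
  sequence = c(1,2,3, 1,2,3, 1,2,1,2, 1,2),
  score    = c(0,0,0, 0,0,1, 0,1,0,0, 1,0)
)

# Hypothetical helper: TRUE for every row of an id/item.id group in which
# `test` holds for at least one pair of consecutive scores
has_change <- function(d, test) {
  as.logical(ave(d$score, d$id, d$item.id,
                 FUN = function(s) any(test(diff(s)))))
}

data.0.to.1 <- data[has_change(data, function(x) x > 0), ]
data.1.to.0 <- data[has_change(data, function(x) x < 0), ]
```

A group that changes in both directions (for example 0, 1, 0) would appear in both subsets, which is also how the `any(diff(score) ...)` filters above behave.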