
In R, my dataset has 11055 observations of 12 variables. The observations come in blocks of 11 per individual, and each individual is identified by their RID. For example, the first few rows of my dataset look like:

Example data

Here 12, 18 and 9 are the RIDs of three individuals.

I would like to check whether any individual has answered "1" for all 11 of their responses, or "2" for all 11. This is similar to the posts below, but the check runs down a column within each individual instead of across the columns of a row:

  1. How can I write an R script to check for straight-lining; i.e., whether, for any given row, all values in a set of columns have the same value

  2. Testing whether values across multiple columns are the same using dplyr

Is there a way to carry out this analysis in R? The form of the result does not matter in practice: a new TRUE/FALSE variable, a total count, etc. would all be fine.


Data

combineddata <- structure(list(
  rid = c(12, 12, 12, 12, 12, 12), 
  design_row = c(23, 24, 25, 26, 27, 28), 
  scenario = c(1, 2, 3, 4, 5, 6), 
  seq = c(6, 5, 3, 10, 11, 9), 
  choice = c(2, 2, 2, 2, 1, 1), 
  qol = c(4, 4, 1, 3, 3, 2), 
  life = c(4, 4, 4, 4, 3, 4), 
  benefit = c(1, 3, 1, 3, 3, 1), 
  drug_a_cert = c(3, 1, 3, 3, 2, 4), 
  drug_a_wait = c(4, 1, 4, 4, 3, 3), 
  drug_b_cert = c(2, 2, 2, 2, 1, 1), 
  drug_b_wait = c(1, 4, 1, 1, 1, 2)), 
  row.names = c(NA, -6L), 
  class = c("tbl_df", "tbl", "data.frame"))
Rui Barradas
Bob_123
  • It will be easier to help if you can share some example data we can load directly and run, e.g. by running `dput(head(MY_DATA))` and pasting in the output of that. – Jon Spring Aug 09 '23 at 16:25
  • The column you want to check is `CHOICE` by groups of `RID`? And no other column matters? If so, please post the output of `dput(df[1:30, c("RID","CHOICE")])`. – Rui Barradas Aug 09 '23 at 16:31
  • Please add code and data as text ([using code formatting](/editing-help#code)), not images. Images: A) don't allow us to copy-&-paste the code/errors/data for testing; B) don't permit searching based on the code/error/data contents; and [many more reasons](//meta.stackoverflow.com/a/285557). Images should only be used, in addition to text in code format, if having the image adds something significant that is not conveyed by just the text code/error/data. – M-- Aug 09 '23 at 16:34
  • I am also not understanding the expected output. Can you post what should be returned from that data set? – Rui Barradas Aug 09 '23 at 16:40
  • @RuiBarradas > dput(head(combineddata)) structure(list(rid = c(12, 12, 12, 12, 12, 12), design_row = c(23, 24, 25, 26, 27, 28), scenario = c(1, 2, 3, 4, 5, 6), seq = c(6, 5, 3, 10, 11, 9), choice = c(2, 2, 2, 2, 1, 1), qol = c(4, 4, 1, 3, 3, 2), life = c(4, 4, 4, 4, 3, 4), benefit = c(1, 3, 1, 3, 3, 1), drug_a_cert = c(3, 1, 3, 3, 2, 4), drug_a_wait = c(4, 1, 4, 4, 3, 3), drug_b_cert = c(2, 2, 2, 2, 1, 1), drug_b_wait = c(1, 4, 1, 1, 1, 2)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame")) – Bob_123 Aug 09 '23 at 16:42

2 Answers


Something like either of these two dplyr pipes?

suppressPackageStartupMessages(
  library(dplyr)
)

# preferred, more readable
combineddata %>%
  group_by(rid) %>%
  summarise(choice = case_when(
    all(choice == 1) ~ 1,
    all(choice == 2) ~ 2,
    TRUE ~ NA_real_  # typed NA keeps every branch numeric (needed before dplyr 1.1.0)
  ))

# another way
combineddata %>%
  group_by(rid) %>%
  summarise(choice = c(NA, 1, 2)[1L + all(choice == 1) + 2L*all(choice == 2)])
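
If a TRUE/FALSE flag per respondent is enough (the question says the form of the result does not matter), a minimal sketch along the same lines is to count the distinct `choice` values within each `rid`. The toy data frame `toy` and the column name `straight_lined` are made up for illustration, and the sketch assumes `choice` has no missing values.

```r
library(dplyr)

# Hypothetical toy data: respondent 1 always answers 2, respondent 2 mixes answers
toy <- data.frame(
  rid    = c(1, 1, 1, 2, 2, 2),
  choice = c(2, 2, 2, 1, 2, 1)
)

flags <- toy %>%
  group_by(rid) %>%
  summarise(
    straight_lined = n_distinct(choice) == 1,  # TRUE if every response is identical
    .groups = "drop"
  )

flags

# Number of respondents who gave the same answer every time
sum(flags$straight_lined)
```

Swapping `toy` for `combineddata` should give one row per `rid`; the flag does not say which value was repeated, so the `case_when()` version above is the one to use if that matters.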
Rui Barradas
  • I think this does what I am looking for, but I cannot view the entire tbl because there are so many rows. I tried your suggestion followed by `print(combineddata, n = 1000)`, but the output does not fit in the R console. Is there a better way for me to view it? When I run your code above, I only get NA for the first few rows, and when I print 1000 rows I don't get TRUE or NA? – Bob_123 Aug 09 '23 at 17:14
  • To view the whole output data frame, we can pipe the results into `View()` (in RStudio). This opens a spreadsheet-like tab, @Bob_123 – GuedesBF Aug 09 '23 at 17:16
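
As a console-only alternative to the viewer, a hedged sketch: store the summary in an object, then either print every row with `n = Inf` or keep only the non-`NA` rows. The data frame `toy` and the object names `res` and `straight` are hypothetical stand-ins for `combineddata` and its summary.

```r
library(dplyr)

# Hypothetical toy data: respondents 1 and 3 straight-line, respondent 2 does not
toy <- data.frame(
  rid    = c(1, 1, 2, 2, 3, 3),
  choice = c(2, 2, 1, 2, 1, 1)
)

res <- toy %>%
  group_by(rid) %>%
  summarise(choice = case_when(
    all(choice == 1) ~ 1,
    all(choice == 2) ~ 2,
    TRUE ~ NA_real_
  ))

# Print every row of the summary instead of only the first few
print(res, n = Inf)

# Or keep only the respondents who gave the same answer throughout
straight <- res %>% filter(!is.na(choice))
straight
```

Note that the summary has to be assigned (or piped onward); printing `combineddata` itself only shows the original rows, not the per-respondent result.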

We could use if_all.

See https://www.tidyverse.org/blog/2021/02/dplyr-1-0-4-if-any/

library(dplyr) # requires dplyr >= 1.1.0 for the .by argument

df %>%
  filter(if_all(everything(), ~ . %in% c(1,2)), .by = RID)
  RID DESIGN_ROW SCENARIO SEQ CHOICE variable1 variable2 variable3 variable4 variable5 variable6 variable7
1   9          1        1   2      1         1         2         2         2         2         1         2
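
The filter above tests each row on its own. For the per-individual check asked about in the question, one possible group-level variant (a sketch, not part of the original answer) puts `all()` inside `filter()` so that the condition is evaluated once per `RID`. The data frame `toy` and the name `straight_liners` are hypothetical.

```r
library(dplyr)  # .by requires dplyr >= 1.1.0

# Hypothetical toy data: only RID 18 gives the same CHOICE every time
toy <- data.frame(
  RID    = c(12L, 12L, 12L, 18L, 18L, 18L, 9L, 9L, 9L),
  CHOICE = c(2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L)
)

# Keep all rows of any individual whose CHOICE is 1 throughout or 2 throughout
straight_liners <- toy %>%
  filter(all(CHOICE == 1) | all(CHOICE == 2), .by = RID)

# One row per straight-lining individual
distinct(straight_liners, RID)
```

On older dplyr versions the same idea can be written with `group_by(RID)` before `filter()` instead of `.by`.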

Modified data (the last row contains only 1 and 2):

df <- structure(list(RID = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L,
12L, 12L, 12L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 
18L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L), DESIGN_ROW = c(23L, 
24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 12L, 13L, 14L, 
15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 1L, 2L, 3L, 4L, 5L, 6L, 
7L, 8L, 9L, 10L, 11L, 1L), SCENARIO = c(1L, 2L, 3L, 4L, 5L, 6L, 
7L, 8L, 9L, 10L, 11L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 
11L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 1L), SEQ = c(6L, 
5L, 3L, 10L, 11L, 9L, 7L, 8L, 4L, 1L, 2L, 9L, 1L, 3L, 11L, 5L, 
6L, 7L, 4L, 2L, 10L, 8L, 6L, 5L, 3L, 4L, 11L, 9L, 7L, 1L, 2L, 
10L, 8L, 2L), CHOICE = c(2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 
1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 
1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L), variable1 = c(4L, 4L, 1L, 
3L, 3L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 4L, 1L, 1L, 3L, 1L, 1L, 3L, 
3L, 4L, 2L, 2L, 2L, 4L, 1L, 3L, 2L, 4L, 4L, 4L, 4L, 3L, 1L), 
    variable2 = c(4L, 4L, 4L, 4L, 3L, 4L, 4L, 3L, 3L, 4L, 4L, 
    4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 4L, 4L, 4L, 4L, 4L, 3L, 
    3L, 3L, 4L, 4L, 3L, 3L, 4L, 2L), variable3 = c(1L, 3L, 1L, 
    3L, 3L, 1L, 1L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 1L, 2L, 3L, 
    2L, 2L, 1L, 2L, 1L, 1L, 3L, 1L, 2L, 2L, 3L, 2L, 2L, 3L, 3L, 
    2L), variable4 = c(3L, 1L, 3L, 3L, 2L, 4L, 3L, 4L, 2L, 1L, 
    2L, 4L, 2L, 4L, 1L, 3L, 1L, 4L, 4L, 2L, 4L, 3L, 3L, 1L, 3L, 
    2L, 4L, 3L, 3L, 3L, 4L, 1L, 1L, 2L), variable5 = c(4L, 1L, 
    4L, 4L, 3L, 3L, 4L, 3L, 2L, 1L, 2L, 3L, 1L, 4L, 1L, 1L, 2L, 
    4L, 4L, 1L, 3L, 4L, 4L, 1L, 1L, 3L, 3L, 2L, 1L, 2L, 2L, 1L, 
    1L, 2L), variable6 = c(2L, 2L, 2L, 2L, 1L, 1L, 2L, 3L, 3L, 
    3L, 1L, 1L, 3L, 3L, 2L, 4L, 4L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 
    4L, 1L, 3L, 2L, 4L, 1L, 3L, 4L, 3L, 1L), variable7 = c(1L, 
    4L, 1L, 1L, 1L, 2L, 1L, 1L, 3L, 3L, 1L, 2L, 4L, 1L, 4L, 4L, 
    3L, 1L, 1L, 2L, 2L, 2L, 1L, 4L, 4L, 1L, 2L, 1L, 4L, 1L, 1L, 
    3L, 4L, 2L)), class = "data.frame", row.names = c(NA, -34L
))
TarJae