I have a dataframe that I pulled from REDCap and imported as a CSV into RStudio. Unique participant ids are listed, each followed by its events. I need to number each repeated event: for example, for id 1010002, acute_event_infect_arm_4 is the first infection event, _4b is the second, _4c is the third, and so on.

I need to do this for follow-up, reject, infect and CMV/EBV events.

Here is a small snapshot of the dataframe with 3 ids (the actual df has 1000 ids):

structure(list(id = c(1010002, 1010002, 1010002, 1010002, 1010002, 
1010002, 1010002, 1010002, 1010002, 1010002, 1010002, 1010002, 
1010002, 1010002, 1010002, 1010002, 1010006, 1010006, 1010006, 
1010006, 1010006, 1010006, 1010006, 1010006, 1010006, 1010006, 
1010006, 1010008, 1010008, 1010008, 1010008, 1010008, 1010008, 
1010008, 1010008, 1010008, 1010008, 1010008), redcap_event_name = 
c("pre_transplant_arm_4", 
"transplant_arm_4", "transplant_2_arm_4", "end_of_followup_fo_arm_4", 
"last_encounter_arm_4", "acute_event_reject_arm_4", 
"acute_event_reject_arm_4b", 
"acute_event_infect_arm_4", "acute_event_infect_arm_4b", 
"acute_event_infect_arm_4c", 
"acute_event_infect_arm_4d", "acute_event_infect_arm_4e", 
"acute_event_infect_arm_4f", 
"acute_event_infect_arm_4g", "acute_event_cmvebv_arm_4", 
"acute_event_cmvebv_arm_4b", 
"pre_transplant_arm_4", "transplant_arm_4", "1_month_followup_arm_4", 
"2_year_followup_arm_4", "last_encounter_arm_4", "acute_event_reject_arm_4", 
"acute_event_reject_arm_4b", "acute_event_infect_arm_4", 
"acute_event_infect_arm_4b", 
"acute_event_infect_arm_4c", "acute_event_cmvebv_arm_4", 
"pre_transplant_arm_4", 
"transplant_arm_4", "3_month_followup_arm_4", "6_month_followup_arm_4", 
"1_year_followup_arm_4", "2_year_followup_arm_4", "3_year_followup_arm_4", 
"last_encounter_arm_4", "acute_event_reject_arm_4", 
"acute_event_infect_arm_4", 
"acute_event_cmvebv_arm_4")), row.names = c(NA, -38L), class = c("tbl_df", 
"tbl", "data.frame"))

This is what I need to add in the redcap_repeat column:

[image: expected redcap_repeat values]

@akrun please see some examples below (the values in bold red are missing):

[image: examples with missing redcap_repeat values]

  • Images are a really bad way of posting data (or code). Can you post sample data in `dput` format? Please edit **the question** with the output of `dput(df)`. Or, if it is too big with the output of `dput(head(df, 20))`. (`df` is the name of your dataset.) – Rui Barradas May 24 '19 at 19:09
  • @RuiBarradas I have edited question as you requested –  May 24 '19 at 19:29
  • Does every event in the full data set end with `_4*` where * is either nothing or a single letter? – Jon Spring May 24 '19 at 19:36
  • The lines with `3_month`, `6_month`, `1_year` etc. will be challenging, since you'll have to detect and split out phrases related to time which are a different dimension than the one you want to count, the event name. – Jon Spring May 24 '19 at 19:38
  • Do you need `df1 %>% group_by(id, grp1 = str_remove(redcap_event_name, "[a-z]$")) %>% mutate(recap_repeat =if(any(str_detect(substring(redcap_event_name, nchar(redcap_event_name)), "[a-z]"))) as.character(row_number()) else "")` – akrun May 24 '19 at 19:43
  • The last set of numbers for 3_month, 6_month, doesn't have the 4b, 4c, etc – akrun May 24 '19 at 19:46
  • @JonSpring Different ids have followup events. It's really only the "_followup_arm_" part that I need. I need the number of times "_followup_arm_" occurs for each id to be counted and listed in the redcap_repeat column. –  May 24 '19 at 19:48
  • @JonSpring there are also arms ending in _1, _2, _3 and _5. It's the characters in between that I think are key, i.e. "acute_event_reject_", "acute_event_cmvebv_", "_followup_arm_", "acute_event_infect_" –  May 24 '19 at 19:52
  • @akrun the followup arms don't have 4b, 4c etc, as the naming convention "1 year, 2 year" etc makes them unique already –  May 24 '19 at 20:19
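
The idea in the comments above (that the key substrings identify the event type) can be sketched in base R. This is only an illustration, not code from the thread: `event_family` is a hypothetical helper, the pattern covers just the four substrings the asker listed, and `df` is a tiny made-up subset built from rows of the sample data.

```r
# Hypothetical helper (not from the thread): return the key substring found in
# each event name, or NA when the name contains none of them.
event_family <- function(x) {
  pattern <- "acute_event_(reject|infect|cmvebv)|followup_arm"
  m <- regexpr(pattern, x)              # -1 where there is no match
  out <- rep(NA_character_, length(x))
  out[m > 0] <- regmatches(x, m)        # regmatches() returns matched pieces only
  out
}

# Small illustrative subset of the sample data
df <- data.frame(
  id = c(1010006, 1010006, 1010006, 1010006, 1010008, 1010008, 1010008),
  redcap_event_name = c("pre_transplant_arm_4", "1_month_followup_arm_4",
                        "2_year_followup_arm_4", "acute_event_infect_arm_4",
                        "3_month_followup_arm_4", "6_month_followup_arm_4",
                        "end_of_followup_fo_arm_4"),
  stringsAsFactors = FALSE
)

df$family <- event_family(df$redcap_event_name)

# Events per id and family; rows with an NA family are dropped by table()
counts <- table(df$id, df$family)
counts
```

Note that `end_of_followup_fo_arm_4` does not contain `followup_arm`, so under this assumed pattern it is not counted as a follow-up event.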

1 Answer

Here is one option

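library(tidyverse)

df1 %>%
    # Pass 1: group rows whose names differ only by a trailing letter suffix
    # (e.g. _4, _4b, _4c) or by a leading number (e.g. 3_month vs 6_month).
    group_by(id, grp1 = str_remove(redcap_event_name, "[a-z]$|^\\d+_")) %>%
    # Number the rows only when the group has more than one row and contains
    # a lettered or digit-prefixed name; otherwise leave the cell blank.
    mutate(redcap_repeat = if (any(str_detect(redcap_event_name, "[a-z]$|^[0-9]")) &
        n() > 1) as.character(row_number()) else "") %>%
    ungroup() %>%
    # Pass 2: pool all "<n>_month_" / "<n>_year_" follow-ups for an id and
    # renumber the ones already numbered, so the count runs across periods.
    group_by(id, grp1 = str_remove(redcap_event_name, "^\\d+_(month|year)_")) %>%
    mutate(redcap_repeat = case_when(
        redcap_repeat != "" & n() > 1 ~ as.character(row_number()),
        TRUE ~ redcap_repeat)) %>%
    ungroup() %>%
    select(-grp1) %>%
    as.data.frame()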

-output

#        id         redcap_event_name redcap_repeat
#1  1010002      pre_transplant_arm_4              
#2  1010002          transplant_arm_4              
#3  1010002        transplant_2_arm_4              
#4  1010002  end_of_followup_fo_arm_4              
#5  1010002      last_encounter_arm_4              
#6  1010002  acute_event_reject_arm_4             1
#7  1010002 acute_event_reject_arm_4b             2
#8  1010002  acute_event_infect_arm_4             1
#9  1010002 acute_event_infect_arm_4b             2
#10 1010002 acute_event_infect_arm_4c             3
#11 1010002 acute_event_infect_arm_4d             4
#12 1010002 acute_event_infect_arm_4e             5
#13 1010002 acute_event_infect_arm_4f             6
#14 1010002 acute_event_infect_arm_4g             7
#15 1010002  acute_event_cmvebv_arm_4             1
#16 1010002 acute_event_cmvebv_arm_4b             2
#17 1010006      pre_transplant_arm_4              
#18 1010006          transplant_arm_4              
#19 1010006    1_month_followup_arm_4              
#20 1010006     2_year_followup_arm_4              
#21 1010006      last_encounter_arm_4              
#22 1010006  acute_event_reject_arm_4             1
#23 1010006 acute_event_reject_arm_4b             2
#24 1010006  acute_event_infect_arm_4             1
#25 1010006 acute_event_infect_arm_4b             2
#26 1010006 acute_event_infect_arm_4c             3
#27 1010006  acute_event_cmvebv_arm_4              
#28 1010008      pre_transplant_arm_4              
#29 1010008          transplant_arm_4              
#30 1010008    3_month_followup_arm_4             1
#31 1010008    6_month_followup_arm_4             2
#32 1010008     1_year_followup_arm_4             3
#33 1010008     2_year_followup_arm_4             4
#34 1010008     3_year_followup_arm_4             5
#35 1010008      last_encounter_arm_4              
#36 1010008  acute_event_reject_arm_4              
#37 1010008  acute_event_infect_arm_4              
#38 1010008  acute_event_cmvebv_arm_4              
akrun
  • @JBrowne13 Sorry, I am not getting that error with the package version I have – akrun May 24 '19 at 19:57
  • @JBrowne13 I used `packageVersion('dplyr')# [1] ‘0.8.0.1’` – akrun May 24 '19 at 20:00
  • @JBrowne13 Maybe it's because you didn't load the package with `library(dplyr)` – akrun May 24 '19 at 20:21
  • ok I got it to work, I just noticed however that for id 1010006 the 1_month_followup and 2_year_followup should also have a 1 and 2 respectively in the redcap_repeat column –  May 24 '19 at 20:26
  • @JBrowne13 My solution is based on the expected output you posted. It didn't have that. – akrun May 24 '19 at 20:27
  • Sorry for continuous comments. I just posted a snapshot of data, actual data has around 1000 different ids with multiple repeating events. –  May 24 '19 at 20:29
  • Sorry, I really value your input –  May 24 '19 at 20:33
  • I tried the code on the whole dataframe. It worked perfectly on everything other than the followup events. I posted some examples of the irregularities. It's strange, as some ids have no issue while others do –  May 24 '19 at 20:55
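
For the follow-up rows discussed in the comments above, one possible alternative is to number events within each id by event family instead of by exact name. The following is a minimal base-R sketch, not the answerer's method. It rests on several assumptions: `number_repeats` is a hypothetical helper, the family pattern covers only the key substrings listed by the asker, a number is assigned only when an id has more than one event in a family, rows are already in the desired order, and the arm number (_1 to _5) is ignored when grouping.

```r
# Hypothetical helper: number events within each id/family group.
number_repeats <- function(id, event) {
  # Family = key substring found in the name; NA for non-repeating events
  m <- regexpr("acute_event_(reject|infect|cmvebv)|followup_arm", event)
  family <- rep(NA_character_, length(event))
  family[m > 0] <- regmatches(event, m)

  out <- rep("", length(event))
  keep <- !is.na(family)
  if (any(keep)) {
    pos <- seq_along(event)[keep]
    # ave() applies FUN within each id/family group, preserving row order
    idx <- ave(pos, id[keep], family[keep], FUN = seq_along)  # running index
    n   <- ave(pos, id[keep], family[keep], FUN = length)     # group size
    # Assumption: leave the cell blank when the event occurs only once
    out[keep] <- ifelse(n > 1, as.character(idx), "")
  }
  out
}

# The rows for id 1010006 from the sample data
id <- rep(1010006, 11)
event <- c("pre_transplant_arm_4", "transplant_arm_4", "1_month_followup_arm_4",
           "2_year_followup_arm_4", "last_encounter_arm_4",
           "acute_event_reject_arm_4", "acute_event_reject_arm_4b",
           "acute_event_infect_arm_4", "acute_event_infect_arm_4b",
           "acute_event_infect_arm_4c", "acute_event_cmvebv_arm_4")

res <- number_repeats(id, event)
res
# "" "" "1" "2" "" "1" "2" "1" "2" "3" ""
```

With this grouping, 1_month_followup and 2_year_followup for id 1010006 get 1 and 2, as requested in the comments. If events from different arms should be numbered separately, the arm suffix would need to be added as a further grouping variable.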