I am working with a data set containing multiple questionnaires that were supposed to be filled in at different time points, e.g.:

173       9/13/2013     10/29/2013      9/26/2014
174      10/21/2013     11/25/2013      11/3/2014
175        7/1/2014       7/3/2015      4/27/2016
176       1/15/2014      2/24/2014      6/10/2015
177       3/15/2014                      4/1/2015
178       7/18/2014      9/18/2014      8/17/2015
179       6/30/2013      8/15/2013      7/15/2014
180       4/22/2013      6/24/2013      5/11/2014
181       12/7/2014                    12/26/2015
182        4/2/2015      5/17/2015      4/20/2016
183       1/12/2015      2/26/2015      1/28/2016
184       7/18/2014      8/26/2014      8/14/2015
185       8/27/2013     10/19/2013      9/21/2014
186      10/29/2013     11/30/2013      11/6/2014
187       9/17/2014     11/18/2014     10/20/2015
188       5/10/2014      6/27/2014       6/1/2015
189       10/4/2013                     10/5/2014
190       1/22/2013      4/11/2013               
191      10/21/2014     10/21/2014               

I would like to know how many participants filled in all questionnaires on the same day, how many filled in at least 2 questionnaires on the same day, how many at least 3 on the same day, etc. Any help would be highly appreciated.

Reproducible data:

    Label = c( 
    "1/25/2015", "1/25/2016", "1/26/2014", "1/26/2015", "1/27/2014", 
    "1/27/2015", "1/28/2014", "1/28/2015", "1/29/2015", "1/3/2014", 
    "1/3/2015", "1/3/2016", "1/30/2015", "1/31/2014", "1/4/2014", 
    "1/4/2015", "1/4/2016", "1/5/2014", "1/5/2015", "1/6/2014", 
    "1/6/2015", "1/7/2014", "1/7/2015", "1/8/2014", "1/8/2015", 
    "1/9/2014", "1/9/2015", "1/9/2016", "10/1/2012", "10/1/2013", 
    "10/1/2014", "10/1/2015", "10/10/2013", "10/10/2014", "10/11/2013", 
    "10/11/2014", "10/11/2015", "10/12/2013", "10/12/2014", "10/12/2015", 
    "10/13/2013", "10/13/2014", "10/13/2015", "10/14/2013", "10/14/2014", 
    "10/14/2015", "10/15/2014", "10/15/2015", "10/16/2013", "10/16/2014", 
    "10/16/2015", "10/17/2013", "10/17/2014", "10/17/2015", "10/18/2013", 
    "10/18/2014", "10/18/2015", "10/19/2013", "10/19/2014", "10/19/2015", 
    "10/2/2013", "10/2/2014", "10/20/2013", "10/20/2014", "10/20/2015", 
    "10/21/2013", "10/21/2014", "10/22/2013", "10/22/2014", "10/22/2015", 
    "10/23/2012", "10/23/2013", "10/23/2014", "10/23/2015", "10/24/2013", 
    "10/24/2014", "10/24/2015", "10/25/2013", "10/25/2014", "10/26/2013", 
    "10/26/2014", "10/26/2015", "10/27/2013", "10/27/2014", "10/27/2015", 
    "10/28/2013", "10/28/2014", "10/29/2013", "10/29/2014", "10/3/2014", 
    "10/3/2015", "10/30/2014", "10/31/2012", "10/31/2013", "10/31/2014", 
    "10/31/2015", "10/4/2013", "10/4/2014", "10/4/2015", "10/5/2014", 
    "10/5/2015", "10/6/2013", "10/6/2014", "10/6/2015", "10/7/2013", 
    "10/7/2014", "10/8/2012", "10/8/2014", "10/8/2015", "10/9/2013", 
    "10/9/2014", "10/9/2015", "11/1/2013", "11/1/2014", "11/1/2015", 
    class = "factor")

Label = c(
    "4/6/2015", "4/7/2015", "4/9/2012", "5/12/2015", "5/13/2014", 
    "5/14/2015", "5/15/2014", "5/15/2015", "5/17/2014", "5/19/2014", 
    "5/20/2014", "5/25/2014", "5/27/2014", "5/29/2014", "5/30/2014", 
    "5/30/2015", "5/31/2015", "5/4/2014", "5/9/2015", "6/1/2015", 
    "6/10/2014", "6/11/2014", "6/11/2015", "6/12/2015", "6/16/2014", 
    "6/16/2015", "6/18/2014", "6/21/2014", "6/24/2015", "6/25/2014", 
    "6/25/2015", "6/26/2015", "6/27/2015", "6/29/2015", "6/5/2014", 
    "6/6/2015", "6/8/2014", "7/1/2014", "7/13/2014", "7/14/2015", 
    "7/16/2014", "7/2/2014", "7/21/2014", "7/25/2014", "7/27/2014", 
    "7/27/2015", "7/28/2014", "7/29/2014", "7/30/2014", "7/31/2014", 
    "7/31/2015", "7/4/2014", "7/4/2015", "8/1/2014", "8/11/2014", 
    "8/11/2015", "8/25/2014", "8/27/2015", "8/5/2014", "8/8/2014", 
    "8/9/2015", "9/1/2014", "9/10/2015", "9/15/2015", "9/22/2013", 
    "9/3/2012", "9/30/2014", "9/8/2014", "9/8/2015"), class = "factor")

Label = c(" ", 
    "1/16/2016", "1/26/2015", "10/11/2015", "10/14/2015", "10/16/2015", 
    "10/6/2014", "10/7/2013", "11/11/2015", "11/15/2015", "11/17/2013", 
    "11/18/2013", "11/2/2015", "11/20/2013", "11/29/2013", "2/17/2014", 
    "2/17/2015", "2/21/2015", "2/23/2014", "2/25/2014", "2/25/2015", 
    "3/11/2016", "3/2/2014", "3/22/2015", "3/4/2014", "3/4/2016", 
    "4/11/2014", "4/12/2013", "4/18/2016", "4/21/2015", "4/23/2015", 
    "4/29/2015", "4/3/2015", "4/5/2016", "5/23/2015", "5/26/2015", 
    "5/27/2015", "5/28/2015", "5/29/2014", "5/29/2015", "5/8/2015", 
    "6/16/2015", "6/22/2015", "6/28/2015", "7/24/2015", "7/27/2015", 
    "7/4/2014", "7/8/2015", "9/14/2015", "9/15/2015", "9/16/2014", 
    "9/17/2014", "9/22/2014", "9/23/2014", "9/24/2014", "9/24/2015", 
    "9/26/2014", "9/28/2015", "9/30/2015", "9/9/2015"), class = "factor")), .Names = c("1A_RespDate", 
"1B_RespDate", "1C_1_RespDate", "1C_2_RespDate", 
"1C_RespDate", "2A_1_RespDate", "2A_RespDate", "2B_RespDate", 
"2C_RespDate"), row.names = c(NA, -4831L), class = "data.frame")
Z.Chanell

1 Answer

I'll call your data frame df:

    sapply(apply(df,1,unique),length)

will give you the number of unique dates for each individual as a vector. Assuming 7 questionnaires, the highest value is 7 (every questionnaire answered on a different day) and the lowest is 1 (all questionnaires answered on the same day).
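As a rough sketch of the same idea on a toy data frame (the column names `q1` to `q3` and the dates are made up for illustration), the count can be computed in a single `apply()` call. This version also drops blank entries first, on the assumption that an empty cell means "not filled in" and should not count as a date:

```r
# Toy data: 4 participants, 3 questionnaires; "" means not filled in (assumption)
df <- data.frame(
  q1 = c("9/13/2013", "7/1/2014", "3/15/2014", "10/21/2014"),
  q2 = c("9/13/2013", "7/3/2015", "",          "10/21/2014"),
  q3 = c("9/13/2013", "7/1/2014", "4/1/2015",  ""),
  stringsAsFactors = FALSE
)

# Number of distinct non-blank dates per participant (one value per row)
n_unique <- apply(df, 1, function(x) {
  x <- trimws(x)              # strip stray spaces such as " "
  length(unique(x[x != ""]))  # ignore blanks, count distinct dates
})

unname(n_unique)  # 1 2 2 1
```

Returning a single number from the function avoids relying on how `apply()` simplifies its result when every row happens to have the same number of unique values.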

    which(sapply(apply(df,1,unique),length)<7)

will give you the indices of the individuals who filled in at least 2 questionnaires on the same day.

    length(which(sapply(apply(df,1,unique),length)<7))

will tell you how many individuals filled in at least 2 questionnaires on the same day.
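One caveat with a fixed threshold is that missing questionnaires also lower the number of unique values. A possible alternative, sketched here on made-up toy data, is to ask directly whether any non-blank date is repeated within a row, using base R's `anyDuplicated()`:

```r
# Toy data: 4 participants, 3 questionnaires; "" means not filled in (assumption)
df <- data.frame(
  q1 = c("9/13/2013", "7/1/2014", "3/15/2014", "10/21/2014"),
  q2 = c("9/13/2013", "7/3/2015", "",          "10/21/2014"),
  q3 = c("9/13/2013", "7/1/2014", "4/1/2015",  ""),
  stringsAsFactors = FALSE
)

# TRUE when at least two non-blank dates in the row are identical
has_shared_day <- apply(df, 1, function(x) {
  x <- trimws(x)
  anyDuplicated(x[x != ""]) > 0
})

unname(which(has_shared_day))  # row indices: 1 2 4
sum(has_shared_day)            # how many participants: 3
```

Because the test is on repeated dates rather than on a column count, it does not depend on how many questionnaires each participant actually completed.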

Edit: This is inelegant (there must be a cleaner way), but it seems to work:

    which(sapply(sapply(sapply(apply(df,1,table),function(x) x==Z),which),function(x) any(x>0)))

Z is to be set to the number of questionnaires filled in on the same day.

Explanation:

    apply(df,1,table)

gives a list containing, for each individual, the unique dates and how many times each one appears.
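To see what the per-individual tables look like, here is a small sketch on made-up data. It uses `lapply()` over the row indices instead of `apply()`, because `apply()` returns a matrix rather than a list when every row yields a table of the same length; blanks are again assumed to mean "not filled in":

```r
# Toy data: 4 participants, 3 questionnaires; "" means not filled in (assumption)
df <- data.frame(
  q1 = c("9/13/2013", "7/1/2014", "3/15/2014", "10/21/2014"),
  q2 = c("9/13/2013", "7/3/2015", "",          "10/21/2014"),
  q3 = c("9/13/2013", "7/1/2014", "4/1/2015",  ""),
  stringsAsFactors = FALSE
)

m <- as.matrix(df)  # character matrix, one row per participant

# One frequency table per participant; always a list, whatever the table sizes
counts <- lapply(seq_len(nrow(m)), function(i) {
  x <- trimws(m[i, ])
  table(x[x != ""])
})

counts[[2]]  # "7/1/2014" appears 2 times, "7/3/2015" appears once
```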

    sapply(apply(df,1,table),function(x) x==Z)

will give you the same list with TRUE/FALSE for whether each date appears exactly Z times.

    sapply(sapply(apply(df,1,table),function(x) x==Z),which)

will give either integer(0) or a positive integer, which is the index of the date for the individual (not something we are interested in).

    sapply(sapply(sapply(apply(df,1,table),function(x) x==Z),which),function(x) any(x>0))

will give a vector of TRUE/FALSE values, one per individual. The final step with which() returns the indices of the TRUE values.
We therefore get the individuals for whom a date appears exactly Z times.
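A possibly cleaner route, sketched on made-up data, is to compute for each individual the largest number of questionnaires sharing one date and then compare that number with Z. Note that this is a slightly different definition: it looks at the most frequent date only, whereas the nested `sapply()` version flags anyone with some date appearing exactly Z times. The name `max_same_day` is hypothetical, and blanks are assumed to mean "not filled in":

```r
# Toy data: 4 participants, 3 questionnaires; "" means not filled in (assumption)
df <- data.frame(
  q1 = c("9/13/2013", "7/1/2014", "3/15/2014", "10/21/2014"),
  q2 = c("9/13/2013", "7/3/2015", "",          "10/21/2014"),
  q3 = c("9/13/2013", "7/1/2014", "4/1/2015",  ""),
  stringsAsFactors = FALSE
)

# Largest number of questionnaires a participant filled in on one single day
max_same_day <- apply(df, 1, function(x) {
  x <- trimws(x)
  x <- x[x != ""]
  if (length(x) == 0) return(0L)  # nothing filled in at all
  max(table(x))
})

unname(max_same_day)  # 3 2 1 2

# Participants with at least k questionnaires on the same day, for k = 2, 3
at_least <- sapply(2:3, function(k) sum(max_same_day >= k))
at_least  # 3 1
```

Using `==` instead of `>=` gives the "exactly Z" count, and `max_same_day == ncol(df)` picks out those who filled in everything on one day.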

Haboryme
  • Thank you, this works fine ! If I want to see whether 3 questionnaires were filled in the same day should I do it this way: length(which(sapply(apply(df,1,unique),length)==3)) or length(which(sapply(apply(df,1,unique),length)<3)) ?? Both options give me a different answer. – Z.Chanell Sep 09 '16 at 11:03
  • I'll edit my answer with an answer to that question. – Haboryme Sep 09 '16 at 11:26
  • If your question is fully answered, accept the answer please. – Haboryme Sep 12 '16 at 08:38
  • Below the up and down arrow, there should be tick mark or something. As for dummy variable (if I understood correctly), you can create new columns one for 2 questionnaires filled the same day (`df$twoquest<-0`) and so on. And then fill it with : `df$twoquest[which(sapply(sapply(sapply(apply(df,1,table),function(x) x==2),which),function(x) any(x>0)))]<-1` – Haboryme Sep 12 '16 at 09:22
  • Hi Haboryme, I tried your suggestion for the dummy variables but it is giving me weird answers, i.e. it is showing a dummy variable for 7 questionnaires filled in on the same day for someone who only has 4 questionnaires. I also tried: `Respdate$ninequest[length(which(sapply(apply(Respdate,1,unique),length)==9))] <-1` This doesn't work either, any ideas? – Z.Chanell Sep 14 '16 at 08:46
  • Yes, when the which() returns integer(0) this gives false results. `df$threequest[ ifelse( length(which(sapply(sapply(sapply(apply(df,1,table),function(x) x==3) ,which),function(x) any(x>0))))==0, 0, which(sapply(sapply(sapply(apply(df,1,table),function(x) x==3),which),function(x) any(x>0))))]<-1` seems to be a way around it. – Haboryme Sep 14 '16 at 09:08
  • This is also giving me false answers. I don't know why – Z.Chanell Sep 14 '16 at 09:35
  • Could you provide your data? So far I've only tested with data I generated myself. – Haboryme Sep 14 '16 at 09:44
  • The original post contains data that is not reproducible (separators of varying length). – Haboryme Sep 14 '16 at 09:58
  • copy paste what dput(yourdataframe) gives. – Haboryme Sep 14 '16 at 11:02
  • dput is the way to go. I can't get a dataframe with this in the clipboard. – Haboryme Sep 14 '16 at 11:18
  • remove/change what is explicit and identify the data, like id, col names and the like. And you may very well provide only a sample but still, use dput(). – Haboryme Sep 14 '16 at 11:24
  • I hope this is what you mean (see original post) – Z.Chanell Sep 14 '16 at 11:36
  • This is incomplete, there are open parenthesis. It should start with structure. – Haboryme Sep 14 '16 at 12:09
  • my dataset is too big, I can't scroll up higher to get the complete dput() in the console – Z.Chanell Sep 14 '16 at 12:19
  • you could get a subset like df[1:50,] and dput that. – Haboryme Sep 14 '16 at 12:54
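For the dummy variables discussed in the comments, one option that sidesteps `which()` and index bookkeeping entirely is to build each column from a logical comparison. This is only a sketch on made-up data: `twoquest` and `threequest` follow the names used above, `max_same_day` is a hypothetical helper, blanks are assumed to mean "not filled in", and the dummies here mean "at least k on the same day" (use `==` for "exactly k"):

```r
# Toy data: 4 participants, 3 questionnaires; "" means not filled in (assumption)
df <- data.frame(
  q1 = c("9/13/2013", "7/1/2014", "3/15/2014", "10/21/2014"),
  q2 = c("9/13/2013", "7/3/2015", "",          "10/21/2014"),
  q3 = c("9/13/2013", "7/1/2014", "4/1/2015",  ""),
  stringsAsFactors = FALSE
)

# Remember the date columns before adding any dummy columns to df
date_cols <- names(df)

# Largest number of questionnaires sharing one date, per participant
max_same_day <- apply(df[date_cols], 1, function(x) {
  x <- trimws(x)
  x <- x[x != ""]
  if (length(x) == 0) return(0L)
  max(table(x))
})

# 1 if the participant has at least k questionnaires on one day, else 0
df$twoquest   <- as.integer(max_same_day >= 2)
df$threequest <- as.integer(max_same_day >= 3)

df[c("twoquest", "threequest")]
```

Fixing `date_cols` up front matters because later calls would otherwise treat the new 0/1 columns as if they were dates.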