2

I have a large data frame and want to plot a histogram of the number of viable sessions per participant, where a session counts as viable if its score is anything other than NA. The goal is to determine whether participants are completing enough sessions to be included in the analysis, and whether there is a clear cut-off point for how many sessions most people have scores for. Each participant should complete 10 sessions for the study, but some have missing sessions, indicated by NA.

The real data frame is large and contains participant data I can't show, so I've recreated a smaller sample version here with the essential columns: the participant ID, the score they got during each session, and the session number.

Image of sample data frame

Code to recreate the data frame:

dat <- cbind(
  rep(1:3, 10),                                                          # participant ID
  rep(c(12, 32, NA, 44, 45, NA, NA, 8, 54, NA, NA, 12, 13, 14, NA), 2),  # score (NA = missing session)
  rep(1:10, each = 3)                                                    # session number
)
colnames(dat) <- c("ID", "score", "session.num")

Thank you in advance for your help. Please let me know if my question requires clarification.

Adam Quek
kdestasio

2 Answers

0

If I understand your question correctly, you want a histogram of how many sessions participants have completed. To do that, you will first need to aggregate your data by ID to see how many viable sessions each participant has completed, then plot the histogram.

dat <- as.data.frame(dat)

dat.agg <- with(dat[!is.na(dat$score),], # Filter out sessions with NA score 
                aggregate(session.num, by = list(ID), # Aggregate session by ID
                          FUN = function(x) length(unique(x))))

names(dat.agg) <- c("ID", "viable")
dat.agg
#  ID viable
#  1      6
#  2      8
#  3      4

hist(dat.agg$viable)
hist(dat.agg[dat.agg$viable >= 10, "viable"]) # If you only care about those with
                                              # all 10 sessions (none in the sample data)

library(ggplot2) # More options with ggplot
ggplot(dat.agg, aes(viable)) + geom_histogram(binwidth = 1) 
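
One caveat with filtering before aggregating: a participant whose scores are all NA drops out of dat.agg entirely instead of showing up with a count of 0. A minimal sketch that keeps such participants, assuming one row per participant per session as in the sample data (the names viable.all and viable.tab are made up for this example):

```r
# Sample data from the question, built directly as a data frame
dat <- data.frame(
  ID = rep(1:3, 10),
  score = rep(c(12, 32, NA, 44, 45, NA, NA, 8, 54, NA, NA, 12, 13, 14, NA), 2),
  session.num = rep(1:10, each = 3)
)

# Count non-NA scores per ID on the unfiltered data, so a participant
# with no viable sessions gets a 0 instead of disappearing
viable.all <- tapply(!is.na(dat$score), dat$ID, sum)

# Tabulate over every possible session count (0 to 10, per the study design)
# and draw one bar per count, including the empty ones
viable.tab <- table(factor(viable.all, levels = 0:10))
barplot(viable.tab, xlab = "Viable sessions", ylab = "Participants")
```

Because the counts are whole numbers, a bar per count can make a cut-off easier to spot than binned histogram bars, though that is a matter of taste.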
Mark Panny
0

Here's what I ended up doing after getting help from a lab-mate:

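library(dplyr)   # for %>%, group_by(), summarize()
library(ggplot2) # for qplot()

# as.data.frame() is needed if dat is the matrix from the question; it is a no-op on a data frame
dat_hist <- as.data.frame(dat) %>% group_by(ID) %>% summarize(ViableN = sum(!is.na(score))) # Get the count of viable runs

qplot(dat_hist$ViableN, geom = "histogram", xlab = "Viable Runs", ylab = "Count", main = "Frequency of the Number of Viable Runs", binwidth = .5) # Make the histogram

table(dat_hist$ViableN) # Tabulate how many participants have each count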
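
For what it's worth, qplot() has been deprecated since ggplot2 3.4.0, so here is a minimal sketch of the same summary and plot written with ggplot(). It assumes dplyr and ggplot2 are installed and rebuilds the sample data from the question; the object name p is just illustrative.

```r
library(dplyr)
library(ggplot2)

# Sample data from the question; cbind() returns a matrix,
# so convert it to a data frame before using dplyr verbs
dat <- as.data.frame(cbind(
  ID = rep(1:3, 10),
  score = rep(c(12, 32, NA, 44, 45, NA, NA, 8, 54, NA, NA, 12, 13, 14, NA), 2),
  session.num = rep(1:10, each = 3)
))

# Count the non-NA scores for each participant
dat_hist <- dat %>%
  group_by(ID) %>%
  summarize(ViableN = sum(!is.na(score)))

# Same histogram as the qplot() call, expressed with ggplot()
p <- ggplot(dat_hist, aes(x = ViableN)) +
  geom_histogram(binwidth = 0.5) +
  labs(x = "Viable Runs", y = "Count",
       title = "Frequency of the Number of Viable Runs")
print(p)

# Tabulate how many participants have each count
table(dat_hist$ViableN)
```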

kdestasio