I have an R data frame with data from multiple subjects, each tested several times. To run statistics on the set, it has a factor for subject ("id") and one row per observation (around 40,000 rows), each with around 200 variables. Here is a small example:

    # Small example: 4 subjects (id), each tested in 3 sessions.
    # sample() shuffles the values, so the printout differs between runs.
    allData <- data.frame(id       = rep(1:4, 3),
                          session  = rep(1:3, each = 4),
                          measure1 = sample(c(NA, 1:11)),
                          measure2 = sample(c(NA, 1:11)),
                          measure3 = sample(c(NA, 1:11)),
                          measure4 = sample(c(NA, 1:11)))
    allData
    #    id session measure1 measure2 measure3 measure4
    # 1   1       1        3        7       10        6
    # 2   2       1        4        4        9        9
    # 3   3       1        6        6        7       10
    # 4   4       1        1        5        2        3
    # 5   1       2       NA       NA        5       11
    # 6   2       2        7       10        6        5
    # 7   3       2        9        8        4        2
    # 8   4       2        2        9        1        7
    # 9   1       3        5        1        3        8
    # 10  2       3        8        3        8        1
    # 11  3       3       11       11       11        4
    # 12  4       3       10        2       NA       NA

I need to remove all rows of a subject whenever any of the probed "measureX" columns (X = 1, ..., 4) contains NA in at least one of that subject's rows. In this example, that means removing every row with id 1 or 4.
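The rule can also be expressed without any per-subject function calls: flag the rows that have an NA in the probed columns, collect the ids of those rows, and drop every row carrying one of those ids. A minimal base-R sketch follows. It rebuilds the printed example with fixed values so the result is reproducible, and it assumes (as in the ddply code below) that only measure1 and measure4 are checked.

```r
# Rebuild the example above with fixed values (copied from the printed
# output), so the result does not depend on the random sample() calls.
allData <- data.frame(
  id       = rep(1:4, 3),
  session  = rep(1:3, each = 4),
  measure1 = c(3, 4, 6, 1, NA, 7, 9, 2, 5, 8, 11, 10),
  measure2 = c(7, 4, 6, 5, NA, 10, 8, 9, 1, 3, 11, 2),
  measure3 = c(10, 9, 7, 2, 5, 6, 4, 1, 3, 8, 11, NA),
  measure4 = c(6, 9, 10, 3, 11, 5, 2, 7, 8, 1, 4, NA)
)

# Columns to check (assumed: the same two as in the ddply code)
probeColumns <- c("measure1", "measure4")

# TRUE for every row with at least one NA in the probed columns;
# drop = FALSE keeps a data frame even when only one column is probed
rowHasNA <- !complete.cases(allData[, probeColumns, drop = FALSE])

# Subjects that have at least one such row
badIds <- unique(allData$id[rowHasNA])

# Keep only the rows of subjects that never have an NA in those columns
cleanData <- allData[!allData$id %in% badIds, ]
cleanData
```

With the example data, `badIds` is `c(1, 4)` and `cleanData` keeps the six rows of subjects 2 and 3. The work is done by a few whole-column operations rather than one function call per subject; whether that avoids the C stack error on the full dataset has not been checked.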
flodel suggested a solution to this problem in <https://stackoverflow.com/a/9917524/5042101>, using the `ddply()` function from the plyr package:

    library(plyr)

    # Columns to check for NA
    probeColumns <- c("measure1", "measure4")

    # For each subject, drop all of its rows if any probed cell is NA
    ddply(allData, "id",
          function(df) if (any(is.na(df[, probeColumns]))) NULL else df)

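The `ddply()` call above follows the split-apply-combine pattern: it splits `allData` by `id`, calls the anonymous function once per subject, and binds the non-NULL results back together. The sketch below spells out those per-subject steps in base R. `filterSubjects` and the `toy` data frame are hypothetical, and plyr's actual internals differ.

```r
# Rough base-R equivalent of the ddply() call above (split-apply-combine).
# filterSubjects() is a hypothetical helper, not part of plyr.
filterSubjects <- function(data, idCol, probeColumns) {
  # split: one data frame per subject
  groups <- split(data, data[[idCol]])
  # apply: keep a subject only if none of its probed cells is NA
  kept <- Filter(function(df) !any(is.na(df[, probeColumns, drop = FALSE])),
                 groups)
  # If every subject was dropped, return a zero-row frame with the same columns
  if (length(kept) == 0) return(data[0, , drop = FALSE])
  # combine: stack the surviving subjects back into one data frame
  result <- do.call(rbind, kept)
  rownames(result) <- NULL
  result
}

# Hypothetical toy data: subject 1 has an NA in m1, subject 3 in m2
toy <- data.frame(id = c(1, 1, 2, 2, 3, 3),
                  m1 = c(1, NA, 3, 4, 5, 6),
                  m2 = c(1, 2, 3, 4, 5, NA))
out <- filterSubjects(toy, "id", c("m1", "m2"))
out
```

Here subjects 1 and 3 are dropped and `out` contains the two rows of subject 2. Like `ddply()`, this makes one function call per subject before recombining the pieces.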
The problem: my dataset has around 40,000 rows and 200 columns. Even with a single column in `probeColumns`, I get the error `C stack usage 10027284`.
I am using R 3.1.3 in RStudio on Windows. When I try more columns, RStudio closes automatically or R freezes. Moreover, I do not have administrator access on this computer.