I would like to dynamically subset a data frame and run an analysis that keeps one variable fixed and conditions on each of the other variables in turn. For example, let's say I have a data frame with 3 variables (in my case I have 10):

x  y  z
a  1  1
b  3  NA
NA 5  0
f  NA 1

I want to subset the data frame under two conditions:
1) x and z are not missing
2) y and z are not missing

My target output is this:

x z
a 1
f 1

y z
1 1
5 0

I want this pairwise subsetting to be done dynamically across my whole dataset, for any number n of variables. The output can be a list.

user1916067
  • Possible duplicate of http://stackoverflow.com/questions/37192961/applying-combn-function-to-data-frame – akrun Feb 22 '17 at 11:03

1 Answer

We can use combn to get every pair of column names, then loop over the pairs, subset those two columns, and drop the rows that contain NA:

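# dummy data
df1 <- read.table(text = "x  y  z
a  1  1
b  3  NA
NA 5  0
f  NA 1", header = TRUE)
# result: one data frame per pair of columns
# combn() returns a matrix with one pair of column names per column
apply(combn(colnames(df1), 2), 2, function(i){
  res <- df1[, i]               # keep only the two columns in this pair
  res[complete.cases(res), ]    # drop rows where either column is NA
})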
# [[1]]
#   x y
# 1 a 1
# 2 b 3
# 
# [[2]]
#   x z
# 1 a 1
# 4 f 1
# 
# [[3]]
#   y z
# 1 1 1
# 3 5 0
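
The output above covers every pair of columns, while the question asks for one variable held fixed (z in the example) and paired with each of the others. A minimal sketch of that variant, assuming the fixed column is named "z"; the names `fixed`, `others` and `res_fixed` are illustrative and not part of the original answer:

```r
# dummy data, same as in the answer
df1 <- read.table(text = "x  y  z
a  1  1
b  3  NA
NA 5  0
f  NA 1", header = TRUE)

fixed  <- "z"                             # assumed name of the fixed variable
others <- setdiff(colnames(df1), fixed)   # every other column

# one data frame per (other, fixed) pair, keeping only complete rows
res_fixed <- lapply(others, function(v) {
  res <- df1[, c(v, fixed)]
  res[complete.cases(res), ]
})

# label each list element by its pair, e.g. "x_z", "y_z"
names(res_fixed) <- paste(others, fixed, sep = "_")
res_fixed
```

With 10 variables this returns 9 data frames instead of the 45 that all pairwise combinations would give.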
zx8754
  • or similarly, `lapply(combn(names(df), 2, FUN = list), function(x) na.omit(df[,x]))` – talat Feb 22 '17 at 10:57
  • @docendodiscimus nice trick with FUN. Strange that I got stuck with `complete.cases`, even though I know about `na.omit`. – zx8754 Feb 22 '17 at 11:01
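
As a quick check that the two approaches mentioned in these comments keep the same rows, here is a small sketch using the dummy data from the answer. The names `pair`, `by_cc` and `by_omit` are illustrative. The one difference it shows is that `na.omit` records the dropped rows in an "na.action" attribute, while subsetting with `complete.cases` does not:

```r
df1 <- read.table(text = "x  y  z
a  1  1
b  3  NA
NA 5  0
f  NA 1", header = TRUE)

pair <- df1[, c("x", "z")]

by_cc   <- pair[complete.cases(pair), ]   # logical-index subsetting
by_omit <- na.omit(pair)                  # same rows, plus an attribute

# same rows and same values in both results
identical(rownames(by_cc), rownames(by_omit))
identical(by_cc$x, by_omit$x)
identical(by_cc$z, by_omit$z)

# only na.omit() keeps track of which rows were removed
attr(by_omit, "na.action")
is.null(attr(by_cc, "na.action"))
```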