I am writing a piece of R code that loops through a dataframe and runs time series predictions on each subsetted dataframe. However, the way I built the loop leaves me with a number of columns that contain only 0 values. There could be a single column with non-zero values or many, but there will always be at least one. Each iteration of the loop can yield a different number of non-zero columns.

Please see the following discussions regarding this topic.

Remove columns with zero values from a dataframe

Delete all columns with 0 from matrix

How do I get the following code to work? Here are two examples that capture the crux of my issue. The first example works and does exactly what I need.

dat <- data.frame(x = rep(0, 10), y = rnorm(10), z = rep(0, 10), a = rnorm(10)) 
dat <- dat[, colSums(dat) > 0]

The second example fails because there is only a single column of non-zero values.

dat2 <- data.frame(x = rep(0, 10), y = rep(0, 10), z = rep(0, 10), a = rnorm(10))
dat2 <- dat2[, colSums(dat2) > 0]

Any insights would be greatly appreciated. Thanks for the help.

DKane

1 Answer

Either pass drop=FALSE (the default is drop=TRUE, which collapses a single-column result to a vector), or remove the comma so that the data frame is indexed like a list. Both return a data.frame. For more info, please check ?"["

dat2[colSums(dat2) > 0]

Or

dat2[,colSums(dat2) > 0, drop=FALSE]
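
To see the difference between these forms, here is a minimal sketch. It assumes a deterministic stand-in for dat2 (the values in column `a` are made up for illustration, replacing rnorm so the result is reproducible), and the names `keep`, `as_vector`, `as_list`, and `as_frame` are hypothetical.

```r
# Stand-in for dat2: only column `a` holds non-zero values (illustrative data)
dat2 <- data.frame(x = rep(0, 10), y = rep(0, 10), z = rep(0, 10),
                   a = seq(0.5, 5, by = 0.5))

# Logical vector marking the columns to keep
keep <- colSums(dat2) > 0

as_vector <- dat2[, keep]                # default drop = TRUE: collapses to a vector
as_list   <- dat2[keep]                  # list-style indexing: stays a data.frame
as_frame  <- dat2[, keep, drop = FALSE]  # matrix-style indexing without dropping

class(as_vector)  # "numeric"
class(as_list)    # "data.frame"
names(as_frame)   # "a"
```

When several columns survive the filter, all three forms return a data.frame, so the collapse only shows up in the single-column case.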

If you use subset, the default is drop=FALSE:

subset(dat2, select=colSums(dat2) > 0)
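
One caveat that may be worth checking: colSums(dat) > 0 tests whether a column's sum is positive, not whether the column contains any non-zero value, so a column of rnorm draws whose sum happens to be negative or zero would also be removed. A possible sketch of a stricter test, assuming the goal is to drop only all-zero columns (the data and the names `dat3`, `nonzero`, and `res` are hypothetical):

```r
# Illustrative data: y sums to -1.5 and z sums to 0, yet neither is all-zero
dat3 <- data.frame(x = rep(0, 4),
                   y = c(-2, -1, 0.5, 1),
                   z = c(1, -1, 2, -2))

# The sum-based test would discard every column here
colSums(dat3) > 0

# Count the non-zero entries per column instead of summing the values
nonzero <- colSums(dat3 != 0) > 0

# drop = FALSE keeps the result a data.frame even if one column remains
res <- dat3[, nonzero, drop = FALSE]
names(res)  # "y" "z"
```

If the columns can contain NA, the comparison would need extra handling, which this sketch does not attempt.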
akrun
  • (+1) Explanation: Data frames are just special lists, and `[j]` is "list syntax", which always returns a list. With the default `drop=TRUE` and a single column `j`, `[i, j]` is syntactic sugar that is equivalent to `[[j]][i]`. For the difference between `[[` and `[` on lists, see `help('[')` and `help('[.data.frame')`. Setting `drop=FALSE` prevents R from "dropping" down from a data frame to a single vector. – shadowtalker Jun 13 '15 at 15:06
  • @akrun - That looks like it did the trick. I tested it on a smaller subset just now and will run through the whole procedure with the full dataset. I'll let you know if I run into any more issues. Thanks again for the quick response. – DKane Jun 13 '15 at 15:10
  • @ssdecontrol Your explanation really helps me understand the reason behind the issue. Thank you for taking the time to explain it. I thought that a df was fundamentally different from a list and couldn't see why the code wasn't working. – DKane Jun 13 '15 at 15:12