How to avoid NA columns in dcast() output?

Question

How can I avoid NA columns in dcast() output from the reshape2 package?

In this dummy example the dcast() output will include an NA column:

require(reshape2)
data(iris)
iris[ , "Species2"] <- iris[ , "Species"]
iris[ 2:7, "Species2"] <- NA
(x <- dcast(iris, Species ~ Species2, value.var = "Sepal.Width", 
            fun.aggregate = length))
##     Species setosa versicolor virginica NA
##1     setosa     44          0         0  6
##2 versicolor      0         50         0  0
##3  virginica      0          0        50  0

For a somewhat similar use case, table() has an option that lets you avoid this:

table(iris[ , c(5,6)], useNA = "ifany")  ##same output as from dcast()
##            Species2
##Species      setosa versicolor virginica <NA>
##  setosa         44          0         0    6
##  versicolor      0         50         0    0
##  virginica       0          0        50    0
table(iris[ , c(5,6)], useNA = "no")  ##avoid NA columns
##            Species2
##Species      setosa versicolor virginica
##  setosa         44          0         0
##  versicolor      0         50         0
##  virginica       0          0        50

Does dcast() have a similar option that removes NA columns from the output? How can I avoid getting NA columns? (This function has a number of rather cryptic options that are tersely documented and that I cannot quite grasp...)

You could do `dcast(na.omit(iris), Species ~ Species2, value.var = "Sepal.Width")`, but this isn't a very general solution if you are interested in some other columns too. – David Arenburg, Oct 04 '15 at 08:31
@DavidArenburg Indeed. I was aware of `na.omit(iris)`-like solutions, but I was looking for a different approach. I didn't include this requirement in the question to avoid making it too confusing... – landroni, Oct 04 '15 at 08:49
If I had to guess, I'd say it's intended behaviour so you need to consciously remove missing data (instead of doing that accidentally). I would solve it by selecting the data first, so `iris[!is.na(iris$Species2),]`. – Heroka, Oct 04 '15 at 08:55
@David if it's only NA's in a certain column that need to be removed. – Heroka, Oct 04 '15 at 11:20
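
The row-filtering idea from the comments above can be sketched as follows. This is only an illustration built on the question's setup; `iris2` and `complete` are names introduced here so that the built-in `iris` data is left untouched.

```r
library(reshape2)

## Rebuild the question's example data (iris2 is a name introduced here)
iris2 <- iris
iris2$Species2 <- iris2$Species
iris2[2:7, "Species2"] <- NA

## Keep only the rows where the casting column is not missing, then cast
complete <- iris2[!is.na(iris2$Species2), ]
x <- dcast(complete, Species ~ Species2, value.var = "Sepal.Width",
           fun.aggregate = length)
names(x)
```

Unlike `na.omit()`, this only looks at `Species2`, so rows with missing values in other columns are kept.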

pgoel6uc · Answer 1 · 2017-08-09T20:34:52.613

1

Here is how I was able to get around it:

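## Species2 is a factor in the question's example; convert it to character first,
## otherwise assigning the unknown level 'None' only yields NA with a warning
iris$Species2 <- as.character(iris$Species2)
iris[is.na(iris)] <- 'None'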

x <- dcast(iris, Species ~ Species2, value.var="Sepal.Width", fun.aggregate = length)

x$None <- NULL

The idea is that you replace all the NAs with 'None', so that dcast creates a column called 'None' rather than 'NA'. Then, you can just delete that column in the next step if you don't need it.

edited Aug 09 '17 at 20:34

answered Feb 17 '17 at 21:56

Can you format your code as code, to make it easier to read? (Indent 4 spaces, or use the `{}` button.) Also, please add an explanation so that others can better understand your solution. – Robert Feb 17 '17 at 22:26

score 0 · Answer 2 · answered Oct 05 '15 at 11:06

One solution that I've found, and which I'm not unhappy with, is based on the approach of dropping NA values suggested in the comments. It uses the subset argument of dcast() along with .() from plyr:

require(plyr)
(x <- dcast(iris, Species ~ Species2, value.var = "Sepal.Width",
            fun.aggregate = length, subset = .(!is.na(Species2))))
##     Species setosa versicolor virginica
##1     setosa     44          0         0
##2 versicolor      0         50         0
##3  virginica      0          0        50

For my particular purpose (within a custom function) the following works better:

(x <- dcast(iris, Species ~ Species2, value.var = "Sepal.Width", 
            fun.aggregate = length, subset = .(!is.na(get("Species2")))))
##     Species setosa versicolor virginica
##1     setosa     44          0         0
##2 versicolor      0         50         0
##3  virginica      0          0        50

score 0 · Answer 3 · answered Nov 30 '15 at 12:29

You could rename the NA column of the output and then make it NULL. (This works for me).

require(reshape2)
data(iris)
iris[ , "Species2"] <- iris[ , "Species"]
iris[ 2:7, "Species2"] <- NA

(x <- dcast(iris, Species ~ Species2, value.var = "Sepal.Width", 
            fun.aggregate = length)) 

## dcast() returns five columns (Species plus four value columns), so name all five
names(x) <- c("Species", "setosa", "versicolor", "virginica", "newname")

x$newname <- NULL
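
A variant of the same idea drops the column by its label instead of retyping every column name. This is a minimal sketch: `drop_na_column` is a hypothetical helper, not part of reshape2, and it assumes the unwanted column is labelled with the string "NA" (a genuinely missing name is handled as well). `iris2` is a name introduced here to rebuild the example data.

```r
library(reshape2)

## Hypothetical helper: remove any column labelled "NA" (or with a missing name)
drop_na_column <- function(df) {
  keep <- !(is.na(names(df)) | names(df) == "NA")
  df[, keep, drop = FALSE]
}

## Rebuild the question's example data
iris2 <- iris
iris2$Species2 <- iris2$Species
iris2[2:7, "Species2"] <- NA

x <- dcast(iris2, Species ~ Species2, value.var = "Sepal.Width",
           fun.aggregate = length)
x <- drop_na_column(x)
names(x)
```

Because the column is matched by label, the helper does not depend on how many other columns the cast produces.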

bramtayl · Answer 4 · 2015-10-07T14:19:00.357

-2

library(dplyr)
library(tidyr)
iris %>%
  filter(!is.na(Species2)) %>%
  group_by(Species, Species2) %>%
  summarize(freq = n()) %>%
  spread(Species2, freq)

edited Oct 07 '15 at 14:19

answered Oct 04 '15 at 08:44

All said and done, I would rather have a `dcast()`-based solution, if possible. – landroni Oct 04 '15 at 08:50
Why exactly are people downvoting this answer? – Mike Wise Oct 05 '15 at 07:38
@MikeWise I don't know, but I suspect it's because it proposes an alternative instead of attempting to address the issue within the constraints of the question (i.e. the `dcast()` function). I do agree, though, that downvoting without *any* explanation is one of the most negative and counter-productive behaviors, community-wise, around SE... – landroni Oct 07 '15 at 08:52
It seems like a valuable answer to me, though it would have been nice if he had added a sentence or two explaining it. – Mike Wise Oct 07 '15 at 09:39
