Getting every combination in a data frame when data already exist in some rows (R)

Question

I recently just used dcast (in the reshape2 package) to condense my data frame from long format to wide (since I needed counts). Now, I need to fill in the combination that do NOT exist with a 0. I imagine I could do something with expand.grid in the base package but I'm not sure how (?) since I don't just want every combo but I also already have some counts. An example of what I have:

AgeGroup Sex Month Count 
10       F   2     4
10       F   6     1
11       M   6     2

And what I would like:

AgeGroup Sex Month Count
10       F   2     4
10       F   3     0
10       F   4     0 
10       F   5     0
10       F   6     1

Edit in response to Anada's comment:

Minimum reproducible data/code:

library(reshape2)

Sex <- c('M', 'F', 'F', 'F', 'M')
County <- c(41, 65, 35, 49, 41)
AgeGroup <- c(11, 10, 18, 11, 11)
Month <- c(1, 1, 2, 4, 1)
Count <- rep(1, 5)
DF <- cbind.data.frame(Sex, County, AgeGroup, Month, Count)

DF <-dcast(DF, County+Sex+Month+AgeGroup~Count, 
           value.var="Count", length)

names(DF)[names(DF)=='1'] <- 'Count'

Note that in this example two observations are identical on purpose to demonstrate how I want to collapse things. I also don't know why but dcast renames the Count column so I have to change the name at the end.

Use `expand.grid` to generate all combinations, and then use `merge` — Neal Fultz, Sep 08 '14 at 21:36
Perhaps a better option is to step back in your code. Using `dcast` on proper `factors` that have all the `levels` you want should allow you to keep everything you're interested in. Can you post some sample original data and the code you used to get to your first table above? — A5C1D2H2I1M1N2O1R2T1, Sep 09 '14 at 04:22

score 0 · Answer 1 · answered Sep 08 '14 at 21:47

Here is a solution with base R. If df is your dataframe, you can try:

    res<-expand.grid(lapply(df[,1:3],unique))
    res$Count<-0
    res$Count[match(do.call(paste,df[,1:3]),do.call(paste,res[,1:3]))]<-df$Count

You evaluate the unique values for each column and build a data.frame from them through expand.grid. Next, you set each Count to zero and update the ones present in the original data.frame matching the values.

score 0 · Answer 2 · answered Sep 08 '14 at 21:56

Here's a (fast) data.table solution (assuming your data called df)

library(data.table)
setkey(setDT(df), AgeGroup, Sex, Month) # Sorting/keying your data set for future merge
combs <- df[, CJ(unique(AgeGroup), unique(Sex), seq_len(max(Month)))] # CJ is data.tables equivalent expand.grid, but more efficient 
final <- df[combs] # Merging
final[is.na(Count), Count := 0] # Setting all missing combinations to zero
final
#     AgeGroup Sex Month Count
#  1:       10   F     1     0
#  2:       10   F     2     4
#  3:       10   F     3     0
#  4:       10   F     4     0
#  5:       10   F     5     0
#  6:       10   F     6     1
#  7:       10   M     1     0
#  8:       10   M     2     0
#  9:       10   M     3     0
# 10:       10   M     4     0
# 11:       10   M     5     0
# 12:       10   M     6     0
# 13:       11   F     1     0
# 14:       11   F     2     0
# 15:       11   F     3     0
# 16:       11   F     4     0
# 17:       11   F     5     0
# 18:       11   F     6     0
# 19:       11   M     1     0
# 20:       11   M     2     0
# 21:       11   M     3     0
# 22:       11   M     4     0
# 23:       11   M     5     0
# 24:       11   M     6     2

score 0 · Accepted Answer · answered Sep 08 '14 at 23:13

0

If I'm understanding what you are asking for correctly, I believe you can do this with another of Hadley Wickham's packages: plyr

using the ddply function and specifying .drop = F

try:

ddply(df, .(AgeGroup, Sex, Month),length, .drop = F)

answered Sep 08 '14 at 23:13

bstockton

555
2
6
20

This code doesn't quite work for me but I'll look into ddply, thank you – HFBrowning Sep 11 '14 at 15:41

Getting every combination in a data frame when data already exist in some rows (R)

3 Answers3