R: use a row as a grouping vector for row sums

Question

If I have a data set laid out like:

Cohort  Food1  Food2  Food3  Food4
----------------------------------
Group     1      1      2      3
A         1      1      0      1
B         0      0      1      0
C         1      1      0      1
D         0      0      0      1

I want to sum across each row, but separately for each food category that I define. I would like to use the Group row as the vector that defines those categories.

That would mean that Food1 and Food2 are in group 1, Food3 is in group 2, and Food4 is in group 3.

Ideal output something like:

Cohort Group1 Group2 Group3
 A      2       0      1
 B      0       1      0
 C      2       0      1
 D      0       0      1

I tried functions based on rowsum() but had no luck. Do I need to use ddply() instead?

Example data from comment:

dat <-
structure(list(species = c("group", "princeps", "bougainvillei", 
"hombroni", "lindsayi", "concretus", "galatea", "ellioti", "carolinae", 
"hydrocharis"), locust = c(1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 
0L), grasshopper = c(1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L), 
    snake = c(2L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L), fish = c(2L, 
    1L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 0L), frog = c(2L, 0L, 0L, 
    0L, 0L, 0L, 0L, 1L, 0L, 0L), toad = c(2L, 0L, 0L, 0L, 0L, 
    1L, 0L, 0L, 0L, 0L), fruit = c(3L, 0L, 0L, 0L, 0L, 1L, 1L, 
    0L, 0L, 0L), seed = c(3L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 
    0L)), .Names = c("species", "locust", "grasshopper", "snake", 
"fish", "frog", "toad", "fruit", "seed"), class = "data.frame", row.names = c(NA, 
-10L))

can you use `dput(yourdata)` so it is easier for us to reproduce your issue? — Justin, Oct 08 '12 at 18:42
my data set is huge, I just created this as a simplified example, do you want a subset of my data? — Nick Crouch, Oct 08 '12 at 18:48
Yes, right now to test your data I would have to retype it all! — Justin, Oct 08 '12 at 18:49
@NickCrouch, don't be so quick to accept! I just realized there's actually a mistake in my code, and I'm checking to see if there's an easy answer. — A5C1D2H2I1M1N2O1R2T1, Oct 08 '12 at 19:58
@NickCrouch, please see my updated answer. There was a mistake in the original answer I provided. — A5C1D2H2I1M1N2O1R2T1, Oct 08 '12 at 20:14

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2012-10-08T20:36:14.923

There are most likely more direct approaches, but here is one you can try:

First, create a copy of your data minus the second header row.
```
dat2 <- dat[-1, ]
```

melt() and dcast() and so on from the "reshape2" package don't work nicely with duplicated column names, so let's make the column names more "reshape2 appropriate".

Seq <- ave(as.vector(unlist(dat[1, -1])), 
           as.vector(unlist(dat[1, -1])), 
           FUN = seq_along)
names(dat2)[-1] <- paste("group", dat[1, 2:ncol(dat)], 
                         ".", Seq, sep = "")

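melt() the dataset. melt(), colsplit(), and dcast() all come from the "reshape2" package, so load it first.

library(reshape2)
m.dat2 <- melt(dat2, id.vars="species")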
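To see what the renaming step puts into the `variable` column, here is a small sketch that rebuilds the names from the group codes alone. The vector `grp` is typed in by hand to match the first row of `dat`, and `new_names` is just an illustrative name, not something from the code above.

```r
# Group codes as they appear in the first row of `dat` (columns 2 to 9),
# typed in by hand here so the snippet runs on its own.
grp <- c(1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L)

# Running counter within each group: 1, 2 for group 1; 1 to 4 for group 2; 1, 2 for group 3.
Seq <- ave(grp, grp, FUN = seq_along)

# Unique names of the form "group<code>.<counter>", which can later be split on ".".
new_names <- paste0("group", grp, ".", Seq)
print(new_names)
# [1] "group1.1" "group1.2" "group2.1" "group2.2" "group2.3" "group2.4" "group3.1"
# [8] "group3.2"
```

Because every name is unique and carries its group code before the dot, splitting on the dot recovers the group for each melted row.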

Use the colsplit() function to split the columns correctly.

m.dat2 <- cbind(m.dat2[-2], 
                colsplit(m.dat2$variable, "\\.", 
                         c("group", "time")))
head(m.dat2)
#         species value  group time
# 1      princeps     0 group1    1
# 2 bougainvillei     0 group1    1
# 3      hombroni     1 group1    1
# 4      lindsayi     0 group1    1
# 5     concretus     0 group1    1
# 6       galatea     0 group1    1

Proceed with dcast() as usual

dcast(m.dat2, species ~ group, sum)
#         species group1 group2 group3
# 1 bougainvillei      0      0      0
# 2     carolinae      1      1      0
# 3     concretus      0      2      2
# 4       ellioti      0      1      0
# 5       galatea      1      1      1
# 6      hombroni      2      1      0
# 7   hydrocharis      0      0      0
# 8      lindsayi      0      1      0
# 9      princeps      0      1      0

Note: Edited because original answer was incorrect.

Update: An easier way in base R

This problem is much more easily solved if you start by transposing your data.

dat3 <- t(dat[-1, -1])
dat3 <- as.data.frame(dat3)
names(dat3) <- dat[[1]][-1]
t(do.call(rbind, lapply(split(dat3, as.numeric(dat[1, -1])), colSums)))
#               1 2 3
# princeps      0 1 0
# bougainvillei 0 0 0
# hombroni      2 1 0
# lindsayi      0 1 0
# concretus     0 2 2
# galatea       1 1 1
# ellioti       0 1 0
# carolinae     1 1 0
# hydrocharis   0 0 0
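
Since the question mentions rowsum(), here is a hedged sketch of how it could be applied directly. rowsum() sums rows by group, so the idea is the same as above: transpose so the foods become rows, sum by group, and transpose back. The data frame `toy` mirrors the small Food table from the question, and the names `toy`, `grp`, `vals`, and `res` are only illustrative.

```r
# Toy data mirroring the question's table; the first row holds the group codes.
toy <- data.frame(
  Cohort = c("Group", "A", "B", "C", "D"),
  Food1  = c(1, 1, 0, 1, 0),
  Food2  = c(1, 1, 0, 1, 0),
  Food3  = c(2, 0, 1, 0, 0),
  Food4  = c(3, 1, 0, 1, 1),
  stringsAsFactors = FALSE
)

grp  <- unlist(toy[1, -1])       # grouping vector taken from the first row
vals <- as.matrix(toy[-1, -1])   # numeric part without the group row
rownames(vals) <- toy$Cohort[-1]

# rowsum() groups rows, so transpose, sum by group, and transpose back.
res <- t(rowsum(t(vals), group = grp))
colnames(res) <- paste0("Group", colnames(res))
res
#   Group1 Group2 Group3
# A      2      0      1
# B      0      1      0
# C      2      0      1
# D      0      0      1
```

This matches the ideal output in the question. The same pattern should carry over to `dat` by swapping in `dat` for `toy` and `species` for `Cohort`.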

Melting, aggregating, and re-casting is more straightforward than the approach I was trying to work up. — Brian Diggs, Oct 08 '12 at 19:12
Does this actually give the answer requested? As far as I can tell, these sums are not correct... — TARehman, Oct 08 '12 at 20:07
@TARehman, the error was already noted (see comment under the main question) and now corrected, if you care to correct the downvote ;) — A5C1D2H2I1M1N2O1R2T1, Oct 08 '12 at 20:13
@BrianDiggs, unfortunately, my original answer was wrong because of what happens when melting with duplicated column names. I've updated it now, though, and also provided a more direct base R solution. Congrats on your 10k, by the way! — A5C1D2H2I1M1N2O1R2T1, Oct 09 '12 at 04:56

score 1 · Answer 2 · answered Oct 08 '12 at 20:12

You can do this using base R fairly easily. Here's an example.

First, figure out which food columns belong in which group:

groupings <- as.data.frame(table(as.numeric(dat[1,2:9]),names(dat)[2:9]))

# Index the columns directly instead of attach()-ing the data frame
grp1 <- groupings[groupings$Freq==1 & groupings$Var1==1,2]
grp2 <- groupings[groupings$Freq==1 & groupings$Var1==2,2]
grp3 <- groupings[groupings$Freq==1 & groupings$Var1==3,2]

Then, use the groups to do a rowSums() on the correct columns.

dat <- cbind(dat,rowSums(dat[as.character(grp1)]))
dat <- cbind(dat,rowSums(dat[as.character(grp2)]))
dat <- cbind(dat,rowSums(dat[as.character(grp3)]))

Delete the initial row and the intermediate columns:

dat <- dat[-1,-c(2:9)]

Then, sort alphabetically by species, reset the row names, and rename the columns:

dat <- dat[order(dat$species),]
row.names(dat) <- NULL
names(dat) <- c("species","group_1","group_2","group_3")

And you ultimately get:

      species group_1 group_2 group_3
bougainvillei       0       0       0
    carolinae       1       1       0
    concretus       0       2       2
      ellioti       0       1       0
      galatea       1       1       1
     hombroni       2       1       0
  hydrocharis       0       0       0
     lindsayi       0       1       0
     princeps       0       1       0

EDITED: Changed sort order to alphabetical, like other answer.
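
If the number of groups is not known in advance, the same rowSums() idea can be written without hard-coding grp1 to grp3. This is only a sketch on a toy data frame mirroring the question's Food table; `toy`, `cols_by_group`, `sums`, and `out` are illustrative names.

```r
# Toy data mirroring the question's table; the first row holds the group codes.
toy <- data.frame(
  Cohort = c("Group", "A", "B", "C", "D"),
  Food1  = c(1, 1, 0, 1, 0),
  Food2  = c(1, 1, 0, 1, 0),
  Food3  = c(2, 0, 1, 0, 0),
  Food4  = c(3, 1, 0, 1, 1),
  stringsAsFactors = FALSE
)

grp  <- unlist(toy[1, -1])   # group code for each food column
vals <- toy[-1, -1]          # data without the group row and the label column

# One vector of column names per group code, e.g. "1" -> Food1, Food2.
cols_by_group <- split(names(vals), grp)

# rowSums() over each group's columns; sapply() binds the results column-wise.
sums <- sapply(cols_by_group, function(cols) rowSums(vals[cols]))
colnames(sums) <- paste0("group_", colnames(sums))

out <- data.frame(Cohort = toy$Cohort[-1], sums, row.names = NULL)
out
```

Here split() does the job of the table() lookup, and adding a fourth group code to the first row would need no change to the code.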

Nice thinking. I've also updated my answer with another base R solution. — A5C1D2H2I1M1N2O1R2T1, Oct 08 '12 at 20:36
