Calculate mean value of subsets and store them in a vector for further analysis

Question

Hello, I've been working on a dataset for a while now, but I'm stuck. One question/answer here was already helpful, but I need to calculate the mean not for a single column but for sixty.

My dataset is basically this:

> data[c(1:5, 111:116), c(1:6, 85:87)]
    plotcode block plot subsample year month Alo.pra Ant.odo Arr.ela
91     B1A01    B1  A01         1 2003   May       0       9       0
92     B1A02    B1  A02         1 2003   May      38       0       0
93     B1A03    B1  A03         1 2003   May       0       0       0
94     B1A04    B1  A04         1 2003   May       0       0       0
95     B1A05    B1  A05         1 2003   May       0       0       0
214    B2A16    B2  A16         2 2003   May       0       0       0
215    B2A17    B2  A17         2 2003   May       0       0       0
216    B2A18    B2  A18         2 2003   May     486       0       0
217    B2A19    B2  A19         2 2003   May       0       0       0
218    B2A20    B2  A20         2 2003   May       0       0       0
219    B2A21    B2  A21         2 2003   May       0       0       0

The first few columns are general data about each data point. Each plot has had up to 4 subsamples. Columns 85:144 contain the data I want to calculate the means of. For a single column I used this command:

tapply(data2003[, 85], as.factor(data2003$plotcode), mean, na.rm = TRUE)
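For reference, this is what that single-column tapply() call produces, sketched on a small made-up data frame with the same layout (the plotcodes and values are hypothetical): one mean per plotcode, named by plotcode, with NAs dropped before averaging.

```r
# Hypothetical toy data in the same layout as the real dataset:
# one row per subsample, a plotcode column, one numeric column per species.
d <- data.frame(
  plotcode = c("B1A01", "B1A01", "B1A02", "B1A02", "B1A03"),
  Alo.pra  = c(0, 10, 38, NA, 0),
  Ant.odo  = c(9, 3, 0, 0, 0)
)

# Mean of a single species column per plot, ignoring NAs
alo_means <- tapply(d$Alo.pra, as.factor(d$plotcode), mean, na.rm = TRUE)
print(alo_means)
```

The result is a one-dimensional array whose names are the plotcodes, so it behaves much like a named vector.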

But as I said, I need to calculate the mean sixty times, for columns 85:144. My idea was to use a for loop.

for (i in 85:144)
{
    temp <- tapply(data2003[,i], data2003$plotcode, mean, na.rm=T)
    mean.mass.2003 <- rbind(mean.mass.2003, temp)
}

But that doesn't work. I get multiple warning messages: "number of columns of result is not a multiple of vector length (arg 2)".

What I basically want is a table in which the columns are the species, the rows are the plotcodes, and each entry is the respective mean.
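That message is a warning from rbind(), and it most likely means mean.mass.2003 was initialised to something whose length differs from the number of plots (the initialisation isn't shown in the question, so this is an assumption); rbind() then recycles the shorter argument and warns. A minimal sketch of the same loop that starts from NULL and transposes at the end, so plots end up as rows and species as columns, on hypothetical toy data (species_cols is an illustrative name, standing in for 85:144):

```r
# Hypothetical toy data standing in for data2003: species sit in columns 2:3
# here instead of 85:144.
data2003 <- data.frame(
  plotcode = c("B1A01", "B1A01", "B1A02", "B1A03"),
  Alo.pra  = c(0, 10, 38, 0),
  Ant.odo  = c(9, NA, 0, 0)
)
species_cols <- 2:3  # hypothetical name; 85:144 in the real data

mean.mass.2003 <- NULL  # start empty, so the first rbind() sets the width
for (i in species_cols) {
  # One mean per plotcode for species column i, NAs ignored
  temp <- tapply(data2003[, i], data2003$plotcode, mean, na.rm = TRUE)
  mean.mass.2003 <- rbind(mean.mass.2003, temp)  # adds one row per species
}
rownames(mean.mass.2003) <- colnames(data2003)[species_cols]

# Transpose so that plots are rows and species are columns
mean.mass.2003 <- t(mean.mass.2003)
print(mean.mass.2003)
```

Growing an object with rbind() inside a loop is fine for sixty columns; for much larger jobs, pre-allocating the result (as the answer below does) avoids repeated copying.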

Try `dplyr` `data %>% group_by(plotcode) %>% summarise_each(funs(mean=mean(., na.rm=TRUE)), starts_with('A'))` or you can specify `summarise_each(funs(mean=mean(., na.rm=TRUE)), 85:144)` — akrun, Feb 04 '15 at 12:03
You can also do it with base R as in `aggregate(. ~ plotcode, data2003[, c(1,85:144)], mean)` or with the `data.table` package as in `setDT(data2003[, c(1,85:144)])[, lapply(.SD, mean), plotcode]` — David Arenburg, Feb 04 '15 at 12:05

score 0 · Answer 1 · answered Feb 04 '15 at 13:00

I figured and fiddled and, with some help, got something that works the way I wanted. I know it's a somewhat convoluted approach, but I've only just started with R, so I'd like to understand what I code:

data.plots <- matrix(NA, nrow = nlevels(as.factor(data2003$plotcode)), ncol = 60) ## A new, empty matrix (one row per plot, one column per species) we'll fill with the loop

for (i in 85:144) # The numbers because that's where our relevant data is
{
    temp <- tapply(data2003[, i], data2003$plotcode, mean, na.rm = TRUE) # What tapply does in this instance: it calculates the mean value of the i-th column from data2003 for every group of rows with the same plotcode, ignoring NAs. temp is a one-dimensional array (essentially a named vector) holding one mean per plotcode.
    data.plots[, i - 84] <- as.numeric(temp) # writes the means we just calculated into column i-84 of data.plots.
}

colnames(data.plots) <- colnames(data2003)[85:144]
rownames(data.plots) <- as.data.frame(table(data2003$plotcode))[, 1] # the second part is basically a count() function, returning in the first column the unique entries found and in the second the frequency of that entry.

This works. For each species column, tapply() calculates the mean biomass for every unique entry in data2003$plotcode and stores the results temporarily in temp (a one-dimensional array, essentially a named vector); the loop then writes them consecutively into the columns of the target matrix data.plots.

After naming the rows and columns of data.plots I can work with it without always having to remember each name.
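The same plots-by-species matrix can also be built without pre-sizing a matrix or offsetting the index, by letting sapply() run tapply() over each species column in turn. This is a swapped-in base R alternative, not the approach from the answer above; the toy data is hypothetical and 2:3 stands in for 85:144.

```r
# Hypothetical toy data standing in for data2003
data2003 <- data.frame(
  plotcode = c("B1A01", "B1A01", "B1A02", "B1A03"),
  Alo.pra  = c(0, 10, 38, 0),
  Ant.odo  = c(9, NA, 0, 0)
)

# sapply() calls the function once per species column and binds the
# resulting per-plot mean vectors together as the columns of a matrix.
data.plots <- sapply(data2003[2:3], function(species) {
  tapply(species, data2003$plotcode, mean, na.rm = TRUE)
})
print(data.plots)
```

Because tapply() names its result by plotcode and sapply() names each column after the species column it came from, the separate colnames()/rownames() step is not needed here.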

@DavidArenburg Mostly I don't understand them. I'm for all intents and purposes a total newbie in R. I know smooth code as an abstract concept, but can't _follow_ it. For instance, running your proposed aggregate() code gave me an error message, `Error in aggregate.data.frame(lhs, mf[-1L], FUN = FUN, ...) : No line/row for aggregation`. After half an hour of fiddling, `aggregate(data2003[85:144], list(Plotcode = data2003$plotcode), mean)` worked. The results of aggregate and the for loop are equivalent, though not identical; yours is tidier. — Temerity, Feb 05 '15 at 11:18
As for the others… after loading the `data.table` package, setDT() was not available as a function, and `dplyr` isn't even on the list of available packages, oddly enough. @akrun's code I can't hope to understand yet. — Temerity, Feb 05 '15 at 11:23
You need to reinstall these packages because you have old versions. You may also have to install a newer version of R, as it seems you have a very old one. — David Arenburg, Feb 05 '15 at 11:30
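A likely explanation for the aggregate() error reported above, offered as an assumption since the full data isn't shown: the formula interface defaults to na.action = na.omit, so it drops every row that has an NA in any of the selected columns. With sixty species columns there may be no complete rows left, and aggregate() stops. The data-frame interface that worked does not drop rows up front. A sketch on hypothetical toy data in which every row has at least one NA:

```r
# Hypothetical toy data: every row contains at least one NA
d <- data.frame(
  plotcode = c("B1A01", "B1A01", "B1A02", "B1A02"),
  Alo.pra  = c(NA, 10, 38, NA),
  Ant.odo  = c(9, NA, NA, 2)
)

# Formula interface with the default na.action = na.omit:
# all rows are removed, so aggregate() raises an error.
formula_result <- tryCatch(
  aggregate(. ~ plotcode, data = d, FUN = mean),
  error = function(e) conditionMessage(e)
)
print(formula_result)

# Keep incomplete rows and let mean() skip the NAs instead
kept <- aggregate(. ~ plotcode, data = d, FUN = mean,
                  na.rm = TRUE, na.action = na.pass)
print(kept)

# Data-frame interface, as in the version that worked
df_result <- aggregate(d[2:3], list(Plotcode = d$plotcode), mean, na.rm = TRUE)
print(df_result)
```

Passing na.rm = TRUE matters in both working variants: without it, any plot with an NA in a species column gets an NA mean for that species.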
