
I need to know how many levels of one factor (site) occur within each unique combination of levels of the other factors in the data frame. In other words, for this data: how many sites have crop 1 vs. how many sites have crop 2, then how many have crop 1 on soil a, and so on ad infinitum. I want counts for each level of interaction (no interaction, i.e. crops only; crops*soil; ...). This is pretty easy with `aggregate` if I only want to answer one question at a time, but with nine factors it gets tedious to find the counts and then merge them back into the data frame, as I've done below.

```r
df <- data.frame(crop = c(1,1,2,2,2,2,2,2,2,1),
                 site = c(LETTERS[1:7], "A", "A", "A"),
                 soil = c('a','a','b','b','b','c','c','c','c','c'))
```

Add the number of sites with the same crop, the same soil, and the same crop x soil combination:

```r
#... 1st for crop: unique site/crop pairs, so each site counts once per crop
f <- unique(df[c("site", "crop")])
f <- aggregate(numeric(nrow(f)), f[c("crop")], length)
names(f)[names(f) == "x"] <- "sites_w_same_cr"
df <- merge(df, f, by = "crop")
#... 2nd for soil
f <- unique(df[c("site", "soil")])
f <- aggregate(numeric(nrow(f)), f[c("soil")], length)
names(f)[names(f) == "x"] <- "sites_w_same_sl"
df <- merge(df, f, by = "soil")
#... 3rd for crop x soil
f <- unique(df[c("site", "crop", "soil")])
f <- aggregate(numeric(nrow(f)), f[c("crop", "soil")], length)
names(f)[names(f) == "x"] <- "sites_w_same_cr.sl"
df <- merge(df, f, by = c("crop", "soil"))
```
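The three steps above repeat one pattern: take unique site/group pairs, count them with `aggregate`, and merge the counts back. A minimal sketch that parameterises that pattern by the grouping columns, so each new factor combination is one more list entry rather than another copy of the code. The helper `merge_site_count` and the `sites_w_same_<cols>` column naming are hypothetical choices for illustration, not part of the original code:

```r
df <- data.frame(crop = c(1,1,2,2,2,2,2,2,2,1),
                 site = c(LETTERS[1:7], "A", "A", "A"),
                 soil = c('a','a','b','b','b','c','c','c','c','c'))

# Hypothetical helper: the unique -> aggregate -> merge steps from above,
# parameterised by the grouping columns in `cols`.
merge_site_count <- function(data, cols, site_col = "site") {
  # one row per distinct site within each group, so repeats are not counted
  pairs <- unique(data[c(site_col, cols)])
  # count the distinct sites in each group
  counts <- aggregate(pairs[site_col], pairs[cols], FUN = length)
  names(counts)[names(counts) == site_col] <-
    paste0("sites_w_same_", paste(cols, collapse = "."))
  merge(data, counts, by = cols)
}

# Each entry is one single factor or one interaction of factors
groupings <- list("crop", "soil", c("crop", "soil"))
df <- Reduce(merge_site_count, groupings, df)
print(df)
```

As in the original code, `merge` sorts the result by the key columns, so the row order of `df` changes after each call.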

How do I keep doing this for more factors? In other words, how do I put the answer to "How many sites grow crop 1, on soil a, are irrigated, receive fertilizer, ...?" in new columns of the same data frame? The column names I gave for the merged columns are just for clarity; they could simply be combinations of the factor names, like "crop", "crop.soil", etc., generated from the factors themselves. This answer shows how to get the levels of all factors at once, but not how to count sites for each one or for each interaction. Thanks!
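One way to generalise to many factors, sketched under the assumption that every column except `site` is a grouping factor: loop over all 1-way, 2-way, ..., k-way combinations with `combn`, and count distinct sites per group with `ave`, which returns one value per row and so needs no merge. The helper `count_distinct_sites` and the `n_` prefix are hypothetical; the prefix avoids overwriting the original `crop` and `soil` columns, which plain names like "crop" would do:

```r
df <- data.frame(crop = c(1,1,2,2,2,2,2,2,2,1),
                 site = c(LETTERS[1:7], "A", "A", "A"),
                 soil = c('a','a','b','b','b','c','c','c','c','c'))

# Hypothetical helper: number of distinct sites in the group each row belongs
# to, where groups are defined by the columns in `cols`. Returns one integer
# per row, so the row order of `data` is kept and no merge is needed.
count_distinct_sites <- function(data, cols, site_col = "site") {
  key <- interaction(data[cols], drop = TRUE)
  site <- data[[site_col]]
  ave(seq_along(site), key, FUN = function(i) length(unique(site[i])))
}

# Assumption: every column except the site ID is a grouping factor
factor_cols <- setdiff(names(df), "site")

# All 1-way, 2-way, ..., k-way combinations of the factors
for (k in seq_along(factor_cols)) {
  for (cols in combn(factor_cols, k, simplify = FALSE)) {
    new_col <- paste0("n_", paste(cols, collapse = "."))  # e.g. "n_crop.soil"
    df[[new_col]] <- count_distinct_sites(df, cols)
  }
}
print(df)
```

With nine factors this loop creates 2^9 - 1 = 511 new columns, so limiting `k` (for example `for (k in 1:2)`) may be more practical.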

CrunchyTopping
  • probably can just do `cn <- c('crop', 'soil'); x <- do.call(paste, df[cn]); ave(seq_along(x), x, FUN = length)` where you change `cn` for each single or multiple column names – rawr Oct 17 '17 at 16:48
  • This works great if there is only one observation per "site." How do I use this if some "sites" occur more than once? (see edits in the sample data). Thanks – CrunchyTopping Oct 17 '17 at 18:02
  • Using your method, I get the same answer with either one. What is the correct answer when there are multiple observations per site? – rawr Oct 17 '17 at 18:31
  • OK, edited again. Now no treatment has 4 sites with my method (which is what I need), but one does with yours. I swear I'm not just trying to poke holes in your comment :), I just messed up the data frame names (some `df`s should have been `f`s) and have now fixed them. – CrunchyTopping Oct 17 '17 at 20:58
  • `as.integer(ave(as.character(df$site), x, FUN = function(x) length(unique(x))))` instead? – rawr Oct 17 '17 at 22:13
  • That works. I now have `combo_cs <- do.call(paste, df[c("crop","soil")]); df$s.cr.sl <- as.integer(ave(as.character(df$site), combo_cs, FUN = function(combo_cs) length(unique(combo_cs))))` for each combination of factors. – CrunchyTopping Oct 18 '17 at 13:27
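The difference discussed in the comments can be checked directly on the sample data: counting rows per group (`FUN = length`) and counting distinct sites per group (`length(unique(...))`) only disagree where a site appears more than once within a group. A small sketch comparing the two suggestions side by side:

```r
df <- data.frame(crop = c(1,1,2,2,2,2,2,2,2,1),
                 site = c(LETTERS[1:7], "A", "A", "A"),
                 soil = c('a','a','b','b','b','c','c','c','c','c'))

x <- do.call(paste, df[c("crop", "soil")])   # grouping key, e.g. "2 c"

# Rows in each crop x soil group (first suggestion in the comments)
rows_per_group <- ave(seq_along(x), x, FUN = length)

# Distinct sites in each crop x soil group (revised suggestion);
# ave() on a character vector returns character, hence as.integer()
sites_per_group <- as.integer(ave(as.character(df$site), x,
                                  FUN = function(s) length(unique(s))))

print(data.frame(df, rows_per_group, sites_per_group))
```

For crop 2 on soil c the group has four rows but only three distinct sites (F, G and A), which is the case raised in the comments.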

0 Answers