I need to know how many levels of one factor (here, site) occur with each unique combination of levels of the other factors in the data frame. In other words, for the data below, how many sites have crop 1 vs. how many have crop 2, then how many have crop 1 on soil a, and so on ad infinitum. I want counts at every level of interaction (no interaction, e.g. crop only; crop*soil; and so on). This is pretty easy with aggregate() if I only want to answer one question at a time. But with nine factors, finding the counts and then merging them back into the data frame, as I've done below, gets pretty tedious.
df <- data.frame(crop = c(1,1,2,2,2,2,2,2,2,1),
site=c(LETTERS[1:7],"A","A","A"),
soil=c('a','a','b','b','b','c','c','c','c','c'))
# Add the number of sites with the same crop, the same soil, and the same crop x soil
#... 1st for crop
f<-(unique(df[c("site","crop")]))
f<-(aggregate(numeric(nrow(f)), f[c("crop")], length))
names(f)[names(f)=="x"]<-"sites_w_same_cr"
df<-merge(df,f,by="crop")
#..2nd for soil
f<-(unique(df[c("site","soil")]))
f<-(aggregate(numeric(nrow(f)), f[c("soil")], length))
names(f)[names(f)=="x"]<-"sites_w_same_sl"
df<-merge(df,f,by="soil")
#..3rd for soil*crop
f<-(unique(df[c("site","crop","soil")]))
f<-(aggregate(numeric(nrow(f)), f[c("crop","soil")], length))
names(f)[names(f)=="x"]<-"sites_w_same_cr.sl"
df<-merge(df,f,by=c("crop","soil"))
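The three blocks above differ only in which factors they name, so one way to cut the repetition is to wrap the same unique/aggregate/merge steps in a function that takes the factor names as an argument. This is a sketch, not a tested package function; the helper name add_site_count() and the "sites." column prefix are hypothetical choices. The prefix matters: naming the new column plain "crop" would clash with the existing crop column.

```r
# Hypothetical helper: the unique/aggregate/merge pattern from the question,
# parameterised by the factor names in `vars`.
add_site_count <- function(data, vars, site = "site") {
  # One row per distinct site within each combination of `vars`
  f <- unique(data[c(site, vars)])
  # Count those rows per combination; aggregate() names the count column "x"
  f <- aggregate(numeric(nrow(f)), f[vars], length)
  # Name the count after the factors, e.g. "sites.crop.soil"
  names(f)[names(f) == "x"] <- paste(c("sites", vars), collapse = ".")
  merge(data, f, by = vars)
}

df <- data.frame(crop = c(1,1,2,2,2,2,2,2,2,1),
                 site = c(LETTERS[1:7], "A", "A", "A"),
                 soil = c('a','a','b','b','b','c','c','c','c','c'))

df <- add_site_count(df, "crop")
df <- add_site_count(df, "soil")
df <- add_site_count(df, c("crop", "soil"))
```

As in the original code, each merge() call reorders the rows by its key columns.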
How do I extend this to more factors, i.e. put the answer to "How many sites grow crop 1 on soil a, are irrigated, receive fertilizer, ...?" in new columns of the same data frame? The column names I gave the merged columns are only for clarity; they could simply be combinations of the factor names, such as "crop", "crop.soil", etc., generated from the factors themselves. This answer shows how to get the levels of all factors at once, but not how to count the sites for each factor or each interaction. Thanks!
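One possible sketch for the general case, assuming every count is "number of distinct sites" and that you want all combinations of the listed factors up to some order. It swaps aggregate() + merge() for base R's ave(), which returns one value per row in the original order, and uses combn() to enumerate the factor combinations and build the column names from them. The function name add_all_site_counts(), its max_order argument, and the "sites." prefix are hypothetical. It also assumes no factor level contains ".", since interaction() joins levels with that separator.

```r
# Hypothetical function: for every combination of `factors` (up to max_order),
# add a column giving the number of distinct sites sharing that row's levels.
add_all_site_counts <- function(data, factors, site = "site",
                                max_order = length(factors)) {
  # Integer site codes, so ave() returns integers rather than characters
  site_id <- as.integer(factor(data[[site]]))
  for (k in seq_len(max_order)) {
    # All k-way combinations of the factor names, e.g. c("crop", "soil")
    for (vars in combn(factors, k, simplify = FALSE)) {
      # One level per observed combination of levels in `vars`
      key <- interaction(data[vars], drop = TRUE)
      col <- paste(c("sites", vars), collapse = ".")
      # Every row gets the count of distinct sites in its group
      data[[col]] <- ave(site_id, key, FUN = function(s) length(unique(s)))
    }
  }
  data
}

df <- data.frame(crop = c(1,1,2,2,2,2,2,2,2,1),
                 site = c(LETTERS[1:7], "A", "A", "A"),
                 soil = c('a','a','b','b','b','c','c','c','c','c'))

res <- add_all_site_counts(df, c("crop", "soil"))
res
```

With nine factors, all combinations means 2^9 - 1 = 511 new columns; setting max_order = 2, for example, limits this to the 9 single factors plus the 36 two-way interactions.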