I often find myself having to do multiple paired comparisons on subsets of a data set (with the subsets defined by one or two factors). I would very much like to make this easier to do, in a comprehensive way, with a function.
This is what I have:
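# Pseudomedian from a one-sample Wilcoxon test; NaN if there are too few values.
wilcox.pseudomedian <- function(x, conf.int = TRUE, na.rm = TRUE) {
  if (length(x) > 3) {
    ht <- wilcox.test(x, conf.int = conf.int, na.rm = na.rm)
    return(ht$estimate[[1]])
  } else {
    return(NaN)
  }
}

# Paired Wilcoxon test of x against y, returned as a short report string.
pairedwtest <- function(x, y) {
  ht <- wilcox.test(x, y, paired = TRUE)
  out <- wilcox.reportAPA(ht)
  return(out)
}

# Format a test result as "<statistic name>=<value>,p=<p-value>".
wilcox.reportAPA <- function(ht) {
  out <- paste(names(ht$statistic)[[1]], "=", ht$statistic, ",p=", ht$p.value, sep = "")
  return(out)
}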
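To illustrate what the reporting helper produces, here is a small sketch on a paired sample. The vectors `x` and `y` are made up for this example and are not from my data, and the helper is repeated under the hypothetical name `report` so that the snippet runs on its own.

```r
# Made-up paired measurements, for illustration only.
x <- c(10, 12, 9, 14, 11, 13)
y <- c(8.5, 11.8, 9.6, 10.9, 10.3, 9.8)

# Same formatting as wilcox.reportAPA: statistic name, statistic, p-value.
report <- function(ht) {
  paste(names(ht$statistic)[[1]], "=", ht$statistic, ",p=", ht$p.value, sep = "")
}

# Paired Wilcoxon signed rank test; the statistic is named "V".
ht <- wilcox.test(x, y, paired = TRUE)
res <- report(ht)
print(res)
```

The result is a single string, which is what I would like to get back for every subset of the data.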
I would then like to be able to apply these functions across a data frame in the manner that I supply. This is what I've got so far...
library(reshape2)  # melt, dcast
library(plyr)      # ddply, summarize

wilcox.masstest <- function(data, factorlist, speakervar, groupvar, measurevar) {
  # Long format, then one column per level of groupvar
  melt(data, id.vars = c(factorlist, speakervar, groupvar), measure.vars = measurevar) -> mdf
  form <- as.formula(paste(paste(c(factorlist, speakervar), collapse = "+"), "~", groupvar))
  outdf <- dcast(mdf, form, fun.aggregate = wilcox.pseudomedian)
  outdfn <- names(outdf)
  mlvls <- setdiff(outdfn, factorlist)
  for (curr in 2:(length(mlvls))) {
    fac1 <- mlvls[curr - 1]
    fac2 <- mlvls[curr]
    facname <- paste(fac1, fac2, sep = "-")
    facnamerev <- paste(fac2, fac1, sep = "-")
    # This is the call that fails
    ddply(outdf, factorlist, summarize, results = pairedwtest(get(fac1), get(fac2))) -> out
  }
  return(out)
}
... but the problematic bit is the ddply call at the end. The outdf data frame will look something like this (the last three columns are the ones I would like to iteratively test for differences, within each subset of the data given by the factor levels of all columns before the Patient column, in this case):
    Task        Patient Control    Med OFF    Med ON
115 Spontaneous P45zi   0.12044504 0.06940783 0.12044504
116 Spontaneous P46zi   0.20694651 0.13495089 0.02022240
117 Spontaneous P47zi   0.13556909 0.10433863 0.10433863
118 Spontaneous P48zi   0.07519881 0.02795007 0.12044504
119 Spontaneous P49zi   0.02022240 0.01220851 0.12044504
Now, the call to ddply fails with the error
"Error in get(fac1) : object 'fac1' not found"
How do I supply the name of the factor to ddply in such a way that the variable can be found when the call is made? I am sure I could do it by pasting the call together as text and then evaluating it, but that seems like a very bad idea...
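One direction that might avoid evaluating pasted text is to look the columns up by name with `[[` inside a function that receives each subset, so that `fac1` and `fac2` are ordinary variables captured by that function. My guess is that the error comes from the expression being evaluated in a scope where `fac1` is not visible. Below is a minimal base R sketch of the idea using `split()` instead of ddply. The toy `outdf`, its values, and the `Reading` task level are made up for illustration, and the test and formatting are inlined so the snippet runs on its own.

```r
# Toy stand-in for outdf (made-up values; "Reading" is a hypothetical level).
outdf <- data.frame(
  Task      = rep(c("Spontaneous", "Reading"), each = 6),
  Patient   = paste0("P", 1:12),
  Control   = c(0.12, 0.21, 0.14, 0.08, 0.02, 0.17, 0.11, 0.19, 0.15, 0.09, 0.05, 0.13),
  `Med OFF` = c(0.07, 0.13, 0.10, 0.05, 0.01, 0.11, 0.13, 0.12, 0.14, 0.04, 0.08, 0.07),
  check.names = FALSE,       # keep the space in "Med OFF"
  stringsAsFactors = FALSE
)

factorlist <- "Task"
fac1 <- "Control"
fac2 <- "Med OFF"

# One data frame per combination of the grouping factors.
groups <- split(outdf, outdf[factorlist], drop = TRUE)

# d[[fac1]] looks the column up by its name, so no get() is needed.
results <- sapply(groups, function(d) {
  ht <- wilcox.test(d[[fac1]], d[[fac2]], paired = TRUE)
  paste(names(ht$statistic)[[1]], "=", ht$statistic, ",p=", ht$p.value, sep = "")
})
print(results)
```

With plyr, the same idea would presumably be to pass an anonymous function to ddply in place of summarize, for example `ddply(outdf, factorlist, function(d) data.frame(results = pairedwtest(d[[fac1]], d[[fac2]])))`, although I have not verified that on the real data.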
Any ideas?