
I often need to run multiple paired comparisons on subsets of a data set (with the subsets defined by one or two factors). I would very much like to make this easier and more systematic by wrapping it in a function.

This is what I have:

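wilcox.pseudomedian <- function(x,conf.int=TRUE,na.rm=TRUE){
  # wilcox.test() has no na.rm argument, so drop NAs here, before counting values
  if(na.rm) x <- x[!is.na(x)]
  if(length(x) > 3){
    # The (pseudo)median estimate is only returned when conf.int = TRUE
    ht <- wilcox.test(x,conf.int=conf.int)
    return(ht$estimate[[1]])
  }else{
    return(NaN)
  }
}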

pairedwtest <- function(x,y){
  ht <- wilcox.test(x,y,paired=TRUE)
  out <- wilcox.reportAPA(ht)
  return(out)
}

wilcox.reportAPA <- function(ht){

  out <- paste(names(ht$statistic)[[1]],"=",ht$statistic,",p=",ht$p.value,sep="")
  return(out)
}
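Before wiring these helpers into a larger function, it can help to check them on their own. The sketch below copies the helpers (dropping NAs explicitly in the pseudomedian, since wilcox.test() has no na.rm argument) and calls them on made-up paired data; the variable names cond_a and cond_b are hypothetical. For a paired test, wilcox.test() names its statistic "V", so the report string starts with "V=".

```r
# Copies of the helpers, so this block runs on its own.
wilcox.pseudomedian <- function(x, conf.int = TRUE, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  if (length(x) > 3) {
    # The (pseudo)median estimate is only part of the result when conf.int = TRUE
    wilcox.test(x, conf.int = conf.int)$estimate[[1]]
  } else {
    NaN
  }
}

wilcox.reportAPA <- function(ht) {
  paste(names(ht$statistic)[[1]], "=", ht$statistic, ",p=", ht$p.value, sep = "")
}

pairedwtest <- function(x, y) {
  wilcox.reportAPA(wilcox.test(x, y, paired = TRUE))
}

# Made-up paired measurements for 10 speakers under two conditions
set.seed(42)
cond_a <- runif(10)
cond_b <- cond_a + rnorm(10, mean = 0.1, sd = 0.05)

pm <- wilcox.pseudomedian(cond_a)                  # a single number
too_short <- wilcox.pseudomedian(c(1, 2, NA, 3))   # NaN: only 3 values left after dropping NA
report <- pairedwtest(cond_a, cond_b)              # one string of the form "V=...,p=..."
```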

I would then like to apply these functions across a data frame, split by the factors I supply. This is what I've got so far...

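library(reshape2)  # melt(), dcast()
library(plyr)      # ddply(), summarize()

wilcox.masstest <- function(data,factorlist,speakervar,groupvar,measurevar){
  # Long format: one row per measurement
  mdf <- melt(data,id.vars=c(factorlist,speakervar,groupvar),measure.vars=measurevar)
  # e.g. Task + Patient ~ Group
  form <- as.formula(paste(paste(c(factorlist,speakervar), collapse= "+"),"~",groupvar))

  # Wide format: one pseudomedian per speaker and group level
  outdf <- dcast(mdf, form,fun.aggregate=wilcox.pseudomedian)
  outdfn <- names(outdf)
  # Group-level columns only: exclude the factor columns and the speaker column
  mlvls <- setdiff(outdfn,c(factorlist,speakervar))

  out <- list()
  for(curr in 2:(length(mlvls))){
      fac1 <- mlvls[curr -1 ]
      fac2 <- mlvls[curr]
      facname <- paste(fac1,fac2,sep="-")
      facnamerev <- paste(fac2,fac1,sep="-")

      # This is the call that fails
      out[[facname]] <- ddply(outdf,factorlist,summarize,results=pairedwtest(get(fac1),get(fac2)))
  }
  return(out)
}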

...but the problematic bit is the ddply call at the end. The outdf data frame will look something like the one below. The last three columns are the ones I would like to test against each other, pairwise, within each subset of the data defined by the levels of all the columns before the Patient column (here only Task).

           Task  Patient    Control    Med OFF     Med ON
115 Spontaneous    P45zi 0.12044504 0.06940783 0.12044504
116 Spontaneous    P46zi 0.20694651 0.13495089 0.02022240
117 Spontaneous    P47zi 0.13556909 0.10433863 0.10433863
118 Spontaneous    P48zi 0.07519881 0.02795007 0.12044504
119 Spontaneous    P49zi 0.02022240 0.01220851 0.12044504

Now, the call to ddply fails with the error

"Error in get(fac1) : object 'fac1' not found"

How do I pass the column names to ddply so that the variables can be found when the call is evaluated? I am sure I could do it by pasting the call together as text and then evaluating it, but that seems like a very bad idea...
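The error can likely be reproduced without plyr. Assuming summarize evaluates its arguments with the data frame piece as the environment and an enclosure inside plyr rather than the frame of wilcox.masstest, the local variable fac1 is never seen. At the top level, fac1 would be found in the global environment instead, which is why such code often works interactively but not inside a function. The sketch imitates that behaviour with eval(); lookup_demo and the toy data are hypothetical. A common way around it, shown as a comment, is to give ddply a function of the data piece and index columns by name.

```r
# Toy subset standing in for one piece of outdf (values made up)
piece <- data.frame(Control = c(0.12, 0.21, 0.14, 0.08, 0.02),
                    MedOFF  = c(0.07, 0.13, 0.10, 0.03, 0.01))

# Hypothetical helper imitating summarize(): the expression is evaluated with
# the data frame as environment and an enclosure that is NOT this function's
# frame (globalenv() here, standing in for plyr's internals).
lookup_demo <- function(piece, fac1) {
  via_get <- tryCatch(
    eval(quote(get(fac1)), envir = piece, enclos = globalenv()),
    error = function(e) conditionMessage(e)
  )
  # Indexing by the column name stored in fac1 needs no variable lookup.
  via_index <- piece[[fac1]]
  list(via_get = via_get, via_index = via_index)
}

res <- lookup_demo(piece, "Control")
res$via_get    # error message mentioning 'fac1' (unless a global fac1 exists)
res$via_index  # the Control column

# The same idea with plyr (not run here): pass ddply a function of the piece.
# Defined inside wilcox.masstest, it can see fac1 and fac2 as local variables.
# ddply(outdf, factorlist,
#       function(d) data.frame(results = pairedwtest(d[[fac1]], d[[fac2]])))
```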

Any ideas?

Fredrik Karlsson

1 Answer


It would be easier to help you if you also provided an example dataset. However, the problem may be as simple as this: get() expects the name of an object as a character string, not the object's value.

> mlvls <- runif(20, 1,10)
> mlvls[2] 
[1] 6.617676
> mlvls[3] 
[1] 6.788338
> fac1 <- mlvls[2] 
> fac2 <- mlvls[3]
> get(fac1)          # will not work
Error in get(fac1) : invalid first argument
> get("fac2")        # will work
[1] 6.788338
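The transcript's point in a self-contained form: get() looks an object up by its name, so its first argument must be a character string. Passing a stored value (here a number) fails, while passing a string that names an object in scope works. The object names below are hypothetical. Note that in the question, fac1 and fac2 come from names(outdf) and so are already strings.

```r
value_holder <- 6.62            # an ordinary numeric object
name_holder  <- "value_holder"  # a string naming that object

by_name <- get(name_holder)     # looks up the object called "value_holder"
by_value_fails <- tryCatch({ get(value_holder); FALSE },
                           error = function(e) TRUE)

by_name          # 6.62
by_value_fails   # TRUE: get() rejects a non-character first argument
```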
albifrons