
In the twoway package, I have a twoway.default() method that takes a matrix or data frame and applies Tukey's methods for the analysis of twoway tables.

Example:

> data(taskRT)
> taskRT
       topic1 topic2 topic3 topic4
Easy     2.43   3.12   3.68   4.04
Medium   3.41   3.91   4.07   5.10
Hard     4.21   4.65   5.87   5.69
> twoway(taskRT)

Mean decomposition (Dataset: "taskRT")
Residuals bordered by row effects, column effects, and overall

         topic1    topic2    topic3    topic4      roweff   
       + --------- --------- --------- --------- + ---------
Easy   | -0.055833  0.090833  0.004167 -0.039167 : -0.864167
Medium |  0.119167  0.075833 -0.410833  0.215833 : -0.059167
Hard   | -0.063333 -0.166667  0.406667 -0.176667 :  0.923333
       + ......... ......... ......... ......... + .........
coleff | -0.831667 -0.288333  0.358333  0.761667 :  4.181667

I want to extend this with a formula method that takes a data frame and a formula of the form response ~ row + column, reshapes this from long to wide and then calls the default method. I know several ways to do this directly in the console, but I can't seem to get any of them to work in a formula method function.

Thus, for this data in long format, with the cell value called RT and the row and column variables as task and topic, I'd like to get the same results with a call of

twoway(RT ~ task + topic, data=long)

At top-level, in the console I can do this in various ways, starting from a long version of the same data.

library(reshape2)
long <- melt(as.matrix(taskRT))
colnames(long) <- c("task", "topic", "RT")

Convert back to wide format, and call twoway() on that:

# convert long to wide: dcast
(wide <- dcast(long, task ~ topic, value.var="RT"))
twoway(wide[,-1])

# tidyr::spread
library(tidyr)
(wide <- spread(long, key=topic, value=RT))
twoway(wide[,-1])

# base, unstack
wide <- unstack(long, form = RT ~ topic)
rownames(wide) <- unique(long$task)
twoway(wide)

Below is an initial sketch of a twoway.formula method. The problem I'm having is that I can't figure out how to use the results of parsing the formula object and the associated data frame in the function to construct a call in the function that would result in a wide matrix or data frame suitable for passing to the default method. So far, I've been trying various forms of dcast within the function, shown as comments, none of which give me joy.

#' Initial sketch for a twoway formula method
#'
#' Doesn't do anything useful yet, but the idea is to be able to use a
#' formula for a twoway table in long form, e.g.,
#' twoway(response ~ row + col, data=mydata)
#'
#' @param formula A formula of the form \code{response ~ rowvar + colvar}
#' @param data The name of the data set
#' @param subset An expression to subset the data (unused)
#' @param na.action What to do with NAs? (unused)
#' @param ... other arguments, passed down
#' @importFrom stats terms
#'
twoway.formula <- function(formula, data, subset, na.action, ...) {

  if (missing(formula) || !inherits(formula, "formula"))
    stop("'formula' missing or incorrect")
  if (length(formula) != 3L)
    stop("'formula' must have both left and right hand sides")
  tt <- if (is.data.frame(data))
    terms(formula, data = data)
  else terms(formula)
  if (any(attr(tt, "order") > 1))
    stop("interactions are not allowed")

  rvar <- attr(terms(formula[-2L]), "term.labels")
  lvar <- attr(terms(formula[-3L]), "term.labels")
  rhs.has.dot <- any(rvar == ".")
  lhs.has.dot <- any(lvar == ".")
  if (lhs.has.dot || rhs.has.dot)
    stop("'formula' has '.' in left or right hand sides")
  m <- match.call(expand.dots = FALSE)
  edata <- eval(m$data, parent.frame())
  lhs <- formula[[2]]
  rhs <- formula[[3]]

  #  wide <- dcast(data=edata, formula=as.formula(rhs), value.var=lhs )
  #  wide <- dcast(data=edata, value.var=lhs)
  #  wide <- dcast(data=edata, rvar[1] ~ rvar[2], value.var=cvar)
  #  wide <- dcast(data=edata, list(.(rvar[1], .(rvar[2], .(cvar)))))
#browser()
  stop("The formula method is not yet implemented.")

  # call the default method on the wide data set
  twoway(wide)
}
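
One likely reason the commented-out attempts fail is that a formula written as `rvar[1] ~ rvar[2]` is never evaluated, so it holds the expressions `rvar[1]` and `rvar[2]` rather than the names stored in them. A minimal sketch of building the cast formula from strings with `stats::reformulate()` follows; the values assigned to `rvar` and `lvar` are assumptions matching what the method would extract for `RT ~ task + topic`.

```r
# Variable names as they would be extracted inside the method
# (assumed values for the formula RT ~ task + topic)
rvar <- c("task", "topic")   # right-hand side terms
lvar <- "RT"                 # left-hand side term

# A literal formula is not evaluated: it keeps the expressions as written
literal <- rvar[1] ~ rvar[2]
all.vars(literal)            # the only variable it mentions is "rvar"

# reformulate() builds the formula from character strings instead
cast_formula <- reformulate(rvar[2], response = rvar[1])
cast_formula                 # task ~ topic
```

A formula built this way, together with `value.var = lvar` as a character string, has the same shape as the arguments in the console call `dcast(long, task ~ topic, value.var="RT")` shown earlier.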

Can anyone help?

user101089
  • It's a little confusing what you are looking for. Are you trying to write a function that will convert any dataset from long to wide format? – nak5120 May 22 '18 at 13:49
  • No, you don't understand the wider context of my question, which was clearly stated: how to do this within the context of an S3 formula method. – user101089 May 22 '18 at 18:03

1 Answer


Using tidyverse...

library(tibble)
library(tidyr)
library(dplyr)

twoway.formula <- function(formula, data, subset, na.action, ...) {

  if (missing(formula) || !inherits(formula, "formula"))
    stop("'formula' missing or incorrect")
  if (length(formula) != 3L)
    stop("'formula' must have both left and right hand sides")
  tt <- if (is.data.frame(data)) {
    terms(formula, data = data)
  } else { terms(formula) }
  if (any(attr(tt, "order") > 1))
    stop("interactions are not allowed")

  rvar <- attr(terms(formula[-2L]), "term.labels")
  lvar <- attr(terms(formula[-3L]), "term.labels")
  rhs.has.dot <- any(rvar == ".")
  lhs.has.dot <- any(lvar == ".")
  if (lhs.has.dot || rhs.has.dot)
    stop("'formula' has '.' in left or right hand sides")
  m <- match.call(expand.dots = FALSE)
  edata <- eval(m$data, parent.frame())
  lhs <- formula[[2]]
  rhs <- formula[[3]]

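  # reshape long to wide: keep only the variables named in the formula,
  # spread the column variable (rvar[2]) across columns, and move the
  # row variable (rvar[1]) into the row names
  wide <- 
    edata %>% 
    select(one_of(rvar, lvar)) %>% 
    spread(key = rvar[2], value = lvar) %>% 
    column_to_rownames(rvar[1])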

  # call the default method on the wide data set
  twoway(wide)
}


library(twoway)
data(taskRT)

library(reshape2)
long <- melt(as.matrix(taskRT))
colnames(long) <- c("task", "topic", "RT")

twoway(taskRT)

twoway(RT ~ task + topic, data = long)
CJ Yetman
  • Lovely; thanks. There is one error: should be `wide <- edata %>% ...` – user101089 May 23 '18 at 12:31
  • Q: I don't understand the use of `one_of()` here. – user101089 May 23 '18 at 12:38
  • `one_of()` selects variables whose name is equal to any of the strings passed to it (technically, I suppose it converts the strings to symbols and passes them to `select`, or something like that) – CJ Yetman May 23 '18 at 14:19
  • changed `data` to `edata`... not really sure what the purpose of that is... it worked either way for me – CJ Yetman May 23 '18 at 14:20
  • and if you were wondering why I was selecting specific variables, it was just a defensive approach assuming you might pass data that had more than the three variables used in the formula – CJ Yetman May 23 '18 at 14:22
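
For comparison, the reshaping step can also be sketched in base R without reshape2 or the tidyverse. The helper name `long_to_wide()` is hypothetical and not part of the twoway package; the sketch assumes one observation per row/column cell and a formula of the form `response ~ row + column`, and it rebuilds the `taskRT` values from the question so that it runs on its own.

```r
# Hypothetical helper (not part of the twoway package): reshape long data
# to a wide matrix, given a formula of the form response ~ row + column.
long_to_wide <- function(formula, data) {
  stopifnot(inherits(formula, "formula"), length(formula) == 3L)
  resp <- all.vars(formula[[2]])   # response variable name, e.g. "RT"
  rc   <- all.vars(formula[[3]])   # row and column variable names
  if (length(resp) != 1L || length(rc) != 2L)
    stop("'formula' must have the form response ~ row + column")

  # Levels in order of first appearance, so rows and columns keep the
  # ordering found in the data rather than being sorted alphabetically
  rowf <- factor(data[[rc[1]]], levels = unique(as.character(data[[rc[1]]])))
  colf <- factor(data[[rc[2]]], levels = unique(as.character(data[[rc[2]]])))
  if (any(table(rowf, colf) > 1L))
    stop("more than one observation per cell")

  # Fill a matrix by (row, column) index pairs; empty cells stay NA
  wide <- matrix(NA_real_, nrow = nlevels(rowf), ncol = nlevels(colf),
                 dimnames = list(levels(rowf), levels(colf)))
  wide[cbind(as.integer(rowf), as.integer(colf))] <- data[[resp]]
  wide
}

# The taskRT values from the question, rebuilt as a matrix
taskRT <- matrix(c(2.43, 3.12, 3.68, 4.04,
                   3.41, 3.91, 4.07, 5.10,
                   4.21, 4.65, 5.87, 5.69),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("Easy", "Medium", "Hard"),
                                 c("topic1", "topic2", "topic3", "topic4")))

# Long version of the same data, built with base R only
long <- as.data.frame(as.table(taskRT), responseName = "RT")
names(long)[1:2] <- c("task", "topic")

wide <- long_to_wide(RT ~ task + topic, data = long)
wide
```

Inside a formula method, the resulting matrix could then be handed to `twoway(wide)`, since the default method accepts a matrix or data frame. Because `all.vars()` reads the names directly from the formula, no `match.call()` or `eval()` step is needed for the reshaping itself.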