Can GGPLOT make 2D summaries of data?

Question

I wish to plot mean (or other function) of reaction time as a function of the location of the target in the x y plane. As test data:

library(ggplot2)
xs <- runif(100,-1,1)
ys <- runif(100,-1,1)
rts <- rnorm(100)
testDF <- data.frame("x"=xs,"y"=ys,"rt"=rts)

I know I can do this:

p <- ggplot(data = testDF,aes(x=x,y=y))+geom_bin2d(bins=10)

What I would like to be able to do, is the same thing but plot a function of the data in each bin rather than counts. Can I do this?

Or do I need to generate the conditional means first in R (e.g. drt <- tapply(testDF$rt,list(cut(testDF$x,10),cut(testDF$y,10)),mean)) and then plot that?

Thank you.

score 12 · Answer 1 · edited Mar 02 '12 at 20:09

Update With the release of ggplot2 0.9.0, much of this functionality is covered by the new additions of stat_summary2d and stat_summary_bin.

here is a gist for this answer: https://gist.github.com/1341218

here is a slight modification of stat_bin2d so as to accept arbitrary function:

StatAggr2d <- proto(Stat, {
  objname <- "aggr2d" 
  default_aes <- function(.) aes(fill = ..value..)
  required_aes <- c("x", "y", "z")
  default_geom <- function(.) GeomRect

  calculate <- function(., data, scales, binwidth = NULL, bins = 30, breaks = NULL, origin = NULL, drop = TRUE, fun = mean, ...) {

    range <- list(
      x = scales$x$output_set(),
      y = scales$y$output_set()
    )

    # Determine binwidth, if omitted
    if (is.null(binwidth)) {
      binwidth <- c(NA, NA)
      if (is.integer(data$x)) {
        binwidth[1] <- 1
      } else {
        binwidth[1] <- diff(range$x) / bins
      }
      if (is.integer(data$y)) {
        binwidth[2] <- 1
      } else {
        binwidth[2] <- diff(range$y) / bins
      }      
    }
    stopifnot(is.numeric(binwidth))
    stopifnot(length(binwidth) == 2)

    # Determine breaks, if omitted
    if (is.null(breaks)) {
      if (is.null(origin)) {
        breaks <- list(
          fullseq(range$x, binwidth[1]),
          fullseq(range$y, binwidth[2])
        )
      } else {
        breaks <- list(
          seq(origin[1], max(range$x) + binwidth[1], binwidth[1]),
          seq(origin[2], max(range$y) + binwidth[2], binwidth[2])
        )
      }
    }
    stopifnot(is.list(breaks))
    stopifnot(length(breaks) == 2)
    stopifnot(all(sapply(breaks, is.numeric)))
    names(breaks) <- c("x", "y")

    xbin <- cut(data$x, sort(breaks$x), include.lowest=TRUE)
    ybin <- cut(data$y, sort(breaks$y), include.lowest=TRUE)

    if (is.null(data$weight)) data$weight <- 1
    ans <- ddply(data.frame(data, xbin, ybin), .(xbin, ybin), function(d) data.frame(value = fun(d$z)))

    within(ans,{
      xint <- as.numeric(xbin)
      xmin <- breaks$x[xint]
      xmax <- breaks$x[xint + 1]

      yint <- as.numeric(ybin)
      ymin <- breaks$y[yint]
      ymax <- breaks$y[yint + 1]
    })
  }
})

stat_aggr2d <- StatAggr2d$build_accessor()

and usage:

ggplot(data = testDF,aes(x=x,y=y, z=rts))+stat_aggr2d(bins=3)
ggplot(data = testDF,aes(x=x,y=y, z=rts))+
  stat_aggr2d(bins=3, fun = function(x) sum(x^2))

enter image description here

As well, here is a slight modification of stat_binhex:

StatAggrhex <- proto(Stat, {
  objname <- "aggrhex"

  default_aes <- function(.) aes(fill = ..value..)
  required_aes <- c("x", "y", "z")
  default_geom <- function(.) GeomHex

  calculate <- function(., data, scales, binwidth = NULL, bins = 30, na.rm = FALSE, fun = mean, ...) {
    try_require("hexbin")
    data <- remove_missing(data, na.rm, c("x", "y"), name="stat_hexbin")

    if (is.null(binwidth)) {
      binwidth <- c( 
        diff(scales$x$input_set()) / bins,
        diff(scales$y$input_set() ) / bins
      )
    }

    try_require("hexbin")

    x <- data$x
    y <- data$y

    # Convert binwidths into bounds + nbins
    xbnds <- c(
      round_any(min(x), binwidth[1], floor) - 1e-6, 
      round_any(max(x), binwidth[1], ceiling) + 1e-6
    )
    xbins <- diff(xbnds) / binwidth[1]

    ybnds <- c(
      round_any(min(y), binwidth[1], floor) - 1e-6, 
      round_any(max(y), binwidth[2], ceiling) + 1e-6
    )
    ybins <- diff(ybnds) / binwidth[2]

    # Call hexbin
    hb <- hexbin(
      x, xbnds = xbnds, xbins = xbins,  
      y, ybnds = ybnds, shape = ybins / xbins,
      IDs = TRUE
    )
    value <- tapply(data$z, hb@cID, fun)

    # Convert to data frame
    data.frame(hcell2xy(hb), value)
  }


})

stat_aggrhex <- StatAggrhex$build_accessor()

and usage:

ggplot(data = testDF,aes(x=x,y=y, z=rts))+stat_aggrhex(bins=3)
ggplot(data = testDF,aes(x=x,y=y, z=rts))+
  stat_aggrhex(bins=3, fun = function(x) sum(x^2))

enter image description here

+1 Thank you for posting this. I shall study this carefully because I tried to make this modification but was unsuccessful. — Andrie, Nov 05 '11 at 14:15
@kohske: Just a note. Your formula and example seems to be not adjusted for those without your level of expertise. — Paulo E. Cardoso, Mar 26 '13 at 23:09
Error in proto(Stat, { : object 'Stat' not found ------ Where does 'Stat' come from? — ctlamb, May 13 '16 at 21:06
these codes cannot working now, would you mind modifying it? Thank you — Ling Zhang, Dec 01 '16 at 08:34
@kohske @joran , I know the question was answered long time ago and since `proto` replaced by `ggproto`, there are too many mistakes to correct. As knowledge I get now, I cannot rewrite these codes to make it runable, so would you mind modifying these codes to make it runable? Thank you — Ling Zhang, Dec 02 '16 at 01:59

score 1 · Answer 2 · answered Jul 26 '11 at 15:42

This turned out to be harder than I expected.

You can almost trick ggplot into doing this, by providing a weights aesthetic, but that only gives you the sum of the weights in the bin, not the mean (and you have to specify drop=FALSE to retain negative bin values). You can also retrieve either counts or density within a bin, but neither of those really solves the problem.

Here's what I ended up with:

## breaks vector (slightly coarser than the 10x10 spec above;
##   even 64 bins is a lot for binning only 100 points)
bvec <- seq(-1,1,by=0.25)  

## helper function
tmpf <- function(x,y,z,FUN=mean,breaks) {
  midfun <- function(x) (head(x,-1)+tail(x,-1))/2
  mids <- list(x=midfun(breaks$x),y=midfun(breaks$y))
  tt <- tapply(z,list(cut(x,breaks$x),cut(y,breaks$y)),FUN)
  mt <- melt(tt)
  ## factor order gets scrambled (argh), reset it
  mt$X1  <- factor(mt$X1,levels=rownames(tt))
  mt$X2  <- factor(mt$X2,levels=colnames(tt))  
  transform(X,
            x=mids$x[mt$X1],
            y=mids$y[mt$X2])
}

ggplot(data=with(testDF,tmpf(x,y,rt,breaks=list(x=bvec,y=bvec))),
       aes(x=x,y=y,fill=value))+
  geom_tile()+
  scale_x_continuous(expand=c(0,0))+   ## expand to fill plot region
  scale_y_continuous(expand=c(0,0))

This assumes equal bin widths, etc., could be extended ... it really is too bad that (as far as I can tell) stat_bin2d doesn't accept a user-specified function.

I get "object 'X' not found", and when I change the X to x in `transform()`, I get "Error in eval(expr, envir, enclos) : object 'mids' not found". — Ari B. Friedman, Jul 26 '11 at 18:02

Can GGPLOT make 2D summaries of data?

2 Answers2

Linked