
I found a nice example of plotting convex hull shapes using ggplot with ddply here: Drawing outlines around multiple geom_point groups with ggplot

I thought I'd try something similar--create something like an Ashby Diagram--to practice with the data.table package:

test<-function()
{
library(data.table)
library(ggplot2)

set.seed(1)

Here I define a simple table:

dt<-data.table(xdata=runif(15),ydata=runif(15),level=rep(c("a","b","c"),each=5),key="level")

And then I define the hull positions by level:

hulls<-dt[,as.integer(chull(.SD)),by=level]
setnames(hulls,"V1","hcol")

So then my thought was to merge hulls with dt, so that I could eventually manipulate hulls to get in the proper form for ggplot (shown below for reference):

ashby<-ggplot(dt,aes(x=xdata,y=ydata,color=level))+
        geom_point()+
        geom_line()+
        geom_polygon(data=hulls,aes(fill=level))
}

But it seems that any way I try to merge hulls and dt, I get an error. For example, merge(hulls,dt) produces the error as shown in footnote 1.

This seems like it should be simple, and I'm sure I'm just missing something obvious. Any direction to a similar post or thoughts on how to prep hulls for ggplot would be greatly appreciated. Or if you think that it's best to stick with the ddply approach, please let me know.

Example undesired output:

test<-function(){
    library(data.table)
    library(ggplot2)
    dt<-data.table(xdata=runif(15),ydata=runif(15),level=rep(c("a","b","c"),each=5),key="level")
    set.seed(1)
    hulls<-dt[,as.integer(chull(.SD)),by=level]
    setnames(hulls,"V1","hcol")
    setkey(dt, 'level') #setting the key seems unneeded
    setkey(hulls, 'level')
    hulls<-hulls[dt, allow.cartesian = TRUE]
    ggplot(dt,aes(x=xdata,y=ydata,color=level))+
            geom_point()+
            geom_polygon(data=hulls,aes(fill=level))
}

results in a mess of criss-crossing polygons:

[image: undesired output]

Footnote 1:

Error in vecseq(f__, len__, if (allow.cartesian) NULL else as.integer(max(nrow(x), : Join results in 60 rows; more than 15 = max(nrow(x),nrow(i)). Check for duplicate key values in i, each of which join to the same group in x over and over again. If that's ok, try including j and dropping by (by-without-by) so that j runs for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and datatable-help for advice.
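A minimal sketch of why the join grows so much, assuming the same dt as in the question (the name `joined` is only for illustration). Each row of hulls carries a within-group row index rather than coordinates, and joining on level alone pairs every hull row with all five rows of that level in dt, so the result has five times as many rows as hulls:

```r
library(data.table)

set.seed(1)
dt <- data.table(xdata = runif(15), ydata = runif(15),
                 level = rep(c("a", "b", "c"), each = 5))

# One row per hull vertex: a within-group row index, not coordinates
hulls <- dt[, .(hcol = chull(xdata, ydata)), by = level]

# Every hull row matches all five dt rows of the same level
joined <- merge(hulls, dt, by = "level", allow.cartesian = TRUE)

nrow(hulls)    # number of hull vertices across the three levels
nrow(joined)   # five times nrow(hulls)
```

Because the join key does not identify individual points, the joined table mixes hull and interior points, which is consistent with the criss-crossing polygons described above.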

Jaap
Docuemada
  • And I'm guessing there's an elegant way of doing this. With this capability, I think the method could be easily extended to make Ashby-like plots. For example: http://commons.wikimedia.org/wiki/File:Ashby_plot_big.jpg – Docuemada May 08 '13 at 00:59
  • +1 for showing your efforts and clearly explaining what you want. Please note that calling `library` within your own function is probably unnecessary (and inefficient if you plan to call the function many times). – Victor K. May 08 '13 at 03:09

1 Answer


Here is what you want to do. Generating some random data:

library(ggplot2)
library(data.table)
# You have to set the seed _before_ you generate random data, not after
set.seed(1) 
dt <- data.table(xdata=runif(15), ydata=runif(15), level=rep(c("a","b","c"), each=5),
  key="level")

Here is where the magic happens:

hulls <- dt[, .SD[chull(xdata, ydata)], by = level]

Plotting the result:

ggplot(dt,aes(x=xdata,y=ydata,color=level)) +
    geom_point() +
    # alpha is a constant, so it is set outside aes() to avoid a spurious legend entry
    geom_polygon(data = hulls,aes(fill=level),alpha = 0.5)

produces

[image: plot of the points with a filled convex hull around each level]

It works because chull returns the row indices of the points that form the convex hull, in clockwise order, so the selected rows are already in the order geom_polygon needs. We subset each group's data with .SD[...], and data.table stacks the per-group results together with their level.
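To see what chull returns on its own, here is a small base R sketch with made-up points (the names `pts`, `idx` and `hull_pts` are just for illustration): four corners of a square plus one point in the middle.

```r
# Four corners of a unit square plus one interior point (row 5)
pts <- data.frame(x = c(0, 1, 1, 0, 0.5),
                  y = c(0, 0, 1, 1, 0.5))

idx <- chull(pts$x, pts$y)   # row indices of the hull vertices
hull_pts <- pts[idx, ]       # the rows that outline the polygon

sort(idx)                    # the interior point, row 5, is not included
```

Indexing the data with the result, as .SD[chull(xdata, ydata)] does within each level, keeps only the outline points.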

Victor K.
  • allow.cartesian is a trick I didn't know about, thank you. As in your example, the problem I'm having is keeping the level information. The author of the question: http://stackoverflow.com/questions/14419493/drawing-outlines-around-multiple-geom-point-groups-with-ggplot was able to create convex hull shapes around multiple groups using ddply. I'm essentially trying to repeat the output of the author's original question using a data.table approach. – Docuemada May 08 '13 at 00:54
  • Could you please clarify what you mean by 'keep the level information'? As in, given the `hulls` and `dt` as above, what do you want the output to be? – Victor K. May 08 '13 at 01:21
  • hulls[dt, allow.cartesian = TRUE] returns a data.table of 60 rows. However, for the three hulls "a","b", and "c", I shouldn't need 60 points to describe the shapes, since some points lie inside the shape. I'll put an example undesired output in my question to help illustrate. +1 for the allow.cartesian by the way. – Docuemada May 08 '13 at 02:02
  • `hulls <- dt[dt[, .I[chull(xdata, ydata)], by = level]$V1]` will be more efficient as `data.table` will not construct `.SD` – mnel May 08 '13 at 03:20
  • Thanks @mnel - `.I` is a nice trick and I should remember to use it more often. I will leave my solution as is, though, as it's much easier to explain to a new `data.table` user. – Victor K. May 08 '13 at 03:27
  • @VictorK. Very slick solution, thank you, and thanks for the explanation. – Docuemada May 08 '13 at 13:23
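For anyone comparing the two variants discussed in these comments, here is a rough sketch that runs both on the example data and checks that they select the same points. The names `hulls_sd`, `hulls_i` and `rows` are only for illustration, and the efficiency difference mentioned in the comment is not measured here.

```r
library(data.table)

set.seed(1)
dt <- data.table(xdata = runif(15), ydata = runif(15),
                 level = rep(c("a", "b", "c"), each = 5), key = "level")

# .SD version from the answer: subset each group's data directly
hulls_sd <- dt[, .SD[chull(xdata, ydata)], by = level]

# .I version from the comment: collect row numbers of dt, then subset once
rows <- dt[, .I[chull(xdata, ydata)], by = level]$V1
hulls_i <- dt[rows]

# Same rows in the same order; only the column order differs
all.equal(hulls_sd, hulls_i, check.attributes = FALSE, ignore.col.order = TRUE)
```

Either result can be passed to geom_polygon(data = ...) in the plot above, since both contain xdata, ydata and level.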