13

See this example:

[Figure: Example of overlaid scatter plots]

This was created in MATLAB by making two scatter plots independently, saving each as an image, drawing both images into the same figure with imagesc, and finally setting the alpha of the top image to 0.5.

I would like to do this in R or MATLAB without using images, since creating an image does not preserve the axis scale information, nor can I overlay a grid (e.g. using 'grid on' in MATLAB). Ideally I would like to do this properly in MATLAB, but I would also be happy with a solution in R. It seems like it should be possible, but I can't for the life of me figure it out.

So, generally, I would like to be able to set the alpha of an entire plotted object (i.e. of a plot handle, in MATLAB parlance).

Thanks,

Ben.

EDIT: The data in the above example is actually 2D. The plotted points are from a computer simulation. Each point represents 'amplitude' (y-axis) (an emergent property specific to the simulation I'm running), plotted against 'performance' (x-axis).

EDIT 2: There are 1796400 points in each data set.

Paul Hiemstra
Ben J
  • I don't have a good solution for you (in ggplot) except to say that while I'm sure there's some reason why you "have" to use scatterplots, you probably should just be overlaying two density estimates here, which makes the solution trivially easy with alpha levels. – joran Mar 14 '12 at 17:40
  • Create a variable for each point to determine which set it lies in: A, B or both. Use this to select colours. – James Mar 14 '12 at 17:54
  • not trivial, but what about combining the information from two sets of hexagonal bins? – Ben Bolker Mar 14 '12 at 18:45
  • One brute force solution would be to: (i) create two separate plots with the same extent, axes, etc.; (ii) use grid.cap() to capture raster representations of both; (iii) convert the values in the rasters to integers; (iv) add one raster plus 10 times the other; (v) replace the integers in the resultant raster with appropriate color names; and then (vi) plot it with grid.raster( , interpolate=FALSE). If you wanted to do this frequently, it might well be worth the effort required to write a small function that performs these steps for any pair of plot calls that you pass to it. – Josh O'Brien Mar 14 '12 at 20:21
  • I've edited my original post to explain the type of data being plotted. I don't believe that generating 2D density estimates, as perhaps you're suggesting Joran, will really yield what I'm hoping to do, since this will result in multiple contours for each data set. I guess I want one contour around each data set so that the general shape of each data set is preserved; in this way I could get rid of the individual points which would be nice.. Today is the first time I've done any R -- taking me a while to get my head around... – Ben J Mar 14 '12 at 22:00
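
A minimal base R sketch of the "A, B or both" idea from the comments above, offered as one possible reading of it rather than as any commenter's actual method. It bins both sets onto a shared grid, labels each cell by which set(s) occupy it, and draws the result with image(), so the axes keep their data scale and grid() still works. The data, the bin count and the names `occupied` and `cell_class` are assumptions made up for the illustration.

```r
# Hypothetical stand-in data; the real sets have about 1.8 million points each.
set.seed(1)
a <- data.frame(x = rnorm(5000, 0, 1),   y = rnorm(5000, 0, 1))
b <- data.frame(x = rnorm(5000, 1.5, 1), y = rnorm(5000, 0.5, 1))

# Shared bin edges so that both sets land on the same grid.
n_bins   <- 100
x_breaks <- seq(min(a$x, b$x), max(a$x, b$x), length.out = n_bins + 1)
y_breaks <- seq(min(a$y, b$y), max(a$y, b$y), length.out = n_bins + 1)

# TRUE for every grid cell that contains at least one point of `d`.
occupied <- function(d) {
  ix <- cut(d$x, x_breaks, include.lowest = TRUE, labels = FALSE)
  iy <- cut(d$y, y_breaks, include.lowest = TRUE, labels = FALSE)
  m <- matrix(FALSE, n_bins, n_bins)
  m[cbind(ix, iy)] <- TRUE
  m
}

in_a <- occupied(a)
in_b <- occupied(b)

# 1 = A only, 2 = B only, 3 = both; empty cells become NA and stay unpainted.
cell_class <- in_a + 2 * in_b
cell_class[cell_class == 0] <- NA

# Bin midpoints give image() one coordinate per row/column of cell_class.
x_mid <- (head(x_breaks, -1) + tail(x_breaks, -1)) / 2
y_mid <- (head(y_breaks, -1) + tail(y_breaks, -1)) / 2

image(x_mid, y_mid, cell_class,
      col = c("blue", "red", "purple"), breaks = c(0.5, 1.5, 2.5, 3.5),
      xlab = "performance", ylab = "amplitude")
grid()
legend("topleft", fill = c("blue", "red", "purple"),
       legend = c("A only", "B only", "both"))
```

Because each cell is coloured by set membership rather than by how many points fall in it, a dense region in one set cannot hide the other. The cost is that all density information within a set is discarded.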

4 Answers

11

Using ggplot2 you can add together two geom_point layers and make them transparent using the alpha parameter. ggplot2 also adds up transparency, and I think this is what you want. This should work, although I haven't run this.

library(ggplot2)
dat = data.frame(x = runif(1000), y = runif(1000), cat = rep(c("A","B"), each = 500))
ggplot(aes(x = x, y = y, color = cat), data = dat) + geom_point(alpha = 0.3)

ggplot2 is awesome!

This is an example of calculating and drawing a convex hull:

library(automap)
library(ggplot2)
library(plyr)
loadMeuse()
theme_set(theme_bw())

meuse = as.data.frame(meuse)
chull_per_soil = ddply(meuse, .(soil), 
           function(sub) sub[chull(sub$x, sub$y),c("x","y")])

ggplot(aes(x = x, y = y), data = meuse) +
  geom_point(aes(size = log(zinc), color = ffreq)) +
  geom_polygon(aes(color = soil), data = chull_per_soil, fill = NA) +
  coord_equal()

which leads to the following illustration:

[Figure: meuse points with a convex hull outline drawn per soil class]
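
For the two-set case in the question, a rough base R sketch of the same hull idea with semi-transparent fills could look like the following. It is only an illustration: the data are made up, and the names `hull_a` and `hull_b` are hypothetical. Each set is drawn as a single polygon, so the alpha applies once per set instead of accumulating point by point.

```r
# Hypothetical stand-in data for the two simulation outputs.
set.seed(42)
a <- cbind(x = rnorm(2000),      y = rnorm(2000))
b <- cbind(x = rnorm(2000, 1.5), y = rnorm(2000, 0.5))

# chull() returns the row indices of the hull vertices, in drawing order.
hull_a <- a[chull(a), , drop = FALSE]
hull_b <- b[chull(b), , drop = FALSE]

# Empty plot spanning both sets; axes and grid are in data units.
plot(rbind(a, b), type = "n", xlab = "performance", ylab = "amplitude")
grid()

# One polygon per set, each filled at 50% opacity.
polygon(hull_a, col = adjustcolor("blue", alpha.f = 0.5), border = "blue")
polygon(hull_b, col = adjustcolor("red",  alpha.f = 0.5), border = "red")
```

A convex hull will overstate the extent of any data set whose outline has indentations, which is the limitation the concave hull question in the comments is getting at.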

Paul Hiemstra
  • Thanks Paul, -- unfortunately, since each class of data I'm plotting is exceptionally dense (there are 1796400 points in each data set), many points in each set overlap thus effectively removing the transparency in the densest regions. This means that a dense region in the 'top' data set will often be too opaque to see any points in the data set underneath. – Ben J Mar 14 '12 at 22:24
  • You could subsample your data, or draw a convex hull around the points and use transparent `geom_polygon` layers. – Paul Hiemstra Mar 14 '12 at 22:32
  • I think a convex hull is the way forward. Cheers. – Ben J Mar 14 '12 at 23:28
  • I added an example of how to create and plot a convex hull in R. – Paul Hiemstra Mar 15 '12 at 07:52
  • :), the meuse dataset is multifunctional :) – Paul Hiemstra Mar 15 '12 at 12:06
  • Thanks Paul that's a nice illustration. For extra bonus points, might you know how to create a concave rather than convex hull? ;-) Cheers! – Ben J Mar 15 '12 at 15:55
  • Seem to be missing a close parenthesis here: `dat = data.frame(x = runif(1000), y = runif(1000), cat = rep(c("A","B"), each = 500))` – Ben Mar 15 '12 at 17:29
5

You could first export the two data sets as bitmap images, re-import them, add transparency:

[Figure: overlay]

library(grid)

N <- 1e7 # Warning: slow
d <- data.frame(x1=rnorm(N),
                x2=rnorm(N, 0.8, 0.9),
                y=rnorm(N, 0.8, 0.2),
                z=rnorm(N, 0.2, 0.4))

v <- with(d, dataViewport(c(x1,x2),c(y, z)))

png("layer1.png", bg="transparent")
with(d, grid.points(x1,y, vp=v,default="native",pch=".",gp=gpar(col="blue")))
dev.off()
png("layer2.png", bg="transparent")
with(d, grid.points(x2,z, vp=v,default="native",pch=".",gp=gpar(col="red")))
dev.off()

library(png)
i1 <- readPNG("layer1.png", native=FALSE)
i2 <- readPNG("layer2.png", native=FALSE)

ghostize <- function(r, alpha=0.5)
  matrix(adjustcolor(rgb(r[,,1],r[,,2],r[,,3],r[,,4]), alpha.f=alpha), nrow=dim(r)[1])

grid.newpage()
grid.rect(gp=gpar(fill="white"))
grid.raster(ghostize(i1))
grid.raster(ghostize(i2))

You can then add these rasters as layers in, say, ggplot2.

baptiste
  • Thanks baptiste! That's pretty much what I did in my original post above, albeit in matlab. Nice to see how you can do it in R though! Cheers. – Ben J Mar 15 '12 at 15:52
  • btw, you *can* preserve the axis, scales, add a grid, etc. with this method – baptiste Mar 16 '12 at 09:33
0

Use the transparency capability of color descriptions. You can define a color as a string of four two-digit hexadecimal values (one byte each), e.g. muddy <- "#888888FF". The first three pairs set the red, green and blue levels (00 to FF); the final pair sets the alpha level, where 00 is fully transparent and FF is fully opaque.
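
As a small illustration of that notation, here is a sketch using base R's colour helpers. The variable names other than `muddy` are made up for the example.

```r
# Fully opaque mid-grey, written as #RRGGBBAA.
muddy <- "#888888FF"

# The same colour built from numeric channels (0x88 = 136, 0xFF = 255).
muddy_from_channels <- rgb(136, 136, 136, 255, maxColorValue = 255)

# Scale only the alpha channel to get a half-transparent version.
half_muddy <- adjustcolor(muddy, alpha.f = 0.5)

# Inspect the channels, including alpha, as integers in 0..255.
channels <- col2rgb(half_muddy, alpha = TRUE)
print(channels)

# Use it like any other colour.
plot(runif(200), runif(200), pch = 16, cex = 2, col = half_muddy)
```

Note that this sets the alpha of each plotted symbol individually, not of the data set as a whole.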

Carl Witthoft
  • Thanks, but the problem with doing this is that it would set the alpha of every point independently. This means that when points overlap in a *single* set of data, the region is far darker meaning that the overall effect is less transparency. This would be fine if no points in the single set overlap. But if 10 data points from the single set overlap and the alpha is 0.1, then the resulting alpha is 1 (i.e. zero transparency...)...N.B. there are two data sets in the above example...make sense? – Ben J Mar 14 '12 at 17:27
  • True enough -- so I guess I have to ask what it is you're actually generating there: If it's a pure scatterplot then you would want to darken overlapping points. If it's a shiny representation of the distribution functions, then you shouldn't be scatterplotting to begin with. Or, just make your symbols small enough that they don't overlap! – Carl Witthoft Mar 14 '12 at 21:35
  • Have added an edit to clarify the type of data I'm plotting...They're not distribution functions per se.. – Ben J Mar 14 '12 at 22:02
  • There are some good suggestions posted. I would go w/ the `ggplot` one. Meanwhile, I'm now inspired to ask: is there a 2-D analog to `intersect` ? (I'm going to post that, so don't answer here :-) ) – Carl Witthoft Mar 15 '12 at 12:01
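
On the overlap concern raised in the comments above: assuming standard "over" alpha compositing (which, as far as I know, is what common R graphics devices use), opacities combine multiplicatively rather than additively, so ten stacked points at alpha 0.1 come out at roughly 0.65 rather than exactly 1. The practical problem still stands, because with millions of points the dense regions saturate anyway. The helper name `stacked_alpha` is hypothetical.

```r
# Effective opacity of n identical layers, each with opacity `alpha`,
# under "over" compositing: every layer lets (1 - alpha) of what is below through.
stacked_alpha <- function(alpha, n) 1 - (1 - alpha)^n

stacked_alpha(0.1, 10)   # about 0.65
stacked_alpha(0.1, 50)   # about 0.995, visually opaque

# How many overlapping points until the stack is 99% opaque?
n_needed <- ceiling(log(1 - 0.99) / log(1 - 0.1))
n_needed
```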
0

AFAIK, your best option with Matlab is to just make your own plot function. The scatter plot points unfortunately do not yet have a transparency attribute so you cannot affect it. However, if you create, say, most crudely, a bunch of loops which draw many tiny circles, you can then easily give them an alpha value and obtain a transparent set of data points.

Superbest
  • I like this idea, but it's far too slow in practice. Besides, I can basically change the transparency of points in R instead. – Ben J Mar 14 '12 at 23:29