I was trying to use ggplot2
to plot the built-in anscombe
data set in R (which contains four different small data sets with identical correlations but radically different relationships between X and Y). My attempts to reshape the data properly were all pretty ugly. I used a combination of reshape2
and base R; a Hadleyverse 2 (tidyr
/dplyr
) or a data.table
solution would be fine with me, but the ideal solution would be
- short/no repeated code
- comprehensible (somewhat conflicting with criterion #1)
- involve as little hard-coding of column numbers, etc. as possible
The original format:
anscombe
## x1 x2 x3 x4 y1 y2 y3 y4
## 1 10 10 10 8 8.04 9.14 7.46 6.58
## 2 8 8 8 8 6.95 8.14 6.77 5.76
## 3 13 13 13 8 7.58 8.74 12.74 7.71
## ...
## 11 5 5 5 8 5.68 4.74 5.73 6.89
Desired format:
## s x y
## 1 1 10 8.04
## 2 1 8 6.95
## ...
## 44 4 8 6.89
Here's my attempt:
library("reshape2")
ff <- function(x,v)
setNames(transform(
melt(as.matrix(x)),
v1=substr(Var2,1,1),
v2=substr(Var2,2,2))[,c(3,5)],
c(v,"s"))
f1 <- ff(anscombe[,1:4],"x")
f2 <- ff(anscombe[,5:8],"y")
f12 <- cbind(f1,f2)[,c("s","x","y")]
Now plot:
library("ggplot2"); theme_set(theme_classic())
th_clean <-
theme(panel.margin=grid::unit(0,"lines"),
axis.ticks.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.y=element_blank(),
axis.text.y=element_blank()
)
ggplot(f12,aes(x,y))+geom_point()+
facet_wrap(~s)+labs(x="",y="")+
th_clean