19

I have a data frame which contains x-axis numeric bins and continuous y-axis data across multiple categories. Initially, I created a boxplot by making the x-axis bins "factors", and doing a boxplot of the melted data. Reproducible data:

library(ggplot2)
library(reshape2)  # provides melt()

x <- seq(1,10,by=1)
y1 <- rnorm(10, mean=3)
y2 <- rnorm(10, mean=10)
y3 <- rnorm(10, mean=1)
y4 <- rnorm(10, mean=8)
y5 <- rnorm(10, mean=12)
df <- data.frame(x,y1,y2,y3,y4,y5)
df.m <- melt(df, id="x")  # long format: columns x, variable, value

My code to create the x-axis data as a factor:

df.m$x <- as.factor(df.m$x)

My ggplot:

ggplot(df.m, aes(x=x, y=value))+
 geom_boxplot(notch=FALSE, outlier.shape=NA, fill="red", alpha=0.1)+
 theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

The resulting plot:

The problem is that I cannot use x-axis numeric spacing because the x-axis is categorized as a factor, which has equal spacing. I want to be able to use something like scale_x_continuous to manipulate the axis breaks and spacing to, say, an interval of 2, rather than a boxplot every 1, but when I try to plot the data with the x-axis "as.numeric", I just get one boxplot of all of the data:

plot

Any suggestions for a way to get this continuous-looking boxplot curve (the first image) while still being able to control the numeric properties of the x-axis? Thanks!

AndMan21
  • @Henrik Doesn't the Google drive link in the question work for you? That should be the data frame for this example. Thanks for showing up and helping me again! It's been a problem-ridden day in the R world... – AndMan21 Nov 20 '14 at 22:31
  • @Henrik Gotcha, sorry about that. Working on the edit now – AndMan21 Nov 20 '14 at 22:35

2 Answers

31

Here is a way using the original data you posted on Google - which actually was much more helpful, IMO.

ggplot(df, aes(x=CH, y=value,group=CH))+
  geom_boxplot(notch=FALSE, outlier.shape=NA, fill="red", alpha=0.2)+
  scale_x_log10()

So, as @BenBolker said before he deleted his answer(??), you should leave the x-variable (CH) as numeric, and set group=CH in the call to aes(...).
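Applied to the reproducible data from the question, the same idea might look like the sketch below. It assumes the ggplot2 and reshape2 packages are installed; the `set.seed()` call and the break interval of 2 are illustrative choices, not part of the original post.

```r
library(ggplot2)
library(reshape2)

set.seed(1)  # illustrative only: makes the random data repeatable

x  <- seq(1, 10, by = 1)
df <- data.frame(x,
                 y1 = rnorm(10, mean = 3),
                 y2 = rnorm(10, mean = 10),
                 y3 = rnorm(10, mean = 1),
                 y4 = rnorm(10, mean = 8),
                 y5 = rnorm(10, mean = 12))
df.m <- melt(df, id.vars = "x")  # x stays numeric; no as.factor() call

# group = x draws one box per distinct x value on a continuous axis
p <- ggplot(df.m, aes(x = x, y = value, group = x)) +
  geom_boxplot(notch = FALSE, outlier.shape = NA, fill = "red", alpha = 0.1) +
  scale_x_continuous(breaks = seq(2, 10, by = 2))  # axis breaks every 2 units
p
```

Because x is still numeric, `scale_x_continuous()` controls the breaks and limits while the boxes stay at their true x positions.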

With your real data there is another problem though. Your CH is more or less logarithmically spaced, so there are about as many points < 1 as there are between 1 - 10, etc. ggplot wants to make the boxes all the same size, so with a linear x-axis the box width is smaller than the line width, and you don't see the boxes at all. Changing the x-axis to a logarithmic scale fixes that, more or less.
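The width problem can be reproduced with made-up data. In this sketch the data frame `d` and its log-spaced `CH` values are hypothetical stand-ins for the real data set, which is not shown here; only the column names `CH` and `value` are taken from the code above.

```r
library(ggplot2)

set.seed(1)  # illustrative only

# Hypothetical data: 10 log-spaced CH values from 0.1 to 100, 5 observations each
CH <- rep(10^seq(-1, 2, length.out = 10), each = 5)
d  <- data.frame(CH = CH, value = rnorm(length(CH), mean = 5))

base <- ggplot(d, aes(x = CH, y = value, group = CH)) +
  geom_boxplot(outlier.shape = NA, fill = "red", alpha = 0.2)

p_linear <- base                    # linear axis: boxes shrink to thin slivers
p_log    <- base + scale_x_log10()  # log axis: boxes evenly spaced and visible

p_linear
p_log
```

On the linear axis the box width is derived from the smallest gap between CH values, so every box becomes very narrow. The log scale is applied before the boxplot statistics are computed, which makes the gaps, and therefore the widths, uniform.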

jlhoward
0

Don't make x a factor. You need to map a group aesthetic that determines which box each value is associated with. Luckily, after melting, this is what your `variable` column is:

ggplot(df.m, aes(x = x, y = value, group = variable)) +
    geom_boxplot()

As x is still numeric, you can give it whatever values you want within a specific variable level and the boxplot will show up at that spot. Or you could transform the x axis, etc.

Gregor Thomas
  • The issue is this: I melted the original data mainly just to get data from having a ton of columns to just one column. I don't actually want to map by variable, but want to map by x-value. – AndMan21 Nov 20 '14 at 23:16
  • Then set `group=x` as @BenBolker said (too bad he deleted his answer). – jlhoward Nov 20 '14 at 23:18