
I have a matrix that looks like this:

df = 
           b   e   t   w   e   e   n
    [1,] 219 125 125  94 172 109 172
    [2,]  78  78 250 156 172 141 140
    [3,] 250 204 296 829 265 125 203
    [4,] 406 110 172 187  63 156 109

When I melt it, using melt(df), I get:

df.m = 
   X1 X2 value
1   1  b   219
2   2  b    78
3   3  b   250
4   4  b   406
5   1  e   125
6   2  e    78
7   3  e   204
8   4  e   110
9   1  t   125
10  2  t   250
11  3  t   296
12  4  t   172
13  1  w    94
14  2  w   156
15  3  w   829
16  4  w   187
17  1  e   172
18  2  e   172
19  3  e   265
20  4  e    63
21  1  e   109
22  2  e   141
23  3  e   125
24  4  e   156
25  1  n   172
26  2  n   140
27  3  n   203
28  4  n   109

The problem is that when I make a boxplot for each column, the boxes are grouped by the letter alone. In the example above there are three "e" columns, and they get clumped together into a single box. So the code below produces the boxplot below:

library(ggplot2)
ggplot(df.m, aes(x=X2, y=value)) + 
geom_boxplot(outlier.shape=NA) + 
xlim('b','e','t','w','e','e','n')

[Image: the resulting boxplot, in which the three "e" columns are merged into one box]

If I could add a column to the melted dataframe that retains the initial column index, it would be easy to make a correct boxplot. Is there a way to do this?

Adam_G

2 Answers


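One option is to build a new grouping column from the rows where X1 equals 1 (assuming the dataset is ordered as shown, one original column after another). X1 == 1 gives a logical vector that is TRUE at the start of each original column; its cumulative sum numbers the columns 1 to 7, and converting that to character gives the grouping column 'i1'. Then use the OP's ggplot code with i1 on the x axis, and finally change the tick mark labels with scale_x_discrete.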

library(dplyr)
library(ggplot2)
df.m %>% 
  mutate(i1 = as.character(cumsum(X1==1))) %>%
  ggplot(., aes(x=i1, y= value))+
        geom_boxplot(outlier.shape=NA) +
        scale_x_discrete(breaks= c("1", "2", "3", "4", "5", "6", "7"), 
                         labels= c("b", "e", "t", "w", "e", "e", "n"))+
        xlab(NULL)

[Image: boxplot with one box per original column, labelled b, e, t, w, e, e, n]
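A minimal self-contained sketch of the same cumsum idea, assuming df.m is ordered exactly as printed in the question (all rows of the first column, then the second, and so on). It swaps as.character() for factor(): factor() on an integer vector orders its levels numerically, so the boxes would stay in column order even with 10 or more columns, where character sorting would put "10" before "2". The column name col_idx is a hypothetical name chosen for illustration.

```r
library(ggplot2)

# Rebuild df.m compactly (same values as the dput under "data")
df.m <- data.frame(
  X1    = rep(1:4, times = 7),
  X2    = rep(c("b", "e", "t", "w", "e", "e", "n"), each = 4),
  value = c(219L, 78L, 250L, 406L, 125L, 78L, 204L, 110L, 125L, 250L, 296L, 172L,
            94L, 156L, 829L, 187L, 172L, 172L, 265L, 63L, 109L, 141L, 125L, 156L,
            172L, 140L, 203L, 109L)
)

# Each original column starts where X1 == 1, so the running count of those
# starts numbers the columns 1..7 (this relies on the ordering above).
# col_idx is a hypothetical name; factor() keeps the levels in numeric order.
df.m$col_idx <- factor(cumsum(df.m$X1 == 1))

# One box per original column, ticks relabelled with the original letters
p <- ggplot(df.m, aes(x = col_idx, y = value)) +
  geom_boxplot(outlier.shape = NA) +
  scale_x_discrete(labels = c("b", "e", "t", "w", "e", "e", "n")) +
  xlab(NULL)
print(p)
```

Because the result is an ordinary ggplot object, it can be passed on to other tools that accept ggplot objects.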


Alternatively, we could set the column names of the original matrix to the column indices (1 to 7), melt it, and use the resulting index column (Var2) directly in ggplot:

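library(reshape2)   # melt() for matrices
library(dplyr)      # provides %>%
library(ggplot2)
# `colnames<-` returns a renamed copy; df itself keeps its letters,
# so colnames(df) below still gives b, e, t, w, e, e, n
`colnames<-`(df, seq_len(ncol(df))) %>% 
  melt() %>% 
  ggplot(., aes(x = as.character(Var2), y = value)) + 
  geom_boxplot(outlier.shape = NA) + 
  scale_x_discrete(breaks = as.character(seq_len(ncol(df))), 
                   labels = colnames(df)) + 
  xlab(NULL)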
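If the reshape2 dependency, or the row order of an already-melted table, is a concern, the long table can also be built straight from the matrix in base R, keeping the column index next to the letter. This is a sketch, assuming df is the matrix from the "data" section; row_idx, col_idx and letter are hypothetical column names. It relies on as.vector(), row() and col() all flattening a matrix in the same column-major order.

```r
library(ggplot2)

# The original matrix (same values as the dput under "data")
df <- matrix(
  c(219L, 78L, 250L, 406L, 125L, 78L, 204L, 110L, 125L, 250L, 296L, 172L,
    94L, 156L, 829L, 187L, 172L, 172L, 265L, 63L, 109L, 141L, 125L, 156L,
    172L, 140L, 203L, 109L),
  nrow = 4,
  dimnames = list(NULL, c("b", "e", "t", "w", "e", "e", "n"))
)

# Flatten column by column; row()/col() give each cell's row and column
# index in the same order as as.vector(df). Column names are hypothetical.
long <- data.frame(
  row_idx = as.vector(row(df)),
  col_idx = factor(as.vector(col(df))),
  letter  = colnames(df)[as.vector(col(df))],
  value   = as.vector(df)
)

# Group by the column index, label the ticks with the original letters
p <- ggplot(long, aes(x = col_idx, y = value)) +
  geom_boxplot(outlier.shape = NA) +
  scale_x_discrete(labels = colnames(df)) +
  xlab(NULL)
print(p)
```

Unlike the cumsum approach, the index here comes from the matrix itself, so it does not depend on how an existing melted table happens to be sorted.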

data

df.m <- structure(list(X1 = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 
3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 
3L, 4L), X2 = c("b", "b", "b", "b", "e", "e", "e", "e", "t", 
"t", "t", "t", "w", "w", "w", "w", "e", "e", "e", "e", "e", "e", 
"e", "e", "n", "n", "n", "n"), value = c(219L, 78L, 250L, 406L, 
125L, 78L, 204L, 110L, 125L, 250L, 296L, 172L, 94L, 156L, 829L, 
187L, 172L, 172L, 265L, 63L, 109L, 141L, 125L, 156L, 172L, 140L, 
203L, 109L)), .Names = c("X1", "X2", "value"), class = "data.frame",
        row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
"25", "26", "27", "28"))

df <-  structure(c(219L, 78L, 250L, 406L, 125L, 78L, 204L, 110L, 125L, 
250L, 296L, 172L, 94L, 156L, 829L, 187L, 172L, 172L, 265L, 63L, 
109L, 141L, 125L, 156L, 172L, 140L, 203L, 109L), .Dim = c(4L, 
7L), .Dimnames = list(NULL, c("b", "e", "t", "w", "e", "e", "n")))
akrun

Another solution, using base R graphics (boxplot() on a matrix draws one box per column, so the duplicate names stay separate):

boxplot(df)

[Image: base R boxplot with one box per column]

Kunal Puri
Comment from Adam_G (May 07 '16 at 03:07): Sorry, I should have mentioned that I'm eventually feeding this to plotly, and I couldn't use boxplot() with plotly/ggplotly.