
I want to plot groups of boxplots, as the following plot shows: [image not available]

Data generation

bp_data = list(
  set = c(1,1,2,2,3,3,4,4),
  var = rep(c("red", "green"), 4),
  val = t(matrix(runif(800, 1, 50)+seq(1,40,length=800 ), 100, 8)) # fake data
)

str(bp_data)
#List of 3
# $ set: num [1:8] 1 1 2 2 3 3 4 4
# $ var: chr [1:8] "red" "green" "red" "green" ...
# $ val: num [1:8, 1:100] 37.9 27.1 40.7 22.4 32.4 ...

Plot command

#plot boxplots by variable and set
bp = boxplot(
  val ~ (var * set), 
  bp_data,
  col = bp_data$var,
  xaxt = "n"  
)

# add axis labels and legend
axis(1, at = seq(1.5,7.5,2), labels = paste("set", c(1:4)), tick=FALSE , cex.axis=1.2)
legend("topright", c("var 1", "var 2"), col=c("red", "green"),pch =15)

I'm quite happy with the resulting plot using the sample data set. In this sample data, all distributions have an n of 100.

However, when I switch to real data, my vectors do not all have length 100, so I can't provide boxplot() with a matrix containing the vals. I do not know how to adjust the formula given to boxplot() to create the plot with real data.

I tried to prepare my data with the following:

bp_data = list(
  set = c(1,1,2,2,3,3,4,4),
  var = rep(c("red", "green"), 4),
  val = list(
    a = runif(100, 1, 10),
    b = runif(100, 1, 12),
    c = runif(100, 1, 13), 
    d = runif(100, 1, 14),
    e = runif(100, 1, 15),
    f = runif(100, 1, 16),
    g = runif(100, 1, 17),
    h = runif(100, 1, 18)
  )    
)

The values are stored as a list so that the individual vectors can differ in length. But boxplot() does not accept the data in this form.

The real data

Before using my real data I wanted to get the plot working, so that I can generate my data accordingly. To explain, this is how I get my data: I iterate over an object by set (set). For each set I calculate several variables, each with a name I come up with (var), and then calculate the values (val). Thus, each iteration generates a var string of length 1, the set string of the iteration, and a vector of 80 to 100 values.

Maybe I can prepare the data better and do something with melt() of reshape2? I am not experienced with that...

Suggestion

Is this a good way to solve it?

bp_data  = data.frame(set = character(0), var = character(0), val = numeric(0))
for(set in names(data.obj)){

  # store slope  
  bp_data = rbind(bp_data, data.frame(
    set = set,
    var = "slope",
    val = data.obj[[set]]$height / data.obj[[set]]$length
  ))

  # store whatever
  bp_data = rbind(bp_data, data.frame(
    set = set,
    var = "whatever",
    val = data.obj[[set]]$var1 / data.obj[[set]]$var2 + data.obj[[set]]$var3
  ))

}
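
To make the idea runnable, here is a minimal sketch of the same approach with a made-up data.obj. The entries set1 and set2, their sizes, and the helper name pieces are placeholders and not my real data. Instead of growing the data frame inside the loop, it builds one small data frame per set and binds them once at the end:

```r
set.seed(42)  # reproducible fake data

# Made-up stand-in for data.obj: a named list with one entry per set,
# each holding vectors of a different length
data.obj <- list(
  set1 = list(height = runif(80, 1, 5), length = runif(80, 5, 10)),
  set2 = list(height = runif(95, 1, 5), length = runif(95, 5, 10))
)

# Build one small data frame per set, then bind them in a single step
pieces <- lapply(names(data.obj), function(s) {
  d <- data.obj[[s]]
  data.frame(set = s, var = "slope", val = d$height / d$length)
})
bp_data <- do.call(rbind, pieces)

str(bp_data)
```

The result is in long format (one row per value), so the number of values per set can differ freely.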
agoldev
  • How is your real data stored? I mean, before you try to prepare it for plotting. Is it in a list? In separate objects? In long format in a data frame? Instead of just sharing data with 100 observations that can/should work, can you share a *small* set of reproducible data that shows the problem? Two vars and two sets, with between 10 and 20 observations each, should be plenty. – Gregor Thomas Nov 06 '17 at 18:11
  • Also, if you are open to a `ggplot` solution instead of a `graphics::boxplot`, that would be useful to know. – Gregor Thomas Nov 06 '17 at 18:16
  • I updated my question. ggplot is not an option. – agoldev Nov 06 '17 at 18:42

1 Answer

boxplot() needs neither a matrix nor vectors of equal length: it also accepts a list of numeric vectors and draws one box per list element. Below, I modified your data generation step so that each vector in the list has a different length. Then only minor changes to your code are needed.

## Modified data
bp_data = list(
  set = c(1,1,2,2,3,3,4,4),
  var = rep(c("red", "green"), 4),
  val = list(
    a = runif(70, 1, 10),
    b = runif(75, 1, 12),
    c = runif(80, 1, 13), 
    d = runif(85, 1, 14),
    e = runif(90, 1, 15),
    f = runif(95, 1, 16),
    g = runif(100, 1, 17),
    h = runif(105, 1, 18)
  )    
)

## Modified code
bp = boxplot(
  bp_data$val, 
  col= bp_data$var,
  xaxt = "n"
)
axis(1, at = seq(1.5,7.5,2), labels = paste("set", c(1:4)), tick=FALSE , cex.axis=1.2)
legend("topleft", c("var 1", "var 2"), col=c("red", "green"),pch =15)
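
If the data is collected in long format instead (one row per observation, as in the loop suggested in the question), the formula interface should also cope with unequal group sizes. A minimal sketch, assuming a data frame with columns set, var and val; the group sizes and the names sizes and long_data are made up for illustration:

```r
set.seed(1)  # reproducible fake data

# Hypothetical group sizes: 8 groups (4 sets x 2 vars) of different length
sizes <- c(70, 75, 80, 85, 90, 95, 100, 105)

# One row per observation; set and var are repeated to match each group's size
long_data <- data.frame(
  set = rep(c(1, 1, 2, 2, 3, 3, 4, 4), times = sizes),
  var = rep(rep(c("red", "green"), 4), times = sizes),
  val = unlist(lapply(sizes, function(n) runif(n, 1, 10)))
)

# Fix the level order so boxes alternate red, green within each set
long_data$var <- factor(long_data$var, levels = c("red", "green"))

bp <- boxplot(
  val ~ var * set,
  data = long_data,
  col = c("red", "green"),  # recycled across the 8 boxes
  xaxt = "n"
)
axis(1, at = seq(1.5, 7.5, 2), labels = paste("set", 1:4), tick = FALSE, cex.axis = 1.2)
legend("topleft", c("var 1", "var 2"), col = c("red", "green"), pch = 15)
```

Setting the factor levels explicitly matters here: without it, the levels of var would be sorted alphabetically (green before red) and the colors would no longer line up with the boxes.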


G5W