Overlaying two histograms in R Plotly

Question

I'm trying to overlay two histogram plots in R plotly. However, only one of them shows up. Here's the code I'm using, with some random data:

library(plotly)

myDF <- cbind.data.frame(
  Income = sample(1:9, size = 1000, replace = TRUE),
  AgeInTwoYearIncrements = sample(seq(from = 2, to = 70, by = 2), size = 1000, replace = TRUE)
)


plot_ly(data = myDF, alpha = 0.6) %>% 
  add_histogram(x = ~Income, yaxis = "y1") %>% 
  add_histogram(x = ~AgeInTwoYearIncrements, yaxis = "y2") %>% 
  layout(
    title = "Salary vs Age",
    yaxis = list(
      tickfont = list(color = "blue"),
      overlaying = "y",
      side = "left",
      title = "Income"
    ),
    yaxis2 = list(
      tickfont = list(color = "red"),
      overlaying = "y",
      side = "right",
      title = "Age"
    ),
    xaxis = list(title = "count")
  )

Any help would be much appreciated!

score 4 · Accepted Answer · edited Jun 20 '20 at 09:12

The main cause is the `overlaying` setting on the first y-axis; only the second axis should overlay the first. Also, because the x-axis shows the count, Income and Age have to be mapped to y.

plot_ly(data = myDF, alpha = 0.6) %>% 
  add_histogram(y = ~Income, yaxis = "y1") %>%    # not `x =`
  add_histogram(y = ~AgeInTwoYearIncrements, yaxis = "y2") %>% 
  layout(
    title = "Salary vs Age",
    yaxis = list(
      tickfont = list(color = "blue"),
      # overlaying = "y",     # the main cause is this line.
      side = "left",
      title = "Income"
    ),
    yaxis2 = list(
      tickfont = list(color = "red"),
      overlaying = "y",
      side = "right",
      title = "Age"
    ),
    xaxis = list(title = "count")
  )

[Edit: to flip the plot so that the bars point up, swap the axes:]

plot_ly(data = myDF, alpha = 0.6) %>% 
  add_histogram(x = ~ Income, xaxis = "x1") %>% 
  add_histogram(x = ~ AgeInTwoYearIncrements, xaxis = "x2") %>% 
  layout(
    margin = list(t = 60),
    title = "Salary vs Age",
    xaxis = list(
      tickfont = list(color = "blue"),
      side = "bottom",    # an x-axis side is "top" or "bottom", not "left"
      title = "Income"
    ),
    xaxis2 = list(
      tickfont = list(color = "red"),
      overlaying = "x",
      side = "top",
      position = 0.95,
      title = "<br>Age"
    ),
    yaxis = list(title = "count")
  )

Is it possible to flip the graph 90 degrees? So that the histogram points up? Thank you! — user1357015, Oct 09 '16 at 09:34
Also, I'm not sure this plot quite makes sense... the count is on the x-axis but the orientation is horizontal? — user1357015, Oct 09 '16 at 09:49

and-bri · Answer 2 · 2016-10-09T10:39:43.217

You can mix histograms:

plot_ly(data = myDF, alpha = 0.6) %>% 
  add_histogram(x = ~Income) %>%
  add_histogram(x = ~AgeInTwoYearIncrements) %>%
  layout(
    title = "Salary and Age",
    yaxis = list(
      tickfont = list(color = "blue"),
      overlaying = "y",
      side = "left",
      title = "count"
    ),
    xaxis = list(title = "Salary and Age value")
  )

A histogram normally has the frequency / count on the y-axis, not on the x-axis. We can produce a diagram like the one you want, but I'm not sure it is still a histogram.

Also, as you can see in my picture, the frequency/count for salary (here blue) is higher and its variability is lower than for age. That makes it difficult to get a good-looking diagram. Maybe this is just a problem of your sample data...
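To see where the difference in count scale comes from, the sample data can be tabulated directly. This is a minimal base-R sketch using the same sampling as in the question; the seed and the variable names are my own additions, not part of the original post.

```r
set.seed(1)  # assumption: seed added only to make the draw reproducible

income <- sample(1:9, size = 1000, replace = TRUE)
age <- sample(seq(from = 2, to = 70, by = 2), size = 1000, replace = TRUE)

# Count how often each distinct value occurs
income_counts <- table(income)
age_counts <- table(age)

# 1000 draws spread over at most 9 values vs. at most 35 values,
# so the per-value counts for income are several times larger
mean(as.vector(income_counts))
mean(as.vector(age_counts))
```

With so few distinct income values, each income bar is much taller than any age bar, which is why a shared count axis makes one of the two series hard to read.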

So if you want to go with the histogram function, you have to swap the roles of the frequency and the value on the x-axis.

But anyway, I think a scatter plot would be a better way to show the relation between salary and age.
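A scatter plot of the two variables could look roughly like the sketch below. It assumes the same random data as in the question (the seed is my addition), and because both variables are discrete many points coincide, so a low alpha is used to hint at the density.

```r
library(plotly)

set.seed(1)  # assumption: seed added only to make the draw reproducible
myDF <- data.frame(
  Income = sample(1:9, size = 1000, replace = TRUE),
  AgeInTwoYearIncrements = sample(seq(from = 2, to = 70, by = 2), size = 1000, replace = TRUE)
)

# One marker per row: age on the x-axis, income on the y-axis
fig <- plot_ly(data = myDF, alpha = 0.3) %>%
  add_markers(x = ~AgeInTwoYearIncrements, y = ~Income) %>%
  layout(
    title = "Salary vs Age",
    xaxis = list(title = "Age"),
    yaxis = list(title = "Income")
  )

fig
```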

edit:

This is the result I get when I run your code:

Shown like this, I don't see the sense of the plot or what you want. The first orange column means that an age of 59 occurs between 0 and 5 times in your dataset. The third column means that an age of 88 occurs between 10 and 15 times in your dataset. Presenting this information in a bar plot doesn't work, because several age values can fall into one category of counts... I hope this is clear.

Anyway, to answer your question I need more clarification.

score 1 · Answer 3 · answered Apr 17 '20 at 22:06

Following the responses here, I wanted to answer this with an example that others can easily use when for instance plotting two overlapping histograms.

# Add required packages
library(plotly)    

# Make some sample data
a = rnorm(1000,4)
b = rnorm(1000,6)

# Make the histogram plot; the bin size is chosen automatically
fig <- plot_ly(alpha = 0.6) # no need to set "nbinsx = 30"
fig <- fig %>% add_histogram(x = a, name = "first")
fig <- fig %>% add_histogram(x = b, name = "second")
fig <- fig %>% layout(barmode = "overlay", 
                      yaxis = list(title = "Frequency"),
                      xaxis = list(title = "Values"))

# Print your histogram 
fig

And here is the result of the code:
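If the automatically chosen bins differ between the two traces, the overlay can be harder to compare. One possible variation, sketched here under my own assumptions (the seed, the `bins` helper list and the bin width of 0.5 are illustrative choices, not part of the answer above), is to pass the same `xbins` specification to both traces:

```r
library(plotly)

set.seed(1)  # assumption: seed added only to make the draw reproducible
a <- rnorm(1000, 4)
b <- rnorm(1000, 6)

# Shared bin specification covering both samples; the width 0.5 is arbitrary
bins <- list(
  start = floor(min(a, b)),
  end = ceiling(max(a, b)),
  size = 0.5
)

fig <- plot_ly(alpha = 0.6) %>%
  add_histogram(x = a, name = "first", xbins = bins) %>%
  add_histogram(x = b, name = "second", xbins = bins) %>%
  layout(barmode = "overlay",
         yaxis = list(title = "Frequency"),
         xaxis = list(title = "Values"))

fig
```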

score 0 · Answer 4 · answered Feb 04 '22 at 14:02

Easy way to handle any number of dimensions without repetition

TL;DR: You can rearrange your data to long-form before passing it to plot_ly().

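library(dplyr)   # mutate(), row_number()
library(tidyr)   # pivot_longer()
library(plotly)

# The native pipe |> needs R 4.1 or later
df |>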
  mutate(row_number = row_number()) |>
  pivot_longer(!row_number) |>
  plot_ly() |>
  add_histogram(x = ~ value,
                color = ~ name,
                opacity = 0.5) |>
  layout(barmode = 'overlay')

Explanation

Given a DF with multiple columns, like the one the OP posted:

df = cbind.data.frame(Income = sample(1:9, size = 1000, replace= TRUE),
                      AgeInTwoYearIncrements = sample(seq(from = 2, to = 70, by = 2), size = 1000, replace = TRUE))

Then, using tidyr::pivot_longer():

df |> mutate(row_number = row_number()) |> pivot_longer(!row_number)

This gives:

# A tibble: 2,000 × 3
   row_number name                   value
        <int> <chr>                  <dbl>
 1          1 Income                     1
 2          1 AgeInTwoYearIncrements    20
 3          2 Income                     1
 4          2 AgeInTwoYearIncrements    48
 5          3 Income                     3
 6          3 AgeInTwoYearIncrements    26
 7          4 Income                     4
 8          4 AgeInTwoYearIncrements    30
 9          5 Income                     4
10          5 AgeInTwoYearIncrements    60
# … with 1,990 more rows
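For readers who would rather avoid the tidyverse dependency, a similar long-form table can be built in base R with stack(). This is only a sketch of an alternative, not the method used in this answer; the seed and the name long_df are my own, and unlike pivot_longer() the rows come out grouped by column rather than interleaved, which makes no difference for a histogram.

```r
set.seed(1)  # assumption: seed added only to make the draw reproducible
df <- data.frame(
  Income = sample(1:9, size = 1000, replace = TRUE),
  AgeInTwoYearIncrements = sample(seq(from = 2, to = 70, by = 2), size = 1000, replace = TRUE)
)

# stack() returns two columns: `values` and `ind` (the source column as a factor)
long_df <- stack(df)

# Rename to match the pivot_longer() output used above
names(long_df) <- c("value", "name")

head(long_df)
```

The resulting long_df has one row per original cell and could be passed to plot_ly() in place of the pivot_longer() result.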

Finally, just pipe this to plot_ly(), so the full command is:

df |>
  # Add a column to keep track of the row numbers
  mutate(row_number = row_number()) |>
  # Reshape the df to long form: one output row per original row per column
  # (in this case, doubling its length)
  pivot_longer(!row_number) |>
  plot_ly() |>
  # The magic is here. We set color to track the name variable, which will
  # add a separate series per column.
  # We set the opacity so we can see where our plots overlap.
  add_histogram(x = ~ value,
                color = ~ name,
                opacity = 0.5) |>
  # Without setting this, bars will be plotted side by side for the same x value
  # rather than overlapping.
  layout(barmode = 'overlay')
