I need to align the density line with the height of geom_histogram and keep count values on the y axis instead of density.

I have these 2 versions:

#  Creating dataframe
library(ggplot2)

values <- c(rep(0,2), rep(2,3), rep(3,3), rep(4,3), 5, rep(6,2), 8, 9, rep(11,2))
data_to_plot <- as.data.frame(values)

# Option 1 ( y scale shows frequency, but geom_density line and geom_histogram are not matching )
ggplot(data_to_plot, aes(x = values)) +
  geom_histogram(aes(y = ..count..), binwidth = 1, colour= "black", fill = "white") +
  geom_density(aes(y=..count..), fill="blue", alpha = .2)+
  scale_x_continuous(breaks = seq(0, max(data_to_plot$values), 1))

Figure: y scale shows frequency, but geom_density line and geom_histogram are not matching

# Option 2 (geom_density line and geom_histogram are matching, but y scale density = 1)

ggplot(data_to_plot, aes(x = values)) +
  geom_histogram(aes(y = after_stat(ndensity)), binwidth = 1, colour= "black", fill = "white") +
  geom_density(aes(y = after_stat(ndensity)), fill="blue", alpha = .2)+
  scale_x_continuous(breaks = seq(0, max(data_to_plot$values), 1))

Figure: geom_density line and geom_histogram are matching, but y scale density = 1

What I need is the plot from Option 2, but with the y scale from Option 1. I can get it by using `aes(y = 1.25*..count..)` for this particular data, but my data is not static and this coefficient will not work for another dataset (just modify values to test):

# Option 3 (with coefficient in aes())
ggplot(data_to_plot, aes(x = values)) +
  geom_histogram(aes(y = ..count..), binwidth = 1, colour= "black", fill = "white") +
  geom_density(aes(y=1.25*..count..), fill="blue", alpha = .2)+
  scale_x_continuous(breaks = seq(0, max(data_to_plot$values), 1))

Figure: desired result, where the y scale shows frequency and the geom_density line matches the geom_histogram height

I cannot hardcode the coefficient or the bins. This problem is close to the ones discussed here, but they did not work for my case:

Programatically scale density curve made with geom_density to similar height to geom_histogram?

How to put geom_density and geom_histogram on same counts scale

teunbrand
Iraleksa
  • I understand your desired result, but I would strongly object against it. As the integral of the density curve should equal the count for a given interval, lifting it up to the maximal count would falsify this assumption in the intervals of the other bins. – c0bra Jan 08 '21 at 14:57
  • Thank you, @c0bra, I understand your point. – Iraleksa Jan 08 '21 at 16:20

1 Answer

A density curve is scaled so that the area under it equals 1, whereas counts are whole numbers. So it mostly does not make sense to plot the two on the same y-axis.
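As a rough numerical check of how the density scale and the count scale relate, here is a minimal base R sketch using the example values from the question. The names `area_density` and `area_count` are illustrative only, and the areas are approximated with a simple Riemann sum over the grid that `density()` returns.

```r
# Example values from the question
values <- c(rep(0, 2), rep(2, 3), rep(3, 3), rep(4, 3), 5, rep(6, 2), 8, 9, rep(11, 2))
n <- length(values)

# Kernel density estimate with the default bandwidth
d <- density(values)

# Spacing of the (equally spaced) evaluation grid
dx <- diff(d$x)[1]

# Area under the density curve: approximately 1
area_density <- sum(d$y) * dx

# Area under the curve on the count scale (density * n): approximately n
area_count <- sum(d$y * n) * dx

print(c(area_density = area_density, area_count = area_count, n = n))
```

In ggplot2, the `count` variable computed by `stat_density` is the density multiplied by the number of observations. With `binwidth = 1` the histogram bars also enclose a total area of `n`, so a density drawn on the count scale has roughly the same area as the histogram even though its peak is lower than the tallest bar.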

The left plot shows the density line and histogram for data similar to yours; I just added some values. The height of each bar shows the proportion of counts for the corresponding x-value, so the y-scale stays below 1.

The right plot shows the same as the left, but with a second histogram added that shows the counts. The y-scale goes up and the density line and density histogram shrink.

If you want to bring both to the same scale, you could do this by calculating a scaling factor. I have used this scaling factor to add a secondary y-axis to the third plot and scaled the secondary y-axis accordingly.

To make clear what belongs to which scale, I have colored the secondary y-axis and the data belonging to it red.

library(ggplot2)
library(patchwork)

values <- c(rep(0,2),rep(1,4), rep(2,6), rep(3,8), rep(4,12), rep(5,7), rep(6,4),rep(7,2))
df <- as.data.frame(values)

p1 <- ggplot(df, aes(x = values)) +
  stat_density(geom = 'line') +
  geom_histogram(aes(y = ..density..), binwidth = 1,color = 'white', fill = 'red', alpha = 0.2) 

p2 <- ggplot(df, aes(x = values)) +
  stat_density(geom = 'line') +
  geom_histogram(aes(y = ..count..), binwidth = 1, color = 'white', alpha = 0.2) +
  geom_histogram(aes(y = ..density..), binwidth = 1, color = 'white', alpha = 0.2) +
  ylab('density and counts')

# Find maximum of ..density.. (largest proportion; equals the density because binwidth = 1)
m <- max(table(df$values)/sum(table(df$values)))

# Find maximum count (height of the tallest bar)
mm <- max(table(df$values))

# Create scaling factor for the secondary axis
scaleF <- m/mm

p3 <- p1 + scale_y_continuous(
  limits = c(0, m),
  # Features of the first axis
  name = "density",
  # Add a second axis and specify its features
  sec.axis = sec_axis( trans=~(./scaleF), name = 'counts')
  ) + 
  theme(axis.ticks.y.right = element_line(color = "red"),
        axis.line.y.right = element_line(color = 'red'),
        axis.text.y.right = element_text(color = 'red'),
        axis.title.y.right = element_text(color = 'red')) +
  annotate("segment", x = 5, xend = 7, 
           y = 0.25, yend = .25, colour = "pink", size=3, alpha=0.6, arrow=arrow())

p1 | p2 | p3
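If the goal is instead to derive the coefficient for `geom_density` from the data, one possible sketch is to rescale the normalised density by the height of the tallest bar. It assumes integer-valued data and `binwidth = 1`, so that each bin holds exactly one distinct value; `max_count` and `coef` are illustrative names. Note that lifting the curve this way means its area no longer corresponds to the counts, as pointed out in the comments on the question.

```r
library(ggplot2)

values <- c(rep(0, 2), rep(2, 3), rep(3, 3), rep(4, 3), 5, rep(6, 2), 8, 9, rep(11, 2))
data_to_plot <- data.frame(values = values)

# Height of the tallest bar (assumes integer data and binwidth = 1)
max_count <- max(table(data_to_plot$values))

# Approximate coefficient relative to the count-scaled density.
# ggplot2 evaluates the density on a slightly different grid than density(),
# so treat this value as an estimate.
d <- density(data_to_plot$values)
coef <- max_count / max(d$y * nrow(data_to_plot))

p <- ggplot(data_to_plot, aes(x = values)) +
  geom_histogram(aes(y = after_stat(count)), binwidth = 1,
                 colour = "black", fill = "white") +
  # ndensity has a maximum of 1, so the peak of the curve lands on max_count
  geom_density(aes(y = after_stat(ndensity * max_count)),
               fill = "blue", alpha = .2) +
  scale_x_continuous(breaks = seq(0, max(data_to_plot$values), 1)) +
  ylab("count")

p
```

Because `max_count` is recomputed from the data, nothing is hardcoded: changing `values` changes the rescaling automatically. For the example data `coef` comes out close to the 1.25 used in Option 3.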

MarBlo
  • Thank you, for your answer, @MarBlo! Yet, if I want to find a way to find which coefficient I can use in this piece of code `geom_density(aes(y=1.25*..count..), fill="blue", alpha = .2)` for aligning density plot and histogram is there any way to calculate it based on the dataset values? – Iraleksa Jan 08 '21 at 16:10
  • @IrinaAleksashova I have added another plot with a 2nd y axis – MarBlo Jan 08 '21 at 17:51
  • that`s great! Thank you! – Iraleksa Jan 08 '21 at 18:15