I'm plotting correlations in ggpairs and am splitting the data based on a filter.

The density plots are normalising themselves on the number of data points in each filtered group. I would like them to normalise on the total number of data points in the entire data set. Essentially, I would like to be able to have the sum of the individual density plots be equal to the density plot of the entire dataset.

I know this probably breaks the definition of "density plot", but this is a presentation style I'd like to explore.

In plain ggplot, I can do this by adding y=..count.. to the aesthetic, but ggpairs doesn't accept x or y aesthetics.

Some sample code and plots:

library(tidyverse)  # provides %>% and ggplot2
library(GGally)     # provides ggpairs() and wrap()

set.seed(1234)
group = as.numeric(cut(runif(100),c(0,1/2,1),c(1,2)))
x = rnorm(100,group,1)
x[group == 1] = (x[group == 1])^2
y = (2 * x) + rnorm(100,0,0.1)
data = data.frame(group = as.factor(group), x = x, y = y)

# plot of everything
data %>% 
  ggplot(aes(x)) + 
  geom_density(color = "black", alpha = 0.7)

Density plot of the whole data set

#the scaling I want
data %>% 
  ggplot(aes(x,y=..count..,  fill=group)) + 
  geom_density(color = "black", alpha = 0.7)

Density plot of dataset scaled to the total count

#the scaling I get
data %>% 
  ggplot(aes(x,  fill=group)) + 
  geom_density(color = "black", alpha = 0.7)

Density plot of dataset scaled to the count of each group

data %>% ggpairs(., columns = 2:3,
             mapping = ggplot2::aes(colour=group), 
             lower = list(continuous = wrap("smooth", alpha = 0.5, size=1.0)),
             diag = list(continuous = wrap("densityDiag", alpha=0.5 ))
)

The actual ggpairs plot

Are there any suggestions that don't involve reformatting the entire dataset?

masher
  • I think you can try to define a custom function to do this. Maybe give this a shot? https://stackoverflow.com/questions/45964883/custom-group-mean-function-for-ggpairs – StupidWolf Feb 11 '20 at 07:15
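
The custom-function suggestion in the comment above could be sketched roughly as follows. ggpairs accepts user-defined panel functions of the form function(data, mapping, ...) that return a ggplot object, so the y aesthetic can be set inside such a function. The helper name count_density_diag is hypothetical (it is not part of GGally), and the sketch assumes ggplot2 >= 3.3.0 for after_stat().

```r
library(ggplot2)  # after_stat() needs ggplot2 >= 3.3.0
library(GGally)

# Sample data from the question
set.seed(1234)
group <- as.numeric(cut(runif(100), c(0, 1/2, 1), c(1, 2)))
x <- rnorm(100, group, 1)
x[group == 1] <- x[group == 1]^2
y <- 2 * x + rnorm(100, 0, 0.1)
data <- data.frame(group = as.factor(group), x = x, y = y)

# Hypothetical helper (not part of GGally): a diagonal panel whose
# height is the per-group count (density * group size) instead of the
# per-group density.
count_density_diag <- function(data, mapping, ...) {
  # Reuse the colour mapping as fill so that alpha has a visible effect
  mapping$fill <- mapping$colour
  ggplot(data = data, mapping = mapping) +
    geom_density(aes(y = after_stat(count)), ...)
}

pm <- ggpairs(
  data, columns = 2:3,
  mapping = aes(colour = group),
  lower = list(continuous = wrap("smooth", alpha = 0.5, size = 1.0)),
  diag  = list(continuous = wrap(count_density_diag, alpha = 0.5))
)
pm
```

Only the diagonal panels change; the data frame itself is left untouched. Each group's curve still uses its own automatically selected bandwidth, so the curves are scaled by group size but will not necessarily add up exactly to the curve for the whole data set.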

1 Answer

I am not sure I understand the question, but if the goal is to plot the densities of both groups together with the density of the entire data set, this can be done by:

  1. Adding a second call to geom_density that drops the grouping aesthetic (in this case, fill).
  2. Setting inherit.aes = FALSE in that second call so that the aesthetics defined in ggplot() are not inherited.

Then plot the densities.

library(tidyverse)

data %>% 
  ggplot(aes(x, y=..count.., fill = group)) + 
  geom_density(color = "black", alpha = 0.7) +
  geom_density(mapping = aes(x, y = ..count..),
               inherit.aes = FALSE)

Rui Barradas
  • I would like to scale the densities plotted in ggpairs so that they are scaled to the total number of data points, not just the data points in the individual groups. Also, what causes the mismatch in count at x~7.5 in your plot? I would think that it should match exactly, i.e. green + pink = black. – masher Feb 11 '20 at 08:23
  • @masher The mismatch is probably due to automatic bandwidth selection. Look at the point between 5.0 and 7.5: there the total KDE is *smaller* than the red one. The same goes for the point after 7.5. – Rui Barradas Feb 11 '20 at 08:54
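
The bandwidth explanation in the last comment can be checked with base R's density(). The sketch below is only an illustration, using the sample x and group from the question: when every curve is computed with the same bandwidth and on the same evaluation grid, the count-scaled group curves add up to the count-scaled curve for the whole data set, up to floating-point error. The helper name count_kde and the choice of bw.nrd0 on the full data as the shared bandwidth are assumptions made for the example.

```r
# Sample x and group from the question
set.seed(1234)
group <- as.numeric(cut(runif(100), c(0, 1/2, 1), c(1, 2)))
x <- rnorm(100, group, 1)
x[group == 1] <- x[group == 1]^2

# One shared bandwidth and one shared evaluation grid for every curve
bw   <- bw.nrd0(x)          # assumption: bandwidth chosen from the full data
from <- min(x) - 3 * bw
to   <- max(x) + 3 * bw

# Hypothetical helper: kernel density estimate scaled to counts
count_kde <- function(v) {
  d <- density(v, bw = bw, from = from, to = to, n = 512)
  d$y * length(v)
}

total  <- count_kde(x)
by_grp <- count_kde(x[group == 1]) + count_kde(x[group == 2])

# Largest pointwise difference between "black" and "green + pink"
max_diff <- max(abs(total - by_grp))
print(max_diff)
```

By default, geom_density selects a bandwidth separately for each group, which is why the stacked group curves and the overall curve differ. Passing the same numeric bw to each geom_density call should bring them much closer.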