Error using geom_density_2d() in R : Computation failed in `stat_density2d()`: bandwidths must be strictly positive

Question

In a attempt to make a test 2d density plot with ggplot2, I used the code snippet:

ggplot(df, aes(x = S1.x, y = S1.y)) + geom_point() + geom_density_2d()

and I got the error: "Computation failed in stat_density2d(): bandwidths must be strictly positive"

My dataframe looks like this:

> df

transcriptID S1.x      S1.y      S2.x       S2.y    
DQ459412     0.000000  0.000000  0.000000   0.000000
DQ459413     1.584963  2.358379  4.392317   3.085722    
DQ459415     0.000000  0.000000  0.000000   0.000000    
DQ459418     0.000000  0.000000  0.000000   0.000000    
DQ459419     0.000000  0.000000  4.000000   2.891544    
DQ459420     0.000000  0.000000  0.000000   0.000000

Also, var(df[,"S1.x"]) > 0 and var(df[,"S1.y"]) > 0.

Fig 1 - 2d density plot with error

However, I got a density plot without error by running:

ggplot(df, aes(x = S2.x, y = S2.y)) + geom_point() + geom_density_2d()

Fig 2 - density plot without error

How do I address the error in Fig 1?

Not sure what you mean by "used a custom data" - can you clarify that? Also can you add enough data to make the error message reproducible? Also posting a screenshot of the "correct result" wouldn't be bad either. — Mike Wise, Oct 31 '18 at 09:57

Mike Wise · Accepted Answer · 2018-10-31T23:42:14.453

So the real problem is that the S1.x and S1.y values only have one non-zero value in their columns. And it turns out that geom_density_2d can't really estimate a density with only a value or two. But read on...

Update:

This question has been asked before, and the answers are usually that you need to have non-zero variance in your data columns. But you do have non-zero variance, so why isn't it working?

Looking at the internals of geom_density_2d we see that it uses the MASS::kde2d package function to calculate the distribution.
Looking at kde2d we see that it uses MASS::bandwidth.nrd(df$x) to get an estimate of the bandwidth.
Looking at the help (which has the code) for bandwidth.nrd we see it uses a rule of thumb that gets the quantile of the distribution, and subtracts the 2nd quantile from the 1st quantile to get a bandwidth estimate.
Doing a quantile on your original data we see that the quantiles of the data were zero.
And running MASS::kde2d on your original data with that bandwidth.nrd estimate of the bandwidth gives you the same error:

library(MASS)
nn <- c("DQ459412","DQ459413","DQ459415","DQ459418","DQ459419","DQ459420")
s1x <- c(0,1.584963,0,0,0,0)
s1y <- c(0,2.358379,0,0,0,0) 
s2x <- c(0,4.392317,0,0,4,0)
s2y <- c(0,3.085722,0,0,2.891544,0) 
df <- data.frame(transcriptID=nn,S1.x=s1x,S1.y=s1y,S2.x=s2x,S2.y=s2y)

> quantile(df$s1x)
      0%      25%      50%      75%     100% 
0.000000 0.000000 0.000000 0.000000 1.584963 
> quantile(df$s1y)
      0%      25%      50%      75%     100% 
0.000000 0.000000 0.000000 0.000000 2.358379

h <- c(MASS::bandwidth.nrd(df$x), MASS::bandwidth.nrd(df$y))
dens <- MASS::kde2d(df$s1x, df$s1y, h = h, n = n,  lims = c(0,1,0,1))
Error in MASS::kde2d(df$s1x, df$s1y, h = h, n = n, lims = c(0, 1, 0, 1)) : bandwidths must be strictly positive

So the real criteria for using geom_density_2D is that both the x- and the y-data needs to have a non-zero gap between their 1st and 2nd quantiles.

Now to fix it, if I make a small modification - replacing one of the zeros with 0.1, like this:

nn <- c("DQ459412","DQ459413","DQ459415","DQ459418","DQ459419","DQ459420")
s1x <- c(0,1.584963,0,0,0.1,0)
s1y <- c(0,2.358379,0,0,0.1,0) 
s2x <- c(0,4.392317,0,0,4,0)
s2y <- c(0,3.085722,0,0,2.891544,0) 
df <- data.frame(transcriptID=nn,S1.x=s1x,S1.y=s1y,S2.x=s2x,S2.y=s2y)
print(df)

yielding:

  transcriptID     S1.x     S1.y     S2.x     S2.y
1     DQ459412 0.000000 0.000000 0.000000 0.000000
2     DQ459413 1.584963 2.358379 4.392317 3.085722
3     DQ459415 0.000000 0.000000 0.000000 0.000000
4     DQ459418 0.000000 0.000000 0.000000 0.000000
5     DQ459419 0.100000 0.100000 4.000000 2.891544
6     DQ459420 0.000000 0.000000 0.000000 0.000000

Then I get this plot instead of your error.

You can let that 0.1 value approach zero, eventually it will not be able to calculate a distribution anymore and you will get your error again.

One general way to deal with this situation is to add a very small quantity of noise to your data, kind of simulating the fact that any meaningful calculation based on a real measurement from a continuous distribution should be impervious to that small quantity of noise.

Hope that helps.

Yes, that helps! My actual data has 198716 rows with the S1.x and S1.y columns having more than one non-zero rows. I think the problem was the actual size of the data. When I decrease it to 200 rows, I see a tiny density plot without error. I think as the number of rows increase, the density plot gets difficult to discern. — Onyi Ukay, Oct 31 '18 at 17:35
@MikeWise, I am getting the same problem and thanks to your detective work I found the cause of the issue. Actually my 1st,2nd,3rd quantiles were all the same, but I think they have to be all 3 different, because when I changed the 1st it was still not working. Only when I also changed the 2nd I was able to get a plot. — Herman Toothrot, Oct 27 '20 at 21:27

score 5 · Answer 2 · answered Jul 10 '19 at 09:07

The answer of @Mike Wise is pretty solid indeed and my answer is somewhat complementary to it. Actually, the bandwidth.nrd function computes the difference between the 3rd and the 1st quantile not the 2nd and 1st (code from the function):

r <- quantile(distances, c(0.25, 0.75))

Instead of adding random noise to your data, I would suggest to precompute the bandwidths yourself and pass them to the function, testing for non-zero values like so:

kde2d(df$s1x, df$s1y, 
      h = c(ifelse(bandwidth.nrd(df$s1x) == 0, 0.1, bandwidth.nrd(df$s1x)),
            ifelse(bandwidth.nrd(df$s1y) == 0, 0.1, bandwidth.nrd(df$s1y))))

Hope this helps.

It seems that 1st,2nd, and 3rd have to be different. – Herman Toothrot Oct 27 '20 at 21:28 — Herman Toothrot, Oct 27 '20 at 21:28

Error using geom_density_2d() in R : Computation failed in `stat_density2d()`: bandwidths must be strictly positive

2 Answers2

Update:

Linked