I am using the R programming language and following this tutorial for making 3D kernel density plots: https://plotly.com/r/3d-surface-plots/. The tutorial's example is:

    library(MASS)
    library(plotly)

    kd <- with(MASS::geyser, MASS::kde2d(duration, waiting, n = 50))
    fig <- plot_ly(x = kd$x, y = kd$y, z = kd$z) %>% add_surface()

    fig

I decided to try this on my own data:

    #generate data
    a = rnorm(100,10,10)
    b = rnorm(100,5,5)
    c = rnorm(100,5,10)
    d = data.frame(a,b,c)

    #make 3d plot (I think n = 50 refers to selecting the first 50 points?)
    kd <- with(d, MASS::kde2d(a,b,c, n = 50))
    fig <- plot_ly(x = kd$x, y = kd$y, z = kd$z) %>% add_surface()

But this results in the following error:

    Error in MASS::kde2d(a, b, c, n = 50) :
      bandwidths must be strictly positive

This error prevents me from creating the "kd" object.

Can someone please tell me what I am doing wrong? Is there a problem with the specific data I am using, or is this a syntax error?

Thanks

stats_noob
  • Does this answer your question? [Error using geom_density_2d() in R: Computation failed in `stat_density2d()`: bandwidths must be strictly positive](https://stackoverflow.com/questions/53075331/error-using-geom-density-2d-in-r-computation-failed-in-stat-density2d-b) – tjebo Feb 01 '21 at 14:18

1 Answer

You seem to be misunderstanding the purpose of kde2d. From help(kde2d):

Two-dimensional kernel density estimation with an axis-aligned bivariate normal kernel, evaluated on a square grid.

From the same help file regarding the h argument:

h
vector of bandwidths for x and y directions. Defaults to normal reference bandwidth (see bandwidth.nrd). A scalar value will be taken to apply to both directions.

You are passing c, a length-100 numeric vector, as h. You appear to be trying to pass data to h, which does not make sense: kde2d takes only two data vectors, x and y, and its third argument is the bandwidth. Pass either one or two values for the bandwidth, or pass nothing and accept the default.
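As a minimal sketch of the corrected call, the snippet below drops c and lets kde2d pick the default bandwidth, then repeats the call with the bandwidth given explicitly. The `set.seed(1)` line and the names `kd_h` and `h` are additions for illustration only, and the plotting step is wrapped in a `requireNamespace()` check on the assumption that plotly may not be installed.

```r
library(MASS)

set.seed(1)  # assumption: added only so the sketch is reproducible
a <- rnorm(100, 10, 10)
b <- rnorm(100, 5, 5)

# Pass only the two data vectors; h is omitted, so the default
# normal reference bandwidth (bandwidth.nrd) is used.
kd <- MASS::kde2d(a, b, n = 50)

# Equivalent call with the bandwidth written out explicitly:
# one value for the x direction, one for the y direction.
h <- c(MASS::bandwidth.nrd(a), MASS::bandwidth.nrd(b))
kd_h <- MASS::kde2d(a, b, h = h, n = 50)

# kd is a list with grid coordinates x, y and the density matrix z.
str(kd)

# Plot the surface as in the tutorial, if plotly is available.
if (requireNamespace("plotly", quietly = TRUE)) {
  fig <- plotly::plot_ly(x = kd$x, y = kd$y, z = kd$z)
  fig <- plotly::add_surface(fig)
}
```

Here the density is estimated for the pair (a, b) only. kde2d is two-dimensional, so a third data vector such as c has no place in the call.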

From lines 31 and 32 of the source, we can see why you got the error:

    if (any(h <= 0))
        stop("bandwidths must be strictly positive")

kde2d keeps only the first two elements of h, one for each direction. Thus, if either of the first two values of c is negative or zero, you will get this error.
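The check can be illustrated with a small sketch. The vectors `h_bad` and `h_ok` are hypothetical values chosen for the demonstration, and `set.seed(1)` is added only for reproducibility. The first call should fail because a non-positive value sits in one of the first two positions; the second should run because the negative value is in the third position, which kde2d does not use.

```r
library(MASS)

set.seed(1)  # assumption: added only so the sketch is reproducible
a <- rnorm(100, 10, 10)
b <- rnorm(100, 5, 5)

# Hypothetical bandwidth vectors, for illustration only.
h_bad <- c(-1, 2, 3)   # non-positive value among the first two entries
h_ok  <- c(1, 2, -3)   # negative value only in the unused third entry

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    MASS::kde2d(a, b, h_bad, n = 50)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# This call succeeds: only h_ok[1] and h_ok[2] are used as bandwidths.
kd <- MASS::kde2d(a, b, h_ok, n = 50)
```

This is also why the original call fails only for some random draws: it depends on whether the first two values of c happen to be positive.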


The n argument, as described in the help file:

n
Number of grid points in each direction. Can be scalar or a length-2 integer vector.

This determines the grid on which the density is evaluated; it does not select a subset of the data. If you provide a single value, a square grid is produced.
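A short sketch of how n shapes the output, under the same simulated data as in the question (the seed and the names `kd_sq` and `kd_rect` are illustrative additions):

```r
library(MASS)

set.seed(1)  # assumption: added only so the sketch is reproducible
a <- rnorm(100, 10, 10)
b <- rnorm(100, 5, 5)

# Scalar n: the density is evaluated on a 50 x 50 grid.
kd_sq <- MASS::kde2d(a, b, n = 50)

# Length-2 n: 30 grid points along x and 60 along y.
kd_rect <- MASS::kde2d(a, b, n = c(30, 60))

# The grid size changes, but all 100 observations are used in both calls.
dim(kd_sq$z)
dim(kd_rect$z)
length(kd_rect$x)
length(kd_rect$y)
```

A larger n gives a smoother-looking surface at the cost of more evaluations; it has no effect on how many data points enter the estimate.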

Ian Campbell
  • thank you for your answer! does that mean if i say (please remove the semi colons ; they are just intended to show spacings) : a = rnorm(100,10,10); b = rnorm(100,5,5); c = rnorm(100,5,10); d = data.frame(a,b,c); d = abs(d); kd <- with(d, MASS::kde2d(a,b,c, n = 50)); fig <- plot_ly(x = kd$x, y = kd$y, z = kd$z) %>% add_surface(); this will fix the problem? – stats_noob Feb 01 '21 at 16:23
  • does "n = 50" mean that only the first 50 points are being used? – stats_noob Feb 01 '21 at 16:23
  • I've edited the answer to try and clear up your confusion, `h` is an argument for bandwidth, not data. – Ian Campbell Feb 01 '21 at 16:29
  • Thank you for your clarifications. Can you please take a look at this question if you have time? https://stackoverflow.com/questions/65996674/r-plot-axis-display-values-larger-than-the-original-data thanks – stats_noob Feb 01 '21 at 17:44