R: Generate normally-distributed spatial coordinates around point-of-interest

Question

I have geographic (i.e. un-projected) spatial coordinates and would like to generate an n number of random coordinates around each original coordinate pair, such that the generated points follow a bi-variate normal distribution. The mean of the distribution ought to be the position of the original coordinates, with the standard deviation being a radius of r degree latitude/longitude.

I tried to randomly generate coordinates for each axis separately (i.e. lat and lon) using the rnorm function. Only to later realize that this produces a distribution much wider in all directions than the standard deviation of 1 degree. The following code illustrates this attempt and the plot shows the erroneous output (i.e. a large proportion of points falling far outside the desired standard deviation of 1 degree).

     library(sp)
     library(scales)
     library(geosphere)
     library(MASS)

     ## spatial point of interest (i.e. Observed data). 
     onepnt <- data.frame(longitude=1, latitude=1)

     #### First attempt: separately generate longitude and latitude points based on random distributions with mean of 'onepnt' and st.dev. of 1 degree. 
     rlon <- rnorm(1000, mean=onepnt$longitude, sd=1) # longitudinal degree error adjusted for latitude
     rlat <- rnorm(1000, mean=onepnt$latitude, sd=1) # mean and standard deviation in degrees

     rcoords1 <- cbind.data.frame(longitude=rlon, latitude=rlat)

     ####### Second attempt: generate lat/lons from a bivariate normal distribution 

     # Set parameters for bivariate normal dist.
     rho <- 0 # correlation coefficient
     mu1 <- onepnt$longitude # data coordinates as mean (center) of distribution
     mu2 <- onepnt$latitude
     s1 <- 1 # 1 degree = 1 st.dev.
     s2 <- 1

     mu <- c(mu1,mu2) # vector of means 
     sigma <- matrix(c(s1^2, s1*s2*rho, s1*s2*rho, s2^2), 2) # Covariance matrix

     # mvrnorm
     rlatlon <- as.data.frame(mvrnorm(1000, mu=mu, Sigma=sigma))
     colnames(rlatlon) <- c('longitude', 'latitude')

     ## Assess distribution of randomized points
     wgs84 <- CRS('+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs')

     ###### 1 - separate univariate #####
     rtabsp1       <- SpatialPointsDataFrame(rcoords1, proj4string=wgs84, data = rcoords1)
     origcoordssp1 <- SpatialPointsDataFrame(onepnt, proj4string=wgs84, data = onepnt)

     ###### 2 - Bivariate #####
     rtabsp2       <- SpatialPointsDataFrame(rcoords1, proj4string=wgs84, data = rcoords2)

     plot(rtabsp1, pch=16, col=alpha('black', 0.5))
     plot(origcoordssp, pch=20, col='red', cex=2, add=T)

    sdbuff1 <- raster::buffer(origcoordssp, 111*1000) ## generate buffer within which 1SD (68%) of generated points ought to fall
    plot(sdbuff1, border=4, add=T)

    ## Check Great Circle distances from the point-of-interest 
    dist2center1 <- distHaversine(rtabsp1, origcoordssp)/1000
    dist2center2 <- distHaversine(rtabsp2, origcoordssp)/1000

    hist(dist2center1) ## most points are not near the center (i.e. few points near '0'
    hist(dist2center2)
    mean(dist2center1) ## means should 
    mean(dist2center2)

So, the problem I see is that most points generated in these two ways are not near the center point (i.e. not near zero distance from the point-of-interest).

As far as I understand, this happens in the first approach because re-combining lat/long coordinates which are drawn separately produces coordinates from within a square-shaped area, rather than from within a circle-shaped area.

Any help with this would be greatly appreciated! Either a way to simplify the problem (i.e. draw randomly from within buffers of varying radii) or ideally a way to make the bi-variate solution work (I didn't post this to avoid muddying the water).

(NOTE: Ultimately I will be working with a large, global data set, so in order to save on computation time I would prefer not to have to work in projections, if possible. That said, if the root of the issue is projection, then so be it!)

If no correlation is needed then what you did is exactly what you want to do. It sounds like the confusion is that you think that if the sd is 1 that a larger proportion of the points should be within 1 unit of the center than actually is. The normal distribution is such that about 68% of the points will be within 1 standard deviation. So approximately 32% will fall outside of 1 standard deviation. If you want a certain proportion of points to fall within 1 unit of your center then you can choose a different standard deviation. — Dason, Jan 24 '19 at 18:24
Thanks for you comment, @Dason. Please see the bit of code I've just added at the end. In the histograms produced there you will see that most points fall not near 0 distance from the center, but rather several hundred kilometers (great circle) away! This can't be right if the mean of the distribution is that central point...Do you see what I mean? — bealhammar, Jan 25 '19 at 14:05
I haven't dove into your code and correct me if I'm wrong but isn't 1 degree latitude a bit over 100km? So it wouldn't be extremely unlikely to see points 300 km away being generated. If that's too far then you need to adjust the standard deviation. — Dason, Jan 25 '19 at 15:21
@Dason. Yes, 1 degree of latitude is approximately 111km. I think you are right that there is nothing wrong with this method. I just had the wrong expectation that most points would fall 'close' to the mean when the distance is calculated; so I was taken aback when looking at the histograms of distances between random points and the true point and seeing so few points close to 0 (the mean). Thank you. — bealhammar, Jan 25 '19 at 16:30

R: Generate normally-distributed spatial coordinates around point-of-interest

0 Answers0