0

I have geographic (i.e. un-projected) spatial coordinates and would like to generate an n number of random coordinates around each original coordinate pair, such that the generated points follow a bi-variate normal distribution. The mean of the distribution ought to be the position of the original coordinates, with the standard deviation being a radius of r degree latitude/longitude.

I tried to randomly generate coordinates for each axis separately (i.e. lat and lon) using the rnorm function. Only to later realize that this produces a distribution much wider in all directions than the standard deviation of 1 degree. The following code illustrates this attempt and the plot shows the erroneous output (i.e. a large proportion of points falling far outside the desired standard deviation of 1 degree).

     library(sp)
     library(scales)
     library(geosphere)
     library(MASS)

     ## spatial point of interest (i.e. Observed data). 
     onepnt <- data.frame(longitude=1, latitude=1)

     #### First attempt: separately generate longitude and latitude points based on random distributions with mean of 'onepnt' and st.dev. of 1 degree. 
     rlon <- rnorm(1000, mean=onepnt$longitude, sd=1) # longitudinal degree error adjusted for latitude
     rlat <- rnorm(1000, mean=onepnt$latitude, sd=1) # mean and standard deviation in degrees

     rcoords1 <- cbind.data.frame(longitude=rlon, latitude=rlat)

     ####### Second attempt: generate lat/lons from a bivariate normal distribution 

     # Set parameters for bivariate normal dist.
     rho <- 0 # correlation coefficient
     mu1 <- onepnt$longitude # data coordinates as mean (center) of distribution
     mu2 <- onepnt$latitude
     s1 <- 1 # 1 degree = 1 st.dev.
     s2 <- 1

     mu <- c(mu1,mu2) # vector of means 
     sigma <- matrix(c(s1^2, s1*s2*rho, s1*s2*rho, s2^2), 2) # Covariance matrix

     # mvrnorm
     rlatlon <- as.data.frame(mvrnorm(1000, mu=mu, Sigma=sigma))
     colnames(rlatlon) <- c('longitude', 'latitude')

     ## Assess distribution of randomized points
     wgs84 <- CRS('+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs')

     ###### 1 - separate univariate #####
     rtabsp1       <- SpatialPointsDataFrame(rcoords1, proj4string=wgs84, data = rcoords1)
     origcoordssp1 <- SpatialPointsDataFrame(onepnt, proj4string=wgs84, data = onepnt)

     ###### 2 - Bivariate #####
     rtabsp2       <- SpatialPointsDataFrame(rcoords1, proj4string=wgs84, data = rcoords2)

     plot(rtabsp1, pch=16, col=alpha('black', 0.5))
     plot(origcoordssp, pch=20, col='red', cex=2, add=T)

    sdbuff1 <- raster::buffer(origcoordssp, 111*1000) ## generate buffer within which 1SD (68%) of generated points ought to fall
    plot(sdbuff1, border=4, add=T)

    ## Check Great Circle distances from the point-of-interest 
    dist2center1 <- distHaversine(rtabsp1, origcoordssp)/1000
    dist2center2 <- distHaversine(rtabsp2, origcoordssp)/1000

    hist(dist2center1) ## most points are not near the center (i.e. few points near '0'
    hist(dist2center2)
    mean(dist2center1) ## means should 
    mean(dist2center2)

So, the problem I see is that most points generated in these two ways are not near the center point (i.e. not near zero distance from the point-of-interest).

As far as I understand, this happens in the first approach because re-combining lat/long coordinates which are drawn separately produces coordinates from within a square-shaped area, rather than from within a circle-shaped area.

Any help with this would be greatly appreciated! Either a way to simplify the problem (i.e. draw randomly from within buffers of varying radii) or ideally a way to make the bi-variate solution work (I didn't post this to avoid muddying the water).

(NOTE: Ultimately I will be working with a large, global data set, so in order to save on computation time I would prefer not to have to work in projections, if possible. That said, if the root of the issue is projection, then so be it!)

bealhammar
  • 155
  • 1
  • 7
  • If no correlation is needed then what you did is exactly what you want to do. It sounds like the confusion is that you think that if the sd is 1 that a larger proportion of the points should be within 1 unit of the center than actually is. The normal distribution is such that about 68% of the points will be within 1 standard deviation. So approximately 32% will fall outside of 1 standard deviation. If you want a certain proportion of points to fall within 1 unit of your center then you can choose a different standard deviation. – Dason Jan 24 '19 at 18:24
  • Thanks for you comment, @Dason. Please see the bit of code I've just added at the end. In the histograms produced there you will see that most points fall not near 0 distance from the center, but rather several hundred kilometers (great circle) away! This can't be right if the mean of the distribution is that central point...Do you see what I mean? – bealhammar Jan 25 '19 at 14:05
  • I haven't dove into your code and correct me if I'm wrong but isn't 1 degree latitude a bit over 100km? So it wouldn't be extremely unlikely to see points 300 km away being generated. If that's too far then you need to adjust the standard deviation. – Dason Jan 25 '19 at 15:21
  • @Dason. Yes, 1 degree of latitude is approximately 111km. I think you are right that there is nothing wrong with this method. I just had the wrong expectation that most points would fall 'close' to the mean when the distance is calculated; so I was taken aback when looking at the histograms of distances between random points and the true point and seeing so few points close to 0 (the mean). Thank you. – bealhammar Jan 25 '19 at 16:30

0 Answers0