I have geographic (i.e. un-projected) spatial coordinates and would like to generate an n number of random coordinates around each original coordinate pair, such that the generated points follow a bi-variate normal distribution. The mean of the distribution ought to be the position of the original coordinates, with the standard deviation being a radius of r degree latitude/longitude.
I tried to randomly generate coordinates for each axis separately (i.e. lat and lon) using the rnorm
function. Only to later realize that this produces a distribution much wider in all directions than the standard deviation of 1 degree. The following code illustrates this attempt and the plot shows the erroneous output (i.e. a large proportion of points falling far outside the desired standard deviation of 1 degree).
library(sp)
library(scales)
library(geosphere)
library(MASS)
## spatial point of interest (i.e. Observed data).
onepnt <- data.frame(longitude=1, latitude=1)
#### First attempt: separately generate longitude and latitude points based on random distributions with mean of 'onepnt' and st.dev. of 1 degree.
rlon <- rnorm(1000, mean=onepnt$longitude, sd=1) # longitudinal degree error adjusted for latitude
rlat <- rnorm(1000, mean=onepnt$latitude, sd=1) # mean and standard deviation in degrees
rcoords1 <- cbind.data.frame(longitude=rlon, latitude=rlat)
####### Second attempt: generate lat/lons from a bivariate normal distribution
# Set parameters for bivariate normal dist.
rho <- 0 # correlation coefficient
mu1 <- onepnt$longitude # data coordinates as mean (center) of distribution
mu2 <- onepnt$latitude
s1 <- 1 # 1 degree = 1 st.dev.
s2 <- 1
mu <- c(mu1,mu2) # vector of means
sigma <- matrix(c(s1^2, s1*s2*rho, s1*s2*rho, s2^2), 2) # Covariance matrix
# mvrnorm
rlatlon <- as.data.frame(mvrnorm(1000, mu=mu, Sigma=sigma))
colnames(rlatlon) <- c('longitude', 'latitude')
## Assess distribution of randomized points
wgs84 <- CRS('+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs')
###### 1 - separate univariate #####
rtabsp1 <- SpatialPointsDataFrame(rcoords1, proj4string=wgs84, data = rcoords1)
origcoordssp1 <- SpatialPointsDataFrame(onepnt, proj4string=wgs84, data = onepnt)
###### 2 - Bivariate #####
rtabsp2 <- SpatialPointsDataFrame(rcoords1, proj4string=wgs84, data = rcoords2)
plot(rtabsp1, pch=16, col=alpha('black', 0.5))
plot(origcoordssp, pch=20, col='red', cex=2, add=T)
sdbuff1 <- raster::buffer(origcoordssp, 111*1000) ## generate buffer within which 1SD (68%) of generated points ought to fall
plot(sdbuff1, border=4, add=T)
## Check Great Circle distances from the point-of-interest
dist2center1 <- distHaversine(rtabsp1, origcoordssp)/1000
dist2center2 <- distHaversine(rtabsp2, origcoordssp)/1000
hist(dist2center1) ## most points are not near the center (i.e. few points near '0'
hist(dist2center2)
mean(dist2center1) ## means should
mean(dist2center2)
So, the problem I see is that most points generated in these two ways are not near the center point (i.e. not near zero distance from the point-of-interest).
As far as I understand, this happens in the first approach because re-combining lat/long coordinates which are drawn separately produces coordinates from within a square-shaped area, rather than from within a circle-shaped area.
Any help with this would be greatly appreciated! Either a way to simplify the problem (i.e. draw randomly from within buffers of varying radii) or ideally a way to make the bi-variate solution work (I didn't post this to avoid muddying the water).
(NOTE: Ultimately I will be working with a large, global data set, so in order to save on computation time I would prefer not to have to work in projections, if possible. That said, if the root of the issue is projection, then so be it!)