I think distVincentyEllipsoid(p1, p2, ...)
tries to find the distance between the first point of p1
with the first point of p2
, second of p1
with second of p2
, etc. What you need is an expansion along the lines of *"first in p1 against all of p2
, second in p1
with all of p2
, etc).
Adjusting your code to call dfSites
and dfStation
(instead of df2
/weather
), the following should work for you. (I'm going to remove one of the stations with dfStation[-1,...]
just for clearly recognizing which of the dimensions represents sites versus stations.
alldists <- sapply(seq_len(nrow(dfSites)), function(i) {
distVincentyEllipsoid(dfSites[i,c("siteLon","siteLat")],
dfStation[-1,c("stationLon","stationLat")])
})
alldists
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
# [1,] 786180.9 123505.1 228960.0 786180.9 123505.1 228960.0 786180.9 123505.1
# [2,] 481351.6 269760.4 122086.2 481351.6 269760.4 122086.2 481351.6 269760.4
# [3,] 119427.7 565573.7 484015.5 119427.7 565573.7 484015.5 119427.7 565573.7
# [4,] 786180.9 123505.1 228960.0 786180.9 123505.1 228960.0 786180.9 123505.1
# [5,] 481351.6 269760.4 122086.2 481351.6 269760.4 122086.2 481351.6 269760.4
# [6,] 119427.7 565573.7 484015.5 119427.7 565573.7 484015.5 119427.7 565573.7
# [7,] 786180.9 123505.1 228960.0 786180.9 123505.1 228960.0 786180.9 123505.1
# [8,] 481351.6 269760.4 122086.2 481351.6 269760.4 122086.2 481351.6 269760.4
# [9,] 119427.7 565573.7 484015.5 119427.7 565573.7 484015.5 119427.7 565573.7
# [10,] 786180.9 123505.1 228960.0 786180.9 123505.1 228960.0 786180.9 123505.1
# [11,] 481351.6 269760.4 122086.2 481351.6 269760.4 122086.2 481351.6 269760.4
# [12,] 119427.7 565573.7 484015.5 119427.7 565573.7 484015.5 119427.7 565573.7
# [13,] 786180.9 123505.1 228960.0 786180.9 123505.1 228960.0 786180.9 123505.1
# [14,] 481351.6 269760.4 122086.2 481351.6 269760.4 122086.2 481351.6 269760.4
# [,9] [,10] [,11] [,12] [,13] [,14] [,15]
# [1,] 228960.0 786180.9 123505.1 228960.0 786180.9 123505.1 228960.0
# [2,] 122086.2 481351.6 269760.4 122086.2 481351.6 269760.4 122086.2
# [3,] 484015.5 119427.7 565573.7 484015.5 119427.7 565573.7 484015.5
# [4,] 228960.0 786180.9 123505.1 228960.0 786180.9 123505.1 228960.0
# [5,] 122086.2 481351.6 269760.4 122086.2 481351.6 269760.4 122086.2
# [6,] 484015.5 119427.7 565573.7 484015.5 119427.7 565573.7 484015.5
# [7,] 228960.0 786180.9 123505.1 228960.0 786180.9 123505.1 228960.0
# [8,] 122086.2 481351.6 269760.4 122086.2 481351.6 269760.4 122086.2
# [9,] 484015.5 119427.7 565573.7 484015.5 119427.7 565573.7 484015.5
# [10,] 228960.0 786180.9 123505.1 228960.0 786180.9 123505.1 228960.0
# [11,] 122086.2 481351.6 269760.4 122086.2 481351.6 269760.4 122086.2
# [12,] 484015.5 119427.7 565573.7 484015.5 119427.7 565573.7 484015.5
# [13,] 228960.0 786180.9 123505.1 228960.0 786180.9 123505.1 228960.0
# [14,] 122086.2 481351.6 269760.4 122086.2 481351.6 269760.4 122086.2
(Because we have 14 rows, each row is one of your stations. You should not do the [-1,]
indexing, just know which is the row/column.) From this, we know that the different between site A
and station b
is 481351.6 meters (first column, second row).
From here, just find the column-minimum:
apply(alldists, 2, which.min)
# [1] 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2
suggesting that the closest station to site A
is b
(which.min
will return the first minimum, it does not indicate ties).
Now, dfStation[apply(alldists, 2, which.min),]
gives you 15 rows of station data that can easily be cbind
ed or otherwise combined with dfSites
.
dplyr
options:
dfSites %>%
mutate(
station_i = purrr::map2_int(
siteLat, siteLon,
~ which.min(geosphere::distVincentyEllipsoid(
cbind(.x,.y), dfStation[-1,c("stationLon","stationLat")]))
),
station = as.character(dfStation$station)[ station_i ]
)
# site siteLat siteLon siteDate station_i station
# 1 A 41 68 1 3 c
# 2 B 42 62 2 1 a
# 3 C 44 63 3 2 b
# 4 A 41 68 4 3 c
# 5 B 42 62 5 1 a
# 6 C 44 63 1 2 b
# 7 A 41 68 2 3 c
# 8 B 42 62 3 1 a
# 9 C 44 63 4 2 b
# 10 A 41 68 5 3 c
# 11 B 42 62 1 1 a
# 12 C 44 63 2 2 b
# 13 A 41 68 3 3 c
# 14 B 42 62 4 1 a
# 15 C 44 63 5 2 b
A slight (10-15%) speed improvement can be seen by doing an outer-product of them.
outer(seq_len(nrow(dfSites)), seq_len(nrow(dfStation)),
function(i,j) geosphere::distVincentyEllipsoid(dfSites[i,2:3], dfStation[j,2:3]))
That also returns an m
xn
matrix (station rows) that you then apply(...)
across to get the closest index. (I was hoping for a larger performance gain, since distVincentyEllipsoid
is called only once ...)