
Edit: I tried the following nested loop but I keep getting errors:

install.packages("hutilscpp")
library(hutilscpp)

# Initialize an empty list to store the matching results
matching_results <- list()

# Nested for loop to match each point in dataset1 with dataset2
for (i in 1:nrow(dataset)) {
  min_distance <- Inf
  matching_row <- NULL
  
  for (j in 1:nrow(dataset2)) {
    distance <- match_nrst_haversine(dataset[i, c(lat lon)], dataset2[j, c(lat2 lon2)])
    
    # Update the minimum distance and matching row if a closer point is found
    if (distance < min_distance) {
      min_distance <- distance
      matching_row <- dataset2[j, ]
    }
  }
  
  matching_results[[i]] <- cbind(dataset[i, ], matching_row)
}

I get the following error:

Error: unexpected symbol in:
"  for (j in 1:nrow(dataset2)) {
    distance <- match_nrst_haversine(dataset[i, c(lat lon"

I tried different syntax but nothing has worked. Thanks again.


Original question:

I have two datasets: one is a household survey with geolocations and the other is a climate dataset. I took a screenshot to illustrate; the points are the households, which fall within the climate data grid.

As you can see, the latitudes and longitudes are not the same. How can I merge them in R while keeping the original size of the household dataset (so with duplicates)?

My household dataset looks like this:

lat         lon         year    hhid    indiv
5.535456    7.531536    2010    10001   4
5.535456    7.531536    2010    10001   5
5.535456    7.531536    2010    10001   2
5.535456    7.531536    2010    10001   1
5.535456    7.531536    2010    10001   6
5.535456    7.531536    2010    10001   7
5.535456    7.531536    2010    10001   3


And here is the climate data: 


| lat  | lon   | SPEI     |
| ---- | ----- | -------- |
| 4.25 | 13.25 | 1.14703  |
| 4.75 | 13.25 | 0.961421 |



The final dataset would look like this:

lat         lon         year    hhid    indiv  SPEI
5.535456    7.531536    2010    10001   4      1.14703
5.535456    7.531536    2010    10001   5      1.14703
5.535456    7.531536    2010    10001   2      1.14703
5.535456    7.531536    2010    10001   1      1.14703





AS1
2 Answers


The most precise way is to use some form of triangulation to find the most proximal point to each row of your original data frame. There are many ways to do it; my choice would be the package hutilscpp.

You would probably iterate through the data frame you are trying to match to your existing points one at a time, using:

require(hutilscpp)
match_nrst_haversine(
      lat,
      lon,
      lat2,
      lon2)

You could use a nested for loop to evaluate each point against all other points, keeping the point with the minimum distance.
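
The parse error in your edit comes from the missing commas and quotes: column names must be quoted strings, e.g. dataset1[i, c("lat", "lon")]. Below is a minimal sketch of that loop idea, assuming the household data is a data frame dataset1 with columns lat/lon and the climate data a data frame dataset2 with columns lat2/lon2 and SPEI (the names used in your edit). It swaps the package call for a plain haversine formula, so it does not depend on match_nrst_haversine()'s exact interface, and replaces the inner loop with a vectorised distance calculation:

# Haversine distance in km between one point and a vector of points
haversine_km <- function(lat1, lon1, lat2, lon2) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * 6371 * asin(sqrt(pmin(1, a)))
}

matching_results <- vector("list", nrow(dataset1))

for (i in seq_len(nrow(dataset1))) {
  # distances from household i to every climate grid point
  # (dataset1/dataset2 and the column names are assumptions from the edit)
  d <- haversine_km(dataset1$lat[i], dataset1$lon[i],
                    dataset2$lat2, dataset2$lon2)
  j <- which.min(d)  # row index of the nearest climate grid point
  matching_results[[i]] <- cbind(dataset1[i, ], dataset2[j, ])
}

matched <- do.call(rbind, matching_results)

Because household members share a location, the same climate row is simply repeated for each of them, which keeps the original size of the household dataset.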

sconfluentus

We could use fuzzyjoin::geo_left_join to get all the matches within a max distance (1000 miles here) and then pick the closest match for each location.

library(dplyr)
hh |>
  fuzzyjoin::geo_left_join(climate, max_dist = 1000, distance_col = "dist") |>
  slice_min(dist, by = c(lat.x, lon.x))
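
A small follow-up sketch, assuming the household and climate tables are data frames named hh and climate as above: both carry lat/lon columns, so the join suffixes them with .x and .y, and you can drop the grid coordinates and the helper dist column to reproduce the layout asked for in the question. Note that slice_min(..., by = ) needs dplyr 1.1.0 or later, and max_dist is, as far as I know, in miles unless you pass unit = "km".

library(dplyr)

hh |>
  # keep every climate cell within 1000 miles of each household location
  fuzzyjoin::geo_left_join(climate, max_dist = 1000, distance_col = "dist") |>
  # per household location, keep the rows at the smallest distance; every
  # household member shares that distance, so all original rows are preserved
  slice_min(dist, by = c(lat.x, lon.x)) |>
  # tidy up: household columns plus the matched SPEI value
  select(lat = lat.x, lon = lon.x, year, hhid, indiv, SPEI)
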
Jon Spring