I am trying to merge two datasets that contain GPS coordinates, so that I end up with a single dataset holding variables from both, and I am trying to write a function to do this. The problem is that the GPS coordinates in the two datasets do not match exactly, so each row of one dataset has to be paired with the row of the other dataset whose GPS coordinates are closest.
I have had some success with the fuzzyjoin package, but it only matched about 75% of the rows. With the function below, I was hoping to get a higher match rate. One dataset is shorter than the other, so the idea was to use two nested for-loops, one iterating over each dataset.
An "anchor" (initially the distance between the first observations of both datasets) is established; whenever the distance between two points is less than the anchor, that shorter distance becomes the new anchor. The loop continues until the shortest distance is found, and the variables from both datasets are then written to a new dataset, called pairedData here. I should be left with a dataset as long as the shorter dataset (6314 rows), with data taken from both datasets.
I think the function should work, but rbind() is super slow, and I have been having trouble implementing rbindlist(). Any ideas on how I might achieve this?
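On the rbind() point, a minimal sketch of the usual way to avoid growing a data frame row by row: collect the rows in a preallocated list and bind them once at the end. The toy columns `id` and `yield` are placeholders, not the real data, and base R is used so the snippet runs without extra packages.

```r
# Hypothetical toy rows standing in for the matched planting/harvest records.
rows <- vector("list", 3)              # preallocate one slot per output row
for (i in seq_along(rows)) {
  rows[[i]] <- data.frame(id = i, yield = i * 10)
}

# Bind once, after the loop. data.table::rbindlist(rows) should be a
# drop-in replacement for this line and is typically faster on long lists.
paired <- do.call(rbind, rows)
```

The point of the pattern is that the expensive bind happens a single time instead of once per iteration.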
combineGPS <- function(harvest, planting) {
  library(sp)          # SpatialPoints(), spDistsN1()
  library(data.table)  # loaded for rbindlist(); not used yet

  rowsH <- nrow(harvest)
  rowsP <- nrow(planting)
  harvestPoints <- SpatialPoints(cbind(longH = harvest$long, latH = harvest$lat))
  plantingPoints <- SpatialPoints(cbind(longP = planting$long, latP = planting$lat))

  # planting data is shorter than harvest data:
  # take each row of planting data (6314), find the closest harvest data point (16626), then attach its yield

  # Start from the planting columns so there is one row per planting row and
  # the variety factor keeps its levels; yield is filled in by the loop below.
  pairedData <- planting[, c("long", "lat", "variety", "seedling_rate", "seed_spacing", "speed")]
  pairedData$yield <- NA_real_

  for (p in seq_len(rowsP)) {
    # Reset the anchor for every planting row (distance to the first harvest row);
    # otherwise the minimum from earlier rows carries over and later rows never match.
    anchor <- spDistsN1(plantingPoints[p, ], harvestPoints[1, ], longlat = FALSE)
    for (h in seq_len(rowsH)) {
      # longlat = FALSE gives Euclidean distance in coordinate units
      d <- spDistsN1(plantingPoints[p, ], harvestPoints[h, ], longlat = FALSE)
      if (d <= anchor) {
        anchor <- d
        pairedData$yield[p] <- harvest$yield[h]
      }
    }
  }
  return(pairedData)
}
doesItWork <- combineGPS(harvest, planting)
doesItWork
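
For comparison, here is a minimal sketch of the same nearest-point idea with the inner loop replaced by a vectorised `which.min()`. It assumes planar (Euclidean) distance on the raw coordinates, as `longlat=FALSE` does above. The helper name `nearestHarvest` and the toy tables `plantingToy` and `harvestToy` are made up for illustration. Only base R is used.

```r
# For each planting row, return the index of the closest harvest row.
nearestHarvest <- function(planting, harvest) {
  vapply(seq_len(nrow(planting)), function(p) {
    # Squared distance from planting row p to every harvest row at once;
    # the squared distance has the same minimiser as the distance itself.
    d2 <- (harvest$long - planting$long[p])^2 +
          (harvest$lat  - planting$lat[p])^2
    which.min(d2)
  }, integer(1))
}

# Hypothetical toy data; the real tables have more columns and rows.
plantingToy <- data.frame(long = c(0, 10), lat = c(0, 10),
                          variety = c("A", "B"))
harvestToy  <- data.frame(long = c(9.5, 0.2, 5), lat = c(10.1, -0.1, 5),
                          yield = c(180, 150, 165))

idx <- nearestHarvest(plantingToy, harvestToy)

# One row per planting row, with the yield of its nearest harvest point.
pairedToy <- cbind(plantingToy, yield = harvestToy$yield[idx])
```

Because the result is built in one `cbind()` call, no row-by-row binding is needed. Note that several planting rows can map to the same harvest row under this scheme.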