I don’t really know any coding, so apologies in advance, but I work with spatial data. This assignment might be over my head.

I am working with 3 datasets:

  • A California severe traffic incidents database (TIMS from UC Berkeley).
  • A second dataset of risk events based on car accidents hitting company assets.
  • A third dataset of the company assets subject to those risk events.

All datasets have Lat/Long in decimal degrees.

I have 3 main tasks:

  1. Find # of TIMS data points that are within 1/4 mile (or D distance) from other TIMS data points. “TIMS event #1 has 5 other TIMS events within 1/4 mile.”

  2. Find # of TIMS data points that are within 1/4 mile (or D distance) from risk event data points. “Risk event #2 has 7 TIMS events within 1/4 mile.”

  3. Find # of TIMS data points within 1/4 mile (or D distance) of company assets. “Asset # 123456 has 2 risk events within 1/4 mile.”

The goal here is a simple # score.

I have access to ESRI mapping products, but I have been asked to perform all of the above tasks in the Palantir Foundry environment, which means using Python, SQL, or R.

The article "Find all nearest neighbors within a specific distance" may answer my question, but I am unsure whether there is a simpler or clearer solution. A second link, https://caam37830.github.io/book/08_geometry/nearestneighbor.html, seems like it may be more helpful, but an expert opinion from someone who knows what they’re doing would be appreciated.

My guess here is that the same/similar code can be run 3 times.

  • Greetings! Usually it is helpful to provide a minimally reproducible dataset for questions here so people can troubleshoot your problems. One way of doing this is by using the `dput` function. You can find out how to use it here: https://youtu.be/3EID3P1oisg – Shawn Hemelstrand Oct 13 '22 at 23:47

1 Answer

The `geosphere` package in R has a number of functions that calculate straight-line distances between two points based on longitude and latitude. You can install the package using `install.packages('geosphere')`. See `help(package = 'geosphere')` for the list of distance functions; they all start with `dist`. Some are faster, and others account more accurately for the shape of the Earth. For short distances like 1/4 mile you're going to be fine with a faster method.

library(geosphere)

# geosphere functions return meters. If you want miles (1 mile = 1609.344 meters):
meters_to_miles = 1/1609.344

# Some fake latitude/longitude data. Make sure longitude is column 1 and latitude column 2
incidents = matrix(c(-70.1, -70.2, -73.3,  45 ,43, 46), nrow = 3)

# More fake data
risk_events = matrix(c(-70.05, -70.3, -71.6, -74.5,  44 ,40.2, 45.3, 44.3), nrow = 4)

# Distances between the different incident locations, in miles
# To get what we want we need to loop through the different incidents, to find
# the distances from each one to all the others. I'll do that looping here with lapply.
# drop = FALSE keeps a matrix even if only one other incident remains.
incident_distances = lapply(1:nrow(incidents),
                            function(x) distHaversine(incidents[x,], incidents[-x, , drop = FALSE])*meters_to_miles)

# For each location, are any of the distances to other locations below 1/4 mile?
# Again, loop through the different locations, this time with sapply to produce a vector
below_fourth = sapply(incident_distances, function(x) min(x) < 1/4)
# How many have another incident within a fourth of a mile?
sum(below_fourth)

# Similar code applies in checking one dataset of locations against another
# Except this time we don't have to drop our current location from the other data
risk_distances = lapply(1:nrow(incidents),
                        function(x) distHaversine(incidents[x,], risk_events)*meters_to_miles)
# For each incident, is any risk event within 1/4 mile?
near_risk = sapply(risk_distances, function(x) min(x) < 1/4)
# How many incidents have a risk event within a fourth of a mile?
sum(near_risk)
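
The code above reports how many locations have at least one neighbor within 1/4 mile. If what you need is the per-point score from the question ("TIMS event #1 has 5 other TIMS events within 1/4 mile"), a minimal sketch along the following lines may work. It assumes `geosphere`'s `distm()` function to build the full distance matrix in one call; `count_within`, `radius_miles`, and the other variable names are hypothetical, made up for illustration, and the data is the same fake data as above.

```r
library(geosphere)

# Same fake data as above: longitude in column 1, latitude in column 2
incidents = matrix(c(-70.1, -70.2, -73.3, 45, 43, 46), nrow = 3)
risk_events = matrix(c(-70.05, -70.3, -71.6, -74.5, 44, 40.2, 45.3, 44.3), nrow = 4)

meters_per_mile = 1609.344
radius_miles = 1/4  # the "D distance" from the question

# Hypothetical helper: for each row of `targets`, count how many rows of
# `points` lie within `radius_miles`
count_within = function(targets, points, radius_miles) {
  # distm returns a nrow(targets) x nrow(points) matrix of distances in meters
  d = distm(targets, points, fun = distHaversine) / meters_per_mile
  rowSums(d <= radius_miles)
}

# Task 1: other incidents near each incident.
# Each incident is 0 miles from itself, so subtract that self-match.
incident_scores = count_within(incidents, incidents, radius_miles) - 1

# Task 2: incidents near each risk event (one score per risk event)
risk_scores = count_within(risk_events, incidents, radius_miles)
```

The third task would presumably be the same call with the asset coordinates as `targets`. One caveat: `distm()` holds every pairwise distance in memory, so for a very large incident table you may need to process the targets in chunks or fall back to the row-by-row `lapply` approach above.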
NickCHK