I am analysing STATS19 road accident data, commendably made available to the public by the UK government. I would like to look at how clustered different types of accidents are. The G function (described here) can be used to measure how far a point pattern diverges from complete spatial randomness (CSR).
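For reference, G is the cumulative distribution function of the nearest-neighbour distance: the probability that a typical point u of the pattern X has another point of X within distance r. For a homogeneous Poisson process (CSR) with intensity λ it has a closed form, which is the benchmark the observed curve is compared against. An observed G lying above this curve at small r suggests clustering (points have closer neighbours than expected); lying below suggests regularity.

$$
G(r) = \Pr\{\, d(u, X \setminus \{u\}) \le r \;\big|\; u \in X \,\},
\qquad
G_{\mathrm{CSR}}(r) = 1 - \exp\!\left(-\lambda \pi r^{2}\right)
$$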
spatstat handles this kind of problem well, with the envelope function providing a visualisation of the extent to which the pattern diverges from CSR at different distances.
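As a baseline, here is a minimal sketch of what the envelope looks like for a pattern that genuinely is CSR. It assumes a recent spatstat from CRAN, and the simulated pattern `X` is hypothetical stand-in data, not the accident points. Gest also returns the theoretical CSR curve in its `theo` column.

```r
# Minimal sketch: G-function envelope for a pattern that IS CSR.
# Assumes spatstat (>= 2.0) is installed; X is hypothetical stand-in data.
library(spatstat)

set.seed(42)
# 100 uniformly random points in the unit square
X <- runifpoint(100, win = square(1))

# Empirical G function; 'theo' holds the CSR curve 1 - exp(-lambda * pi * r^2)
G <- Gest(X)

# Pointwise envelope from 39 simulations of CSR with the same intensity
env <- envelope(X, fun = Gest, nsim = 39)
plot(env)
```

For a CSR pattern like this one, the observed curve should mostly stay inside the grey band.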
As my colleague Dan Olner has pointed out, however, the results (which show great divergence from CSR) do not necessarily indicate clustering of accidents: we could simply be detecting the natural clustering of the road network, on which most road accidents occur. These results can be reproduced by cloning my GitHub repo and running the following (after running parts of WY.R):
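library(sp)
library(maptools)  # provides elide() and the SpatialPoints -> "ppp" coercion
library(spatstat)
# Distances at which to evaluate G (in the rescaled coordinate units)
r <- seq(0, sqrt(2)/6, by = 0.005)
# Shift and rescale the accident coordinates
acB1 <- elide(acB, scale = TRUE)
# acB1 <- acB1[1:50,] # for tiny subset
acB1 <- SpatialPoints(acB1)
# Simulation envelope of the G function against CSR, at the distances in r
envacB <- envelope(as(acB1, "ppp"), fun = Gest, r = r)
# Plot the envelope
plot(envacB)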
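Dan's point can be illustrated with a small simulation. The sketch below uses hypothetical data: the `simplenet` network bundled with spatstat stands in for the roads, and the points are uniformly random along it, so they have no clustering beyond what the network itself imposes. Testing them against planar CSR, as in the code above, should typically still show the observed G above the envelope at short distances.

```r
# Minimal sketch: CSR *on a network* can look clustered against *planar* CSR.
# Assumes spatstat (>= 2.0); simplenet is a stand-in, not the STATS19 roads.
library(spatstat)

set.seed(1)
L <- simplenet

# 200 points uniformly random along the network segments
Xnet <- runiflpp(200, L)

# Drop the network and treat them as an ordinary planar point pattern
Xplane <- as.ppp(Xnet)

# Envelope against planar CSR in the same window
env_plane <- envelope(Xplane, fun = Gest, nsim = 39)
plot(env_plane)
```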
This issue is actually described by Adrian Baddeley (developer of spatstat) himself in the package's documentation:
points could be locations in one dimension (such as road accidents recorded on a road network)
This is exactly the situation I am facing, but I do not know how to modify the analysis above to constrain the CSR simulations to the road network (or, better, to points near it, as not all accidents are precisely on the road; see below). (see data here)
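A possible starting point is spatstat's linear-network classes: a `linnet` for the roads, an `lpp` for the accidents, and network versions of the summary functions such as `linearK`, which measures distance along the network. As far as I understand the documentation, `envelope` on an `lpp` simulates CSR on the same network by default. This is a minimal sketch with `simplenet` and simulated points as hypothetical stand-ins; converting the real road shapefile into a `linnet` is a separate step not shown here. Note it uses the K function rather than G.

```r
# Minimal sketch: envelope with CSR constrained to a linear network.
# Assumes spatstat (>= 2.0); L and X are hypothetical stand-ins.
library(spatstat)

set.seed(2)
L <- simplenet            # stand-in for the road network
X <- runiflpp(60, L)      # stand-in for accidents lying ON the network

# Network K function: distances are shortest-path distances along L,
# and the simulated patterns are CSR on L rather than in the plane
env_net <- envelope(X, fun = linearK, nsim = 39)
plot(env_net)
```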
One suggestion was to take random points from the road network, calculate the G function for them and compare it with that of my accident data, but that would not produce a clear (statistically significant) envelope. Any suggestions?
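That suggestion can be turned into an envelope by supplying the random road points as the null model through the `simulate` argument of `envelope`, so each of the `nsim` simulated patterns is drawn on the network instead of in the plane. The sketch below also jitters both the data and the simulations by a hypothetical radius of 0.01 to mimic accidents recorded slightly off the road; `simplenet`, `acc` and `simulate_roads` are illustrative stand-ins, not the real data.

```r
# Minimal sketch: planar G function, but with a road-constrained null model.
# Assumes spatstat (>= 2.0); all data and the 0.01 radius are hypothetical.
library(spatstat)

set.seed(3)
L <- simplenet   # stand-in road network

# Stand-in accidents: points on the network, nudged slightly off it
acc <- rjitter(as.ppp(runiflpp(80, L)), radius = 0.01)
n <- npoints(acc)

# Null model: n uniform points on the network, jittered by the same radius,
# in the same window as the data
simulate_roads <- expression(
  rjitter(as.ppp(runiflpp(n, L)), radius = 0.01)
)

env_roads <- envelope(acc, fun = Gest, nsim = 39,
                      simulate = simulate_roads)
plot(env_roads)
```

These are pointwise envelopes; setting `global = TRUE` in `envelope()` gives simultaneous bands, which are the ones that support a formal test.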