Clustering function in R constrained to road network

Question

I am analysing STATS19 road accident data, commendably made available to the public by the UK government. I would like to look at how clustered different types of accidents are. The "G function" (described here) can be used to measure the divergence of point patterns from cases of complete spatial randomness "CSR".

spatstat handles this kind of problem well, with the envelope function providing a visualisation for the extent to which the pattern diverges from the CSR for different distances.
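For reference, the planar CSR benchmark behind this comparison has a closed form. For a homogeneous Poisson process with intensity $\lambda$ (points per unit area), the nearest-neighbour distance distribution is:

```latex
G_{\mathrm{CSR}}(r) = 1 - \exp\left(-\lambda \pi r^{2}\right), \qquad r \ge 0
```

Here $r$ is the distance and $\lambda$ would be estimated as the number of points divided by the window area. The formula assumes that points can fall anywhere in the two-dimensional window.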

As my colleague Dan Olner has pointed out, however, the results (shown below, showing great divergence from the CSR) do not necessarily show clustering - it could be simply that we are detecting the natural clustering of the road network, on which most road accidents occur. The plot below can be reproduced by cloning my GitHub repo and running the following (after running parts of WY.R):

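library(sp)        # SpatialPoints()
library(maptools)  # elide() and the coercion to "ppp"
library(spatstat)  # envelope() and Gest()

r <- seq(0, sqrt(2)/6, by = 0.005)  # distance grid (defined but not passed to envelope() below)
acB1 <- elide(acB, scale = TRUE)    # rescale the accident coordinates
# acB1 <- acB1[1:50,] # for tiny subset
acB1 <- SpatialPoints(acB1)
# Calculate the G function for the points, with simulation envelopes under CSR
envacB <- envelope(as(acB1, "ppp"), fun = Gest)
# Plot the observed G function against the CSR envelope
plot(envacB)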

G function for accidents

This issue is actually described by Adrian Baddeley (developer of spatstat) himself in the package's documentation:

points could be locations in one dimension (such as road accidents recorded on a road network)

This is exactly the situation I am facing, but I do not know how to modify the analysis presented above to constrain the CSR to the road network (or better, to near the road network, as not all accidents are precisely on the road; see below, and see data here).

accidents and the road network

One suggestion was to take random points from the road network and calculate the G function for this and compare it with my accident data, but that would not create a clear (statistically significant) bounding box. Any suggestions?

score 3 · Answer 1 · answered Feb 25 '14 at 22:52

You are absolutely right that the perceived clustering could be due to the accidents occurring on the road network. This must be accounted for. In spatstat the road network is represented by a "linnet" object, so you need to convert your road network to this format. I don't know the details of that, but I would guess you should look at the "shapefiles" vignette in spatstat (you might have to go through the line segment class "psp" to import things):

vignette("shapefiles", package="spatstat")

A point pattern on a linear network is of class "lpp", so this is the data format you need in the end. If you have managed to store your network as the linnet object "mynet" you should be able to do something like:

X <- as(acB1, "ppp")
X <- lpp(X, mynet)

This automatically projects your points onto the network. Now you can look at summary statistics on the network. I don't think the G function is implemented in this setup, but I know the K-function is (function "linearK"), so you could use that. The generic function envelope as you used in your code now calls envelope.lpp which makes sure that the CSR simulations also are generated on the network.
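To make the steps above concrete, here is a minimal sketch on made-up data; it is not the STATS19 workflow. The three-segment "road network", the window `win` and the simulated points `pts` are all hypothetical stand-ins, and it assumes the road lines have already been converted to a "psp" object. It uses `as.linnet` to build the network, `lpp` to project the points onto it, and `envelope` with `linearK` so that the CSR simulations are generated on the network:

```r
library(spatstat)  # in recent versions this also attaches spatstat.linnet

set.seed(1)

# Hypothetical toy road network: three segments meeting at (1, 0)
win   <- owin(c(-0.5, 2.5), c(-0.5, 1.5))
roads <- psp(x0 = c(0, 1, 1), y0 = c(0, 0, 0),
             x1 = c(1, 1, 2), y1 = c(0, 1, 0), window = win)
mynet <- as.linnet(roads)

# Hypothetical "accidents": points scattered close to, but not exactly on, the roads
pts <- ppp(x = runif(40, 0, 2), y = rnorm(40, mean = 0, sd = 0.02), window = win)

# Attach the points to the network; each point is projected onto the nearest segment
X <- lpp(pts, mynet)

# Linear K-function with envelopes simulated on the network itself
env <- envelope(X, fun = linearK, nsim = 19)
plot(env)
```

With real data, `roads` would come from the imported road lines and `pts` from the accident locations; `nsim = 19` is only chosen to keep the sketch fast.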

I hope some of this is useful albeit not very detailed. Have a look at the relevant help files in spatstat for more details:

help(lpp)
help(linnet)
help(linearK)

Do report back how you progress from here, then I (or more likely Adrian Baddeley) might be able to give you some more pointers.

Many thanks Ege - network implementations of clustering algorithms are indeed needed. This is the original paper on the linear K-function: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.201.3884&rep=rep1&type=pdf. Looking forward to implementing linearK and linearKcross! — RobinLovelace, Feb 26 '14 at 11:14
