directional testing of spatial clustering by distance from source

Question

I have a spatial dataset of animal locations, as (x,y) points around a source (circular pattern with 5 km radius). I need to test whether the points are clustered (or repulsed) around the source relative to farther from the source, while accounting for directionality.
Things I tried:

checked out nearest neighbour and Ripley's K - could not figure out how to incorporate distance from source or directionality (plus, it looks like Ripley's is expecting a rectangular window, whereas mine is circular)
divided data into cardinal directions (N, E, S, W) and distance bins and calculated density of points per distance/direction bin. Then got stuck again.

Ideally, I'd get a result that could tell me "your points are distributed like a doughnut in direction X, are random in direction Y, and are mountain-peak-like in direction Z". I feel like this answer (resampling + mad.test) might be the right direction for me, but can't quite adapt it to my scenario...

Here's a fake dataset of a circular distribution around a point source:

library(ggplot2)
library(spatstat)
library(dplyr)

set.seed(310366)
nclust <- function(x0, y0, radius, n) {
           return(runifdisc(n, radius, centre=c(x0, y0)))
         }

c <- rPoissonCluster(1000, 0.1, nclust, radius=0.02, n=2)

df <- data.frame(x = c$x - 0.5, y = c$y - 0.5) %>%
    mutate(Distance = sqrt(x^2 + y^2)) %>%
    filter(Distance < 0.5)

ggplot(df) + 
    geom_point(aes(x = x, y = y)) +
    # source point
    geom_point(aes(x = 0, y = 0, colour = "Source"), size = 2) +
    coord_fixed()

I've fitted models like this with a directional "plume" parameterised by a direction and an eccentricity, using maximum likelihood. I'm not sure SO is the right place to answer it since its a statistical question rather than a programming one - so try stats.stackexchange.com — Spacedman, Dec 14 '18 at 13:03
@Spacedman It sounds like a more refined approach of my #2 things I tried. Any chance you could provide some code for it? I'm not sure how to refine it without losing too much data (i.e., I feel like the direction should be binned to actually have enough points). Would appreciate any pointers. Since it's a coding / approach refinement to something I tried - hope it becomes more of a programming question... — user2602640, Dec 18 '18 at 10:32

Ege Rubak · Accepted Answer · 2018-12-18T22:39:10.150

Maybe you can get some relevant insight by just studying the anisotropy of the pattern (even though it probably won't give you all the answers you are looking for).

Tools to detect anisotropy in point patterns include: sector K-function, pair orientation distribution, anisotropic pair correlation function. These are all described in Section 7.9 of Spatial Point Patterns: Methodology and Applications with R (dislaimer: I'm a co-author). Luckily Chapter 7 is one of the free sample chapters, so you can download it here: http://book.spatstat.org/sample-chapters.html.

It does not treat your source location in a special way, so it is not solving the entire problem, but it may serve as inspiration when you consider what to do.

You could make a Poisson model with an intensity that depends on the distance and direction from the source and see if that gives you any insight.

Below are some lightly commented code snippets since I don't have time to elaborate (remember these are just rough ideas -- there may be much better alternatives). Feel free to improve.

Uniform points in a unit disc:

library(spatstat)
set.seed(42)
X <- runifdisc(2000)
plot(X)

W <- Window(X)

Polar coordinates as covariates:

rad <- as.im(function(x,y){sqrt(x^2+y^2)}, W)
ang <- as.im(atan2, W)
plot(solist(rad, ang), main = "")

north <- ang < 45/180*pi & ang > -45/180*pi
east <- ang > 45/180*pi & ang < 135/180*pi
west <- ang < -45/180*pi & ang > -135/180*pi
south <- ang< -135/180*pi | ang > 135/180*pi
plot(solist(north, east, west, south), main = "")

plot(solist(rad*north, rad*east, rad*west, rad*south), main = "")

Fit a simple log-linear model (more complicated relations could be fitted with ippm():

mod <- ppm(X ~ rad*west + rad*south +rad*east)
mod
#> Nonstationary Poisson process
#> 
#> Log intensity:  ~rad * west + rad * south + rad * east
#> 
#> Fitted trend coefficients:
#>   (Intercept)           rad      westTRUE     southTRUE      eastTRUE 
#>    6.37408999    0.09752045   -0.23197347    0.18205119    0.03103026 
#>  rad:westTRUE rad:southTRUE  rad:eastTRUE 
#>    0.32480273   -0.29191172    0.09064405 
#> 
#>                  Estimate      S.E.    CI95.lo   CI95.hi Ztest       Zval
#> (Intercept)    6.37408999 0.1285505  6.1221355 6.6260444   *** 49.5843075
#> rad            0.09752045 0.1824012 -0.2599794 0.4550203        0.5346480
#> westTRUE      -0.23197347 0.1955670 -0.6152777 0.1513307       -1.1861588
#> southTRUE      0.18205119 0.1870798 -0.1846184 0.5487208        0.9731206
#> eastTRUE       0.03103026 0.1868560 -0.3352008 0.3972613        0.1660651
#> rad:westTRUE   0.32480273 0.2724648 -0.2092185 0.8588240        1.1920904
#> rad:southTRUE -0.29191172 0.2664309 -0.8141066 0.2302832       -1.0956377
#> rad:eastTRUE   0.09064405 0.2626135 -0.4240690 0.6053571        0.3451614
plot(predict(mod))

Non-uniform model:

lam <- 2000*exp(-2*rad - rad*north - 3*rad*west)
plot(lam)

set.seed(4242)
X2 <- rpoispp(lam)[W]
plot(X2)

Fit:

mod2 <- ppm(X2 ~ rad*west + rad*south +rad*east)
plot(predict(mod2))
plot(X2, add = TRUE, col = rgb(.9,.9,.9,.5))

Add point in centre and look at Ksector() restricted to that point as the reference point (not very informative for this example, but might be helpful in other cases??):

X0 <- ppp(0, 0, window = W)
plot(X2[disc(.1)], main = "Zoom-in of center disc(0.1) of X2")
plot(X0, add = TRUE, col = "red")
dom <- disc(.01)
plot(dom, add = TRUE, border = "blue")

X3 <- superimpose(X2, X0)

The estimated North sector K-function is above West (difference plotted):

Knorth <- Ksector(X3, 45, 135, domain = dom)
Kwest <- Ksector(X3, 135, 225, domain = dom)
plot(eval.fv(Knorth-Kwest), iso~r)

^{Created on 2018-12-18 by the reprex package (v0.2.1)}

could you provide a bit more info on the Poisson model suggestion? (same comment as above) - it sound like a cleaner approach than what I tried in my #2 approach. Is it literally a `glm` with a distance and a circular angle predictor? I'm not sure how I would be accounting for the change in patterns with angle - would a `gam` be a better approach?? — user2602640, Dec 18 '18 at 10:51
Added some quick snippets which you could use for inspiration. Have no idea how useful a `gam` is here. — Ege Rubak, Dec 18 '18 at 22:40
I accepted answer and awarded bounty to make sure it didn't expire. I still have to poke around with the info you provided - not 100% following from a quick read. Thanks for all the pointers - I'm sure I'll have a few follow-up questions! — user2602640, Dec 21 '18 at 00:04
As expected, I'm back with questions. My data have a strong nugget effect at the origin, whereas the model seems to enforce a spatial continuity at the origin (all values predicted at distance = 0 are identical between the four directions). Is there a way to force a strong change between the four directions? — user2602640, Jan 03 '19 at 16:20
There can be only one intensity value exactly at the origin. As soon as you move away from the origin you can have a difference between the four directions. If you look at the fitted model in the answer and the plot of `predict(mod)` you see that in the south direction the intensity close to the origin is ~700 while it is ~500 to the west. The directional indicators `west`, `south` etc. should allow for multiplying the intensity in that region by any factor. There may be an issue with the definitions of directions overlapping due to => vs > and the origin satisfying all of the directions... — Ege Rubak, Jan 04 '19 at 10:43
OK, I'll check out the definitions to see how fast the nugget onset is. One more - once I use predict(mod) with confidence interval, is the interpretation as usual that if the CIs don't overlap, the estimates are significantly different, and if the mean value overlaps another CI, then the estimates are not? Alternatively - is there a way to make `emmeans` (or glht) work with ppm output? I didn't see any examples in the free modeling chapter of the book... — user2602640, Jan 04 '19 at 10:52

directional testing of spatial clustering by distance from source

1 Answers1

Linked