
I recently started working with a huge dataset provided by a medical emergency service. I have about 25,000 spatial points of incidents.

I have been searching books and the internet for quite some time and am getting more and more confused about what to do and how to do it.

The points are, of course, very clustered. I calculated the K, L and G functions for them, and they confirm strong clustering.

I also have a population point dataset (one point for every citizen) that is clustered similarly to the incident dataset (incidents happen to people, so there is a strong link between the two datasets).

I want to compare these two datasets to find out whether they are similarly distributed. I want to know whether there are places with more incidents than the population would suggest. In other words, I want to use the population dataset to explain the intensity and then check whether the incident dataset corresponds to that intensity. The assumption is that incidents should occur at random with respect to the population.

I want a plot of the region showing where there are more or fewer incidents than expected if incidents happened to people at random.

How would you do it with R?

Should I use Kest or Kinhom to calculate the K function? I read the descriptions, but I still don't understand the basic difference between them.

I tried using Kcross, but as I found out, one of the two datasets used should be CSR (completely spatially random). I also found Kcross.inhom; should I use that one for my data?

How can I get a plot (image) of the incident deviations relative to the population?

I hope my question is clear.

Thank you for taking the time to read my question, and even more thanks if you can answer any part of it.

Best regards!

Jernej

JerT

1 Answer


I do not have time to answer all your questions in full, but here are some pointers.

DISCLAIMER: I am a coauthor of the spatstat package and of the book Spatial Point Patterns: Methodology and Applications with R, so I have a preference for using these (and I genuinely believe they are the best tools for your problem).

Conceptual issue: How big is your study region, and does it make sense to treat the points as distributed anywhere in the region, or are they confined to the road network?

For now I will assume they can be located anywhere in the region.

A simple approach would be to estimate the population density using density.ppp and then use ppm to fit a Poisson model to the incidents, with the population density as the intensity. This would probably be a reasonable null model, and if it fits the data well you can basically say that incidents happen "completely at random in space when controlling for the uneven population density". More information on density.ppp and ppm is in chapters 6 and 9 of the book, respectively, and of course in the spatstat help files.
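As a rough sketch of this approach (not code from the original answer), the following uses the `chorley` data that the comments suggest as a stand-in: larynx cancer cases play the incidents and lung cancer cases the background population. The default bandwidth of `density()`, the `positive = TRUE` safeguard against zero pixels, and encoding "intensity proportional to population density" as an offset on the log scale are my assumptions, not the only reasonable choices.

```r
library(spatstat)  # attaches spatstat.data, spatstat.explore, spatstat.model, ...

# Stand-in data (assumption): larynx cases act as the "incidents",
# lung cases as the background "population".
parts <- split(chorley)
X   <- parts$larynx   # 58 points
pop <- parts$lung     # 978 points

# Kernel estimate of the population density: a pixel image (class "im").
# Default bandwidth (assumption); positive = TRUE avoids zero pixel values,
# so that log(D) below stays finite.
D <- density(pop, positive = TRUE)

# Null model: incident intensity proportional to population density,
#   lambda(u) = exp(beta0) * D(u),  i.e.  log lambda(u) = beta0 + log D(u).
# ppm formulas are on the log scale, so D enters as the offset log(D)
# and only the intercept beta0 is estimated.
fit0 <- ppm(X ~ offset(log(D)), data = list(D = D))

# More flexible alternative: estimate the power of D,
#   log lambda(u) = beta0 + beta1 * log D(u).
# An estimate of beta1 close to 1 is consistent with the null model.
fit1 <- ppm(X ~ log(D), data = list(D = D))
print(fit1)          # coefficient table with standard errors
ci <- confint(fit1)  # 95% confidence intervals for beta0 and beta1
print(ci)
```

If the interval for the `log(D)` coefficient clearly excludes 1, that is a first hint that incidents are not simply proportional to population; the residual diagnostics in chapter 11 of the book give a more detailed picture.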

If you use summary statistics like the K/L/G/F/J-functions, you should always use the inhomogeneous versions (Kinhom, Linhom, etc.) to take the population density into account. This is covered in chapter 7 of the book.
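To illustrate the Kest versus Kinhom question, here is a sketch comparing the homogeneous and inhomogeneous L-functions (`Lest`/`Linhom` are the variance-stabilised versions of `Kest`/`Kinhom`). It again assumes the `chorley` stand-in data and, as the intensity for `Linhom`, the fitted intensity of an assumed offset model in which incidents are proportional to population density.

```r
library(spatstat)

# Same stand-in data and assumed null model as a sketch.
parts <- split(chorley)
X    <- parts$larynx
pop  <- parts$lung
D    <- density(pop, positive = TRUE)
fit0 <- ppm(X ~ offset(log(D)), data = list(D = D))
lam  <- predict(fit0)   # fitted intensity of the null model, as a pixel image

# Lest (like Kest) assumes the intensity is constant, so it also picks up
# "clustering" that is really just uneven population density.
L_hom <- Lest(X)

# Linhom (like Kinhom) weights each pair of points by 1 / (lambda(u) * lambda(v)),
# so it measures clustering beyond what the supplied intensity explains.
L_inh <- Linhom(X, lambda = lam)

# For a Poisson process with the assumed intensity, L(r) - r is close to 0.
par(mfrow = c(1, 2))
plot(L_hom, . - r ~ r, main = "Lest: constant intensity")
plot(L_inh, . - r ~ r, main = "Linhom: population-based intensity")
```

Strong departures in the left panel that largely disappear in the right panel would suggest the apparent clustering is mostly explained by where people live.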

Also, it could be interesting to look at the relative risk (relrisk) if you combine all your points into a marked point pattern with two types (background and incidents). See chapter 14 of the book.
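A sketch of the relative-risk idea, which also gives the kind of map the question asks for. It assumes the `chorley` data as a stand-in; conveniently, `chorley` is already a two-type marked pattern, with the "lung" type acting as the background (control) type. Choosing the bandwidth with `bw.relrisk` and dividing by the overall proportion of incidents are illustrative choices of mine.

```r
library(spatstat)

# chorley is already a two-type marked point pattern; here the "lung" type
# stands in for the population and "larynx" for the incidents (assumption).
n <- table(marks(chorley))
print(n)

# Bandwidth chosen by cross-validation (one possible choice).
b <- bw.relrisk(chorley)

# p(u): estimated probability that a point at location u is an incident
# rather than a population point ("lung" declared as the control type).
p <- relrisk(chorley, sigma = b, control = "lung")

# Divide by the overall proportion of incidents: values above 1 mark areas
# with more incidents than the population predicts, values below 1 fewer.
p0    <- as.numeric(n["larynx"] / sum(n))
ratio <- p / p0
plot(ratio, main = "Incidents relative to population (1 = as expected)")
```

With your own data, a two-type pattern could be built with `superimpose(incidents = inc, population = pop)`, where `inc` and `pop` are hypothetical names for your two unmarked `ppp` objects; the argument names become the mark levels, so you would then pass `control = "population"`.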

Unfortunately, only chapters 3, 7 and 9 of the book are available as free sample chapters to download, but I hope you have access to it at your library or have the option of buying it.

Ege Rubak
  • Thank you for your answer, it helped me a lot :) Should I use `kppm` instead of `ppm`? – JerT Aug 04 '16 at 13:22
  • If you fit an inhomogeneous Poisson model with `ppm` and determine that the data points are more clustered than that model explains, you could indeed use `kppm` (or add `interaction = AreaInter(R)`, where `R` is some kind of interaction range; see chapter 13 of the book). – Ege Rubak Aug 04 '16 at 21:45
  • I am very sorry to bother you, but I have an additional question about `ppm`. I calculated the density of my population, but after reading the spatstat help over and over, I have no idea how to use that calculated density. If I use it as a covariate and fit the incidents ppp, the plot function reports that there is nothing to plot (a flat surface). Is this OK: `fit = ppm(inc_ppp, ~dens_pop)`? – JerT Aug 05 '16 at 19:14
  • How do you calculate the spatially varying population density? Using `density.ppp`? Please provide a small reproducible example by editing your question; then I have a much better chance of commenting on what you are doing. If you want example data, you can use `X <- split(chorley)$larynx` as your incidents data (58 people) and `pop <- split(chorley)$lung` as your background population (978 people). – Ege Rubak Aug 05 '16 at 19:30
  • Yes, I use `density.ppp` to calculate the population density. Let's say I use `dens_pop <- density.ppp(split(chorley)$lung)` and `X <- split(chorley)$larynx`. How do I fit the model to `X` with the help of `dens_pop`? Like this: `fit = ppm(inc_ppp, ~dens_pop)`? How do I know whether the fit is OK or not? In other words, how do I know whether incidents happen completely at random in space when controlling for the uneven population density? You can't imagine how thankful I am to finally find some help on this :) – JerT Aug 07 '16 at 14:30
  • Your model formula is not correct. You need to remember that it is on the log scale. Please look at Section 9.3.7 (remember that Chapter 9 is available as a free download). Once you have fitted your model correctly, validation of the fitted model is covered at length in chapter 11 of the book. Did you manage to get your hands on a copy yet? Some keywords: relative intensity; residuals; smoothed residual field; Pearson residuals. – Ege Rubak Aug 08 '16 at 07:35
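Following the keywords in the last comment, here is a sketch of a log-scale fit together with some residual diagnostics, reusing the `dens_pop` name from the comments. Writing the population term as `offset(log(dens_pop))` is my reading of "the model is on the log scale"; using `log(dens_pop)` as an ordinary covariate would also work and estimates its coefficient instead of fixing it at 1. The particular plots are illustrative; chapter 11 of the book covers their interpretation properly.

```r
library(spatstat)

dens_pop <- density(split(chorley)$lung, positive = TRUE)  # population density
X        <- split(chorley)$larynx                          # incidents

# The trend formula is on the log scale, so the population density
# enters as log(dens_pop); as an offset its coefficient is fixed at 1.
fit <- ppm(X ~ offset(log(dens_pop)), data = list(dens_pop = dens_pop))

# Raw residuals form a signed measure: +1 at each data point, minus the
# fitted intensity integrated over each region.
res <- residuals(fit)

# Smoothed residual field: positive values mark areas with more incidents
# than the population-based model predicts, negative values fewer.
sres <- Smooth(res)
plot(sres, main = "Smoothed raw residuals")

# Pearson residuals weight each contribution by 1 / sqrt(fitted intensity),
# which puts densely and sparsely populated areas on a more comparable scale.
pres <- residuals(fit, type = "pearson")
plot(Smooth(pres), main = "Smoothed Pearson residuals")

# Standard four-panel diagnostic plot: residual marks, smoothed residual
# field and lurking-variable plots against the x and y coordinates.
diagnose.ppm(fit)
```

If these diagnostics show that the points are more clustered than the fitted model explains, that is the situation in which the earlier comment suggests moving on to `kppm` or an `AreaInter` interaction.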