
I have made a map of a town by zip code. The fill is the proportion of cases (case vs. not case). But some zip codes have very few observations in total, so outliers distort the map.

Is there a way to automatically merge the polygons and data of two neighbouring areas based on their n?

And if that is not possible, how can I merge rows of my sf/df without losing the ID?

I guess the simplest option would be to just set those zip codes to NA.

Micha Wiedenmann
Schillerlocke

1 Answer


Depends on what you mean by "automatically". Here's a simple algorithm.

repeat:
    Find the region with the smallest population.
    If that population is more than your threshold, stop.
    Find that region's neighbours and pick one (at random, or the one with the smallest population).
    Merge that neighbour with that region.
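A minimal Python sketch of that loop, assuming each zip code is described by plain dictionaries: a total count, a case count, and a symmetric neighbour map (which you would derive from the polygons, e.g. with sf's st_touches()). The function name merge_small_regions and this data layout are hypothetical. It picks the neighbour with the smallest population, and each merged group keeps the list of original zip codes, so no IDs are lost. Regions with no neighbours (islands) are left alone.

```python
def merge_small_regions(counts, cases, neighbours, threshold):
    """Greedily merge small regions into neighbours until each group has n > threshold.

    counts, cases: dict mapping zip code -> int
    neighbours:    dict mapping zip code -> set of adjacent zip codes (symmetric)
    Returns a dict: group label -> {"members": [...], "n": int, "cases": int}
    """
    # Every zip code starts as its own group, labelled by its own zip code.
    groups = {z: {"members": [z], "n": counts[z], "cases": cases[z]} for z in counts}
    adj = {z: set(neighbours.get(z, ())) for z in counts}

    while True:
        # Groups still at or below the threshold that have something to merge with.
        small = [g for g in groups if groups[g]["n"] <= threshold and adj[g]]
        if not small:
            break  # everything is big enough (or isolated)
        smallest = min(small, key=lambda g: groups[g]["n"])
        # Pick the neighbour with the smallest population.
        target = min(adj[smallest], key=lambda g: groups[g]["n"])

        # Fold `smallest` into `target`, keeping the original zip codes.
        groups[target]["members"] += groups[smallest]["members"]
        groups[target]["n"] += groups[smallest]["n"]
        groups[target]["cases"] += groups[smallest]["cases"]
        del groups[smallest]

        # The merged group inherits all of smallest's neighbours.
        for nb in adj.pop(smallest):
            adj[nb].discard(smallest)
            if nb != target:
                adj[nb].add(target)
                adj[target].add(nb)
    return groups


# Hypothetical toy data: four zip codes arranged in a ring A-B-D-C-A.
counts = {"A": 5, "B": 50, "C": 8, "D": 40}
cases = {"A": 4, "B": 10, "C": 1, "D": 12}
neighbours = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"}, "D": {"B", "C"}}
result = merge_small_regions(counts, cases, neighbours, threshold=20)
for label, g in result.items():
    print(label, g["members"], g["n"], round(g["cases"] / g["n"], 2))
```

Here A (n=5) is folded into C, and then C+A (n=13) into D, leaving two groups. The "members" list is what you would join back to your sf object (for example via st_union() per group) to get the merged polygons. Ties in population are broken by iteration order, so pass the neighbour with the smallest population explicitly if you need reproducibility.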

Finding neighbours and merging can both be done with either the sf package (e.g. st_touches() and st_union()) or the sp package and friends (like spdep and rgeos).

Equally, this can be viewed as a clustering problem with a distance metric based on adjacency. You could do a full hierarchical clustering and then cut the tree at the point where all clusters have N > threshold.
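A rough Python sketch of the clustering view, under my own assumptions rather than anything stated above: adjacency is used as a merge constraint (only neighbouring clusters may merge, i.e. contiguity-constrained agglomerative clustering), the merge cost is the difference in pooled case ratio, and the "tree" is stored as the list of partitions after each merge. Cutting the tree then means taking the finest partition in which every cluster has n > threshold. The names constrained_tree and cut_tree are hypothetical, and a different merge cost (for example, combined population) would give a different tree.

```python
def constrained_tree(counts, cases, neighbours):
    """Contiguity-constrained agglomerative clustering (a sketch).

    Only adjacent clusters may merge; at each step the adjacent pair with the
    most similar pooled case ratio is merged. Returns the list of partitions,
    from one cluster per zip code to the coarsest reachable one.
    Assumes every count is > 0.
    """
    # cluster label -> (set of original zip codes, total n, total cases)
    clusters = {z: (frozenset([z]), counts[z], cases[z]) for z in counts}
    adj = {z: set(neighbours.get(z, ())) for z in counts}
    history = [dict(clusters)]

    def cost(pair):
        (_, n1, c1), (_, n2, c2) = clusters[pair[0]], clusters[pair[1]]
        return abs(c1 / n1 - c2 / n2)

    while True:
        pairs = [(a, b) for a in adj for b in adj[a] if a < b]
        if not pairs:
            break  # one cluster left per connected component
        a, b = min(pairs, key=cost)
        zips_a, n_a, c_a = clusters[a]
        zips_b, n_b, c_b = clusters.pop(b)
        clusters[a] = (zips_a | zips_b, n_a + n_b, c_a + c_b)  # keep label a
        # The merged cluster inherits all of b's neighbours.
        for nb in adj.pop(b):
            adj[nb].discard(b)
            if nb != a:
                adj[nb].add(a)
                adj[a].add(nb)
        history.append(dict(clusters))
    return history


def cut_tree(history, threshold):
    """Return the finest partition in which every cluster has n > threshold."""
    for partition in history:
        if all(n > threshold for _, n, _ in partition.values()):
            return partition
    return history[-1]  # threshold unreachable (e.g. an isolated small region)


# Hypothetical toy data: four zip codes in a row, A-B-C-D.
counts = {"A": 10, "B": 30, "C": 12, "D": 25}
cases = {"A": 3, "B": 9, "C": 6, "D": 13}
neighbours = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
partition = cut_tree(constrained_tree(counts, cases, neighbours), threshold=20)
for label, (zips, n, k) in sorted(partition.items()):
    print(label, sorted(zips), n, k)
```

Because similar neighbours merge first, the merged areas tend to be more homogeneous than with the purely size-driven loop, at the cost of sometimes merging regions that were already large enough.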

Now, whether this is a good idea statistically is another question, and it depends on your goal. If you want to know whether an underlying risk is, say, > 0.5, and a zip code shows a high ratio only because it has a population of 3 with 2 positives "by chance" (a sample of 3), then you need to model your data and work out the probability of exceeding 0.5 given the data. Then map that probability, which takes the small sample size into account.
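One simple way to do that modelling, as an illustration only: assume a binomial model for each zip code with a uniform Beta(1, 1) prior on its risk (my choice; the answer does not specify a model). The posterior is then Beta(1 + cases, 1 + n - cases), and for integer parameters the probability that the risk exceeds p0 equals a binomial lower tail, so only the standard library is needed. The function name prob_risk_exceeds is hypothetical.

```python
from math import exp, lgamma, log


def prob_risk_exceeds(cases, n, p0=0.5, a=1, b=1):
    """P(underlying risk > p0 | cases out of n), assuming a Beta(a, b) prior.

    With a binomial likelihood the posterior is Beta(alpha, beta) with
    alpha = a + cases and beta = b + n - cases. For integer alpha, beta:
        P(risk > p0) = P(Binomial(alpha + beta - 1, p0) <= alpha - 1)
    Terms are computed in log space so large n does not overflow.
    Requires 0 < p0 < 1 and integer a, b >= 1.
    """
    alpha, beta = a + cases, b + n - cases
    m = alpha + beta - 1
    total = 0.0
    for j in range(alpha):
        log_term = (lgamma(m + 1) - lgamma(j + 1) - lgamma(m - j + 1)
                    + j * log(p0) + (m - j) * log(1 - p0))
        total += exp(log_term)
    return total


# Same observed ratio (2/3), very different amounts of evidence:
for k, n in [(2, 3), (20, 30), (200, 300)]:
    print(f"{k}/{n}: P(risk > 0.5) = {prob_risk_exceeds(k, n):.3f}")
```

For 2 positives out of 3 this gives 11/16 = 0.6875, whereas 200 out of 300 gives a probability very close to 1, so mapping this quantity instead of the raw ratio automatically tones down the small zip codes.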

Spacedman