Depends on what you mean by "automatically". Here's a simple algorithm.
repeat:
    Find the region with the smallest population.
    If that's more than your threshold, stop.
    Find that region's neighbours, pick one (at random, or smallest population).
    Merge that neighbour with that region.
Finding neighbours and merging can both be done with either the sf package or the sp package and friends (like spdep and rgeos).
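To make the loop above concrete, here is a minimal sketch in plain Python that works on an adjacency table rather than real geometries. It assumes you have already extracted each region's population and its set of neighbours (in R, that would come from the packages mentioned above). The function name merge_small_regions, its arguments and the toy data are all made up for illustration, and it only tracks which regions end up together; dissolving the actual polygons is left to your GIS package.

```python
import random


def merge_small_regions(population, neighbours, threshold, pick="smallest", seed=0):
    """Greedily merge the least-populated region into one of its neighbours
    until every remaining region has a population above `threshold`.

    population: dict mapping region id -> population
    neighbours: dict mapping region id -> set of adjacent region ids (symmetric)
    Returns (population per merged region, original members per merged region).
    """
    rng = random.Random(seed)
    pop = dict(population)
    adj = {r: set(ns) for r, ns in neighbours.items()}
    members = {r: [r] for r in pop}

    while len(pop) > 1:
        smallest = min(pop, key=pop.get)
        if pop[smallest] > threshold:
            break  # even the smallest region is big enough, so all are
        if not adj[smallest]:
            raise ValueError(f"region {smallest!r} has no neighbours to merge with")
        candidates = sorted(adj[smallest])  # sorted so ties break reproducibly
        if pick == "random":
            target = rng.choice(candidates)
        else:
            target = min(candidates, key=pop.get)

        # Absorb `smallest` into `target`: add populations, pool members,
        # and hand over its other neighbours to `target`.
        pop[target] += pop.pop(smallest)
        members[target] += members.pop(smallest)
        for n in adj.pop(smallest):
            adj[n].discard(smallest)
            if n != target:
                adj[n].add(target)
                adj[target].add(n)
    return pop, members


# Toy example (made-up numbers): four regions, threshold of 15 people.
population = {"A": 3, "B": 10, "C": 50, "D": 7}
neighbours = {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B"}, "D": {"B"}}
merged_pop, merged_members = merge_small_regions(population, neighbours, threshold=15)
print(merged_pop)      # {'B': 20, 'C': 50}
print(merged_members)  # {'B': ['B', 'A', 'D'], 'C': ['C']}
```

Merging into the smallest neighbour tends to keep the merged regions small; pick="random" gives a different, equally valid partition, so the result is not unique. An island with no neighbours raises an error here, and you would need to decide for yourself what to do with those.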
Equally, this can be considered a clustering algorithm using a distance metric based on adjacency. You could do a full hierarchical clustering and then cut the tree at a point such that all clusters have N > threshold.
Whether this is a good idea statistically is another question, and depends on what your goal is. If you are worried about whether an underlying risk is, say, > 0.5, and you are getting positives because a region has a population of 3 with 2 positives "by chance" from a small sample (of 3), then you need to model your data and work out the probability of exceeding 0.5 given the data. Then map that probability, which takes the small sample size into account.
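As one possible illustration of "the probability of exceeding 0.5 given the data", here is a minimal sketch that assumes a binomial model with a uniform Beta(1, 1) prior on the risk. That prior is my assumption, not something your data dictates, and a proper spatial model would also borrow strength from neighbouring regions. The function name prob_risk_exceeds is made up for this example.

```python
from math import comb


def prob_risk_exceeds(positives, n, cutoff=0.5):
    """P(risk > cutoff | data), assuming a uniform Beta(1, 1) prior.

    The posterior is Beta(positives + 1, n - positives + 1). For integer
    parameters its upper tail equals a binomial lower tail:
        P(risk > c) = P(Binomial(n + 1, c) <= positives)
    so no special functions are needed.
    """
    return sum(
        comb(n + 1, j) * cutoff**j * (1 - cutoff) ** (n + 1 - j)
        for j in range(positives + 1)
    )


print(prob_risk_exceeds(2, 3))      # 0.6875
print(prob_risk_exceeds(200, 300))  # same observed rate, very close to 1
```

Both regions have an observed rate of 2/3, but with only 3 people the probability that the underlying risk exceeds 0.5 is about 0.69, whereas with 300 people it is essentially certain. Mapping that probability instead of the raw rate stops the tiny regions from shouting.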