I'm working on a 2-dimensional dataset for segmentation (the inputs are just latitudes and longitudes). I want to use K-means, but I also need to be able to specify the minimum and maximum number of units in each cluster, as well as the maximum distance between a cluster center and the outermost point in that cluster. Is anyone aware of such an implementation in R or Python?
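
To make the three constraints concrete, here is a minimal sketch of how a candidate clustering could be checked against them. It assumes great-circle (haversine) distance in kilometres and a mean Earth radius of 6371 km; the names `haversine_km`, `check_constraints`, `min_size`, `max_size` and `max_radius_km` are hypothetical and only for illustration. It does not produce a clustering, it only reports which clusters break which constraint.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def check_constraints(points, labels, centers, min_size, max_size, max_radius_km):
    """Return {cluster_id: [violated constraint names]}; empty if all constraints hold."""
    # Group the points by the cluster label they were assigned.
    members = defaultdict(list)
    for point, label in zip(points, labels):
        members[label].append(point)

    violations = {}
    for cluster_id, center in enumerate(centers):
        pts = members.get(cluster_id, [])
        problems = []
        if len(pts) < min_size:
            problems.append("min_size")
        if len(pts) > max_size:
            problems.append("max_size")
        # Radius = distance from the center to the outermost member.
        if pts and max(haversine_km(center, p) for p in pts) > max_radius_km:
            problems.append("max_radius")
        if problems:
            violations[cluster_id] = problems
    return violations
```

The labels and centers could come from any clustering run, for example plain K-means with the approximate K; the checker then shows how far that result is from satisfying the size and radius limits.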

Scratch
  • Depending on what your end goal is, check out the DBSCAN algorithm. It sounds like you want to find how closely things are located. http://www.sthda.com/english/wiki/wiki.php?id_contents=7940 – NotThatKindODr Jul 17 '20 at 15:09
  • No, the end goal is to create clusters that, for instance, have around 50k-150k points inside and also have a predefined maximum distance from the center. I forgot to mention that I already know the approximate value of K. – Scratch Jul 17 '20 at 15:38
  • This starts to look more like the set covering problem, as the maximum distance between the cluster center and the outermost point is the radius of the covers. https://math.mit.edu/~goemans/18434S06/setcover-tamara.pdf The min/max sizes make it a non-standard problem, but I would read about set covering anyway to get ideas for heuristics. – Willem Hendriks Jul 20 '20 at 17:18
  • http://www.iitg.ac.in/psm/qip2015/material/Gauram_K_Das_Lectures.pdf – Willem Hendriks Jul 20 '20 at 19:37
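
The set-covering view in the comments above can be sketched as a simple greedy heuristic. This is only an illustration under stated assumptions, not a known library implementation: the function name `greedy_cover` and its parameters are hypothetical, candidate centers are restricted to the data points themselves, distances default to planar Euclidean (`math.dist`, so projected coordinates are assumed; a haversine function could be passed instead), and only the radius and maximum size are enforced. The minimum size is not handled, and the quadratic cost per round makes it unsuitable for large datasets as written.

```python
import math


def greedy_cover(points, radius, max_size, dist=math.dist):
    """Greedy set-cover heuristic with a radius bound and a size cap.

    Each round picks the unassigned point whose radius-ball contains the most
    unassigned points, then assigns up to max_size of them, nearest first.
    Returns a list of (center_index, [member indices]).
    """
    unassigned = set(range(len(points)))
    clusters = []
    while unassigned:
        order = sorted(unassigned)  # fixed order keeps tie-breaking deterministic
        best_center, best_members = None, []
        for c in order:
            members = [i for i in order if dist(points[c], points[i]) <= radius]
            if len(members) > len(best_members):
                best_center, best_members = c, members
        # Keep the nearest points when the ball holds more than max_size.
        best_members.sort(key=lambda i: dist(points[best_center], points[i]))
        chosen = best_members[:max_size]
        clusters.append((best_center, chosen))
        unassigned.difference_update(chosen)
    return clusters
```

Clusters that end up below the minimum size would still need a repair step, for example merging them into a neighbour that has spare capacity and stays within the radius.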

0 Answers