-1

I am new to using the DBSCAN algorithm.

Quick summary; it has two parameters:

  1. epsilon - to specify the acceptable "distance" between two points, under which they can be considered close enough to cluster.
  2. minPoints - to specify the minimum number of points that must fall with the distance epsilon to constitute a cluster. If there aren't enough points together, it's just labelled as noise.

I'm using somebody else's DBSCAN algorithm and I have the source code, which I only kind of understand. I was hoping I could use it as-is but then I found some behaviour I wasn't expecting.

I specified a value of 6 for minPoints and yet in my results I got a cluster with only 2 points.

From debugging, I think I can see what's happening. It looks like when the point is examined it has 16 neighbours at a distance below epsilon, so it should qualify as a cluster. Later on, the algorithm sees that 14 of those neighbours had already been assigned to a different cluster, so this cluster ends up with only 2 points in it.

Is minPoints supposed to be strictly enforced?

Is this how a healthy DBSCAN algorithm is supposed to work or is it a bug I need to fix before proceeding?

benjamin
  • 1,364
  • 1
  • 14
  • 26
  • And if it is a bug... what should DBSCAN do with this data-point? Put it into the same cluster that most of its neighbours were found to be? – benjamin Apr 25 '21 at 06:57

1 Answers1

0

For anybody who finds this question through Google, the correct answer is Yes. It is possible for a correctly functioning DBSCAN implementation to create a cluster with less than minPoints members. It is explained more fully on this other Stack Overflow question:

Can the DBSCAN algorithm create a cluster with less than minPts?

benjamin
  • 1,364
  • 1
  • 14
  • 26