Questions tagged [dbscan]

DBSCAN stands for density-based spatial clustering of applications with noise; it is a popular density-based cluster analysis algorithm.

It is density-based because it finds clusters starting from the estimated density distribution of the corresponding data points. DBSCAN is one of the most common clustering algorithms and among the most cited in the scientific literature. OPTICS can be seen as a generalization of DBSCAN to multiple ranges, effectively replacing the ε parameter with a maximum search radius.

See also the Wikipedia article on DBSCAN.

In R, the software environment for statistical computing and graphics, the dbscan package implements this method.
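The density-based idea described above can be illustrated with a minimal pure-Python sketch. This is not the code of scikit-learn or of the R dbscan package; the names `dbscan`, `region_query`, `eps` and `min_pts` are chosen here for illustration, and the brute-force neighbor search is only suitable for small inputs. Noise is labeled -1, the same convention scikit-learn uses.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Two dense 2x2 squares and one isolated point (made-up data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The two squares each form a cluster because every point has at least `min_pts` neighbors within `eps`, while the isolated point has too few neighbors and stays labeled as noise.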

563 questions
9
votes
2 answers

Clustering geospatial data on coordinates AND a non-spatial feature

Say I have the following dataframe stored as a variable called coordinates, where the first few rows look like: business_lat business_lng business_rating 0 19.111841 72.910729 5. 1 19.111342 72.908387 5. 2 …
9
votes
1 answer

DBSCAN with Python and scikit-learn: What exactly are the integer labels returned by make_blobs?

I'm trying to comprehend the example for the DBSCAN algorithm implemented by scikit (http://scikit-learn.org/0.13/auto_examples/cluster/plot_dbscan.html). I changed the line X, labels_true = make_blobs(n_samples=750, centers=centers,…
otmezger
  • 10,410
  • 21
  • 64
  • 90
8
votes
1 answer

What are noisy samples in Scikit's DBSCAN clustering algorithm?

If I apply Scikit's DBSCAN (http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html) on a similarity matrix, I get a series of labels back. Some of these labels are -1. The documentation calls them noisy samples. What are…
Auxiliary
  • 2,687
  • 5
  • 37
  • 59
8
votes
4 answers

How to plot a k-distance graph in Python

How do I plot (in Python) the distance graph for a given value of min-points in DBSCAN? I am looking for the knee and the corresponding epsilon value. In sklearn I do not see any method that returns such distances. Am I missing something?
Mauro Gentile
  • 1,463
  • 6
  • 26
  • 37
8
votes
2 answers

Image not segmenting properly using DBSCAN

I am trying to use DBSCAN from scikit-learn to segment an image based on color. The results I'm getting are [image]. As you can see there are 3 clusters. My goal is to separate the buoys in the picture into different clusters. But obviously they are…
Ryan Fatt
  • 91
  • 1
  • 1
  • 6
8
votes
2 answers

Clustering using a custom distance metric for lat/long pairs

I'm trying to specify a custom clustering function for the scikit-learn DBSCAN implementation: def geodistance(latLngA, latLngB): print latLngA, latLngB return vincenty(latLngA, latLngB).miles cluster_labels = DBSCAN( eps=500, …
Nathan Breit
  • 1,661
  • 13
  • 33
8
votes
1 answer

How to scale input for DBSCAN in scikit-learn

Should the input to sklearn.cluster.DBSCAN be preprocessed? In the example http://scikit-learn.org/stable/auto_examples/cluster/plot_dbscan.html#example-cluster-plot-dbscan-py the distances between the input samples X are calculated and…
Alex
  • 267
  • 1
  • 2
  • 7
7
votes
1 answer

Why does k-means in scikit-learn have a predict function but DBSCAN/agglomerative doesn't?

The scikit-learn implementation of K-means has a predict() function which can be applied to unseen data, whereas DBSCAN and Agglomerative do not have a predict() function. All three algorithms have fit_predict(), which is used to fit the model and…
7
votes
1 answer

Precomputed distance matrix in DBSCAN

Reading around, I find it is possible to pass a precomputed distance matrix into SKLearn DBSCAN. Unfortunately, I don't know how to pass it for calculation. Say I have a 1D array with 100 elements, with just the names of the nodes. Then I have a 2D…
Jaime Nebrera
  • 79
  • 1
  • 2
7
votes
2 answers

Why are all labels_ -1? Generated by DBSCAN in Python

from sklearn.cluster import DBSCAN dbscan = DBSCAN(eps=0.001, min_samples=10) clustering = dbscan.fit(X) Example vectors: array([[ 0.05811029, -1.089355 , -1.9143777 , ..., 1.235167 , -0.6473859 , …
Jing
  • 89
  • 1
  • 4
7
votes
0 answers

Monitor progress of scikit's DBSCAN

I have a large amount of data that I want to cluster with Scikit's DBSCAN. I do it with the following line: dbscanObject = DBSCAN(eps=20, min_samples=15).fit(featureVectors) Unfortunately, this takes very long depending on how large the dataset is,…
PlsWork
  • 1,958
  • 1
  • 19
  • 31
7
votes
2 answers

Get the cluster size in sklearn in python

I am using sklearn DBSCAN to cluster my data as follows. #Apply DBSCAN (sims == my data as list of lists) db1 = DBSCAN(min_samples=1, metric='precomputed').fit(sims) db1_labels = db1.labels_ db1n_clusters_ = len(set(db1_labels)) - (1 if -1 in…
user8510273
7
votes
2 answers

DBSCAN for clustering data by location and density

I'm using the method dbscan::dbscan in order to cluster my data by location and density. My data looks like this: str(data) 'data.frame': 4872 obs. of 3 variables: $ price : num ... $ lat : num ... $ lng : num ... Now I'm using…
Paul
  • 1,325
  • 2
  • 19
  • 41
7
votes
1 answer

Cluster high dimensional data with python and DBSCAN

I have a dataset with 1000 dimensions and I am trying to cluster the data with DBSCAN in Python. I have a hard time understanding what metric to choose and why. Can someone explain this? And how should I decide what values to set eps to? I am…
Ekgren
  • 1,024
  • 1
  • 9
  • 13
7
votes
2 answers

ELKI implementation of OPTICS clustering algorithm detects only one cluster

I'm having an issue using the OPTICS implementation in the ELKI environment. I have used the same data with the DBSCAN implementation and it worked like a charm. Probably I'm missing something with the parameters but I can't figure it out, everything seems to be…