Questions tagged [dbscan]

DBSCAN (density-based spatial clustering of applications with noise) is a popular density-based cluster analysis algorithm.

It is density-based because it finds clusters starting from the estimated density distribution of the data points. DBSCAN is one of the most common clustering algorithms and among the most cited in the scientific literature. OPTICS can be seen as a generalization of DBSCAN to multiple ranges, effectively replacing the ε parameter with a maximum search radius.
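As a rough illustration of the description above, here is a minimal sketch using scikit-learn's DBSCAN on synthetic data. The data, the `eps` value, and the `min_samples` value are assumptions chosen for the example, not recommendations.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two tight synthetic blobs plus one far-away point (illustrative data only).
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(50, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.1, size=(50, 2))
outlier = np.array([[20.0, 20.0]])
X = np.vstack([blob_a, blob_b, outlier])

# eps is the neighborhood radius; min_samples is the density threshold.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# Label -1 marks noise; every other label is a cluster id.
n_clusters = len(set(labels.tolist()) - {-1})
n_noise = int(np.sum(labels == -1))
print(n_clusters, n_noise)
```

The number of clusters is not passed in; it falls out of the density parameters, and the isolated point is reported as noise rather than forced into a cluster.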

See also the Wikipedia article on DBSCAN.

In R, the dbscan package implements this method.

563 questions
6 votes, 4 answers

Why does DBSCAN clustering return a single cluster on the MovieLens data set?

The Scenario: I'm performing clustering on the MovieLens dataset, which I have in 2 formats: OLD FORMAT: uid iid rat 941 1 5 941 7 4 941 15 4 941 117 5 941 124 5 941 147 4 941 181 5 941 222 2 941 257 4 941 258 4 941 273 3 941 294…
T3J45
6 votes, 2 answers

How to estimate eps using knn distance plot in DBSCAN

I have the following code to estimate the eps for DBSCAN. If the code is fine, then I have obtained the knn distance plot. The code is: ns = 4 nbrs = NearestNeighbors(n_neighbors=ns).fit(data) distances, indices = nbrs.kneighbors(data) distanceDec =…
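For readers unfamiliar with the k-distance heuristic this question refers to, a minimal sketch follows. It assumes scikit-learn's `NearestNeighbors`; the helper name `k_distances`, the synthetic data, and the descending sort order are choices made for this example and are not taken from the asker's truncated code.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def k_distances(data, k=4):
    """Return each point's distance to its k-th nearest neighbor, sorted descending."""
    nbrs = NearestNeighbors(n_neighbors=k).fit(data)
    # Calling kneighbors() without arguments queries the training points
    # and does not count a point as its own neighbor.
    distances, _ = nbrs.kneighbors()
    return np.sort(distances[:, -1])[::-1]

# Synthetic stand-in for the asker's `data` (assumption).
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 2))
kd = k_distances(data, k=4)

# Plotting kd against its index (for example with matplotlib's plt.plot(kd))
# gives the k-distance curve; a pronounced bend is a common candidate for eps.
print(kd[:5])
```

Note that `nbrs.kneighbors(data)`, as written in the question, includes each point itself at distance 0, so the last column there is the distance to the (k-1)-th other point.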
6 votes, 1 answer

"could not convert integer scalar" error when using DBSCAN

I'm trying to use scikit-learn's DBSCAN implementation to cluster a bunch of documents. First I create the TF-IDF matrix using scikit-learn's TfidfVectorizer (it's a 163405x13029 sparse matrix of type numpy.float64). Then I try to cluster…
Parzival
6 votes, 2 answers

DBSCAN error with cosine metric in Python

I was trying to use the DBSCAN algorithm from the scikit-learn library with the cosine metric but got stuck on an error. The line of code is db = DBSCAN(eps=1, min_samples=2, metric='cosine').fit(X) where X is a csr_matrix. The error is the following:…
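One way to sidestep metric support issues of this kind is to precompute the cosine distances and pass them to DBSCAN as a precomputed matrix. The sketch below assumes scikit-learn's `cosine_distances` and a tiny made-up sparse matrix; the `eps` value is illustrative. The full pairwise matrix is dense and needs memory quadratic in the number of rows, so this only suits small inputs.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.cluster import DBSCAN
from sklearn.metrics.pairwise import cosine_distances

# Tiny sparse matrix standing in for the asker's X (assumption).
X = csr_matrix(np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
]))

# Dense (n x n) matrix of cosine distances, i.e. 1 - cosine similarity.
D = cosine_distances(X)

# metric="precomputed" tells DBSCAN that D already holds pairwise distances.
labels = DBSCAN(eps=0.1, min_samples=2, metric="precomputed").fit_predict(D)
print(labels)
```

For non-negative data such as TF-IDF vectors, cosine distance lies between 0 and 1, so an `eps` of 1 would place every point within reach of every other point.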
6 votes, 1 answer

Clustering GPS data using DBSCAN but clusters are not meaningful (in terms of size)

I am working with GPS data (latitude, longitude). For density-based clustering I have used DBSCAN in R. Advantages of DBSCAN in my case: I don't have to predefine the number of clusters; I can calculate a distance matrix (using Haversine…
sau
5 votes, 3 answers

A density-based clustering library that takes a distance matrix as input

I need help finding an open/free density-based clustering library that takes a distance matrix as input and returns clusters in which each element is at most "x" distance away from each of the other elements in the cluster (basically…
Atish Kathpal
5 votes, 2 answers

DBSCAN code in C# or VB.NET for cluster analysis

Could you recommend a library or code in VB.NET or C# that applies DBSCAN to build density-based clusters of data? I have GPS data, and I want to find stay points using the DBSCAN algorithm. But I do not understand…
waleed.makarem
5 votes, 2 answers

Implementing DBSCAN in a distributed system

I have a big data problem and very limited experience with parallel processing and big data. I have hundreds of millions of rows consisting of latitude and longitude data and several IDs. For each ID I can have data ranging from 10000 -10…
Sam
5 votes, 1 answer

DBSCAN Remove Noise from Plot

Using DBSCAN (DBSCAN(eps=epsilon, min_samples=10, algorithm='ball_tree', metric='haversine')), I have clustered a list of latitude and longitude pairs, which I then plotted using matplotlib. When plotting, it includes the "noise" coordinates,…
andrewr
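A minimal sketch of dropping noise points before plotting, assuming scikit-learn's convention that noise receives the label -1. The coordinates and the 1 km `epsilon` are invented for the example; the DBSCAN arguments mirror the call quoted in the question. The haversine metric expects [latitude, longitude] in radians.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic [lat, lon] pairs in degrees: one dense group plus three strays.
rng = np.random.default_rng(1)
group = np.array([40.0, -75.0]) + rng.normal(scale=0.001, size=(20, 2))
strays = np.array([[10.0, 10.0], [-30.0, 120.0], [60.0, 0.0]])
coords_deg = np.vstack([group, strays])

# Haversine distances are in radians, so 1 km is divided by Earth's radius in km.
epsilon = 1.0 / 6371.0088
db = DBSCAN(eps=epsilon, min_samples=10, algorithm="ball_tree", metric="haversine")
labels = db.fit_predict(np.radians(coords_deg))

# Keep only points that belong to a cluster (label -1 means noise).
keep = labels != -1
clustered = coords_deg[keep]
clustered_labels = labels[keep]

# To plot without noise, pass only the filtered arrays, for example:
# plt.scatter(clustered[:, 1], clustered[:, 0], c=clustered_labels)
print(len(coords_deg), len(clustered))
```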
5 votes, 2 answers

DBSCAN (with metric only) in scikit-learn

I have objects and a distance function, and want to cluster these using the DBSCAN method in scikit-learn. My objects don't have a representation in Euclidean space. I know that it is possible to use the precomputed metric, but in my case it's very…
Sergey Sosnin
5 votes, 1 answer

ELKI: Running DBSCAN on custom objects in Java

I'm trying to use ELKI from within Java to run DBSCAN. For testing I used a FileBasedDatabaseConnection. Now I would like to run DBSCAN with my custom objects as parameters. My objects have the following structure: public class MyObject { private…
RBo
5 votes, 2 answers

DBSCAN using spatial and temporal data

I am looking at data points that have lat, lng, and date/time of event. One of the algorithms I came across when looking at clustering algorithms was DBSCAN. While it works ok at clustering lat and lng, my concern is it will fall apart when…
cbake
5 votes, 1 answer

Interpreting the results of OPTICSxi clustering

I am interested in detecting clusters in areas with varying density, such as user-generated data in cities, and for that I adopted the OPTICS algorithm. Unlike DBSCAN, the OPTICS algorithm does not produce a strict cluster partition, but an…
5 votes, 1 answer

Choosing and implementing a clustering method: DBSCAN or something else?

I need to cluster a data set of lat,long coordinates. I am using Python and plan on using DBSCAN, as I don't want to have to specify the number of clusters. The goal and purpose is to be able to input a large data set of lat,long…
5 votes, 3 answers

Distance matrix creation using a NumPy array with pdist and squareform

I'm trying to cluster location data using DBSCAN (scikit-learn implementation). My data is in NumPy array format, but to use DBSCAN with the Haversine formula I need to create a distance matrix. I'm getting the following error when I try to do this (a…
TheBaywatchKid
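A minimal sketch of building a Haversine distance matrix with SciPy's `pdist` and `squareform`, as the question describes. The helper `haversine_km`, the mean Earth radius constant, and the sample coordinates are assumptions made for illustration. `pdist` accepts a callable that takes two 1-D arrays and returns a scalar.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant)

def haversine_km(p, q):
    """Great-circle distance in km between two [lat, lon] points given in degrees."""
    lat1, lon1 = np.radians(p)
    lat2, lon2 = np.radians(q)
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Illustrative [lat, lon] pairs in degrees.
coords = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]])

# pdist returns the condensed upper triangle; squareform expands it
# into a full symmetric matrix with zeros on the diagonal.
dist_matrix = squareform(pdist(coords, metric=haversine_km))
print(dist_matrix.shape)
```

The resulting matrix can be passed to scikit-learn's DBSCAN with `metric='precomputed'`, in which case `eps` is expressed in the same unit as the matrix (kilometers here).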