Questions tagged [dbscan]

DBSCAN stands for density-based spatial clustering of applications with noise and is a popular density-based cluster analysis algorithm.

It is called density-based because it forms clusters from regions where points are densely packed (many neighbors within a radius ε) and marks points in sparse regions as noise. The number of clusters therefore follows from the data rather than being specified in advance. DBSCAN is one of the most common clustering algorithms and also one of the most cited in the scientific literature. OPTICS can be seen as a generalization of DBSCAN to multiple ranges, effectively replacing the ε parameter with a maximum search radius.

See also Wikipedia.

In R, the software environment for statistical computing and graphics, the package dbscan implements this method.
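The density-based idea described above can be sketched in a few lines of Python. This is a minimal, unoptimized sketch of the textbook algorithm with a brute-force O(n²) neighborhood search. It is not the code of the R dbscan package, and the names `dbscan`, `region_query`, `eps` and `min_pts` are illustrative. A point is a core point when its ε-neighborhood, counting the point itself, holds at least `min_pts` points. Clusters grow outward from core points, and non-core points reached along the way become border points.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
NOISE = -1


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Return a cluster id (0, 1, ...) for each point, or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not visited yet
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reached from a core point becomes a border point
                continue
            if labels[j] is not None:
                continue  # already assigned to this cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels  # every entry has been assigned by now


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (0, 2.4), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```

In the example, (0, 2.4) has only one neighbor besides itself, so it is not a core point. It still joins cluster 0 as a border point because it lies within ε of the core point (0, 1). The isolated point (50, 50) stays noise.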

563 questions
5 votes · 1 answer

What clustering algorithm is suitable for 2d rectangles without knowing the number of clusters ahead of time?

The problem I have is that there are rectangles within rectangles. Think of a map, but with the following traits, the key point being: rectangles with similar density often share similar dimensions and similar position on the x axis with…
javastudent
5 votes · 2 answers

Clustering algorithm appropriate for very small clusters

I am trying to find duplicates in a list of about 5000 records. Each record is a person's name and address, but all typed inconsistently into one field, so I'm trying a fuzzy matching approach. My methodology (using RapidMiner) is to do some…
aquavitae
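If DBSCAN is used for this kind of near-duplicate detection, a small MinPts fits the problem. With MinPts = 2 (counting the point itself, as in the original definition), every record that has at least one other record within ε is a core point. The clusters are then exactly the connected components of the "within ε" graph, and every unmatched record is noise. The Python sketch below computes that special case directly. The string distance `1 - difflib.SequenceMatcher(...).ratio()`, the threshold `eps=0.2` and the function names are assumptions for illustration, not the asker's RapidMiner setup. Because it compares all pairs, the work grows quadratically with the number of records.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Sequence


def record_distance(a: str, b: str) -> float:
    """0.0 for identical strings (ignoring case), up to 1.0 for completely different ones."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def duplicate_groups(records: Sequence[str], eps: float = 0.2) -> List[List[int]]:
    """DBSCAN clusters for MinPts = 2, i.e. connected components of the graph that
    links records at distance <= eps. Singletons (DBSCAN noise) are dropped."""
    parent = list(range(len(records)))  # union-find forest over record indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):  # all pairs: O(n^2) comparisons
            if record_distance(records[i], records[j]) <= eps:
                parent[find(i)] = find(j)  # merge the two components

    groups: Dict[int, List[int]] = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) > 1]
```

For example, "John Smith 12 Main St" and "Jon Smith 12 Main St" end up in the same group, while a record with no close match is left out as noise.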
5 votes · 1 answer

Cluster assignments differ sometimes in two DBSCAN implementations

I have implemented the DBSCAN algorithm in R, and I am comparing its cluster assignments with those of the DBSCAN implementation in the fpc library. Testing is done on synthetic data generated as in the fpc library's dbscan example: n <- 600 x <-…
phoxis
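When you compare two DBSCAN implementations, two effects are worth separating. First, cluster ids are arbitrary, so the same partition can come out with different numbers. Second, a border point that lies within ε of core points from two different clusters is assigned to whichever cluster reaches it first, and that depends on the processing order. The Python sketch below uses hypothetical function names. It matches cluster ids between two label vectors by majority overlap in both directions and reports the points that still disagree. It assumes noise is coded as 0, as in fpc's `cluster` vector; pass `noise=-1` for labelings that use -1.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def majority_map(src: Sequence[Hashable], dst: Sequence[Hashable]) -> Dict[Hashable, Hashable]:
    """Map each label in src to the dst label it shares the most points with."""
    overlap: Dict[Hashable, Counter] = defaultdict(Counter)
    for s, d in zip(src, dst):
        overlap[s][d] += 1
    return {s: counts.most_common(1)[0][0] for s, counts in overlap.items()}


def mismatched_points(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable],
                      noise: Hashable = 0) -> List[int]:
    """Indices of points whose assignment differs once cluster ids are matched up.

    Checking the mapping in both directions also catches clusters that one
    implementation merged and the other kept apart. Noise is never renamed.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    a_to_b = majority_map(labels_a, labels_b)
    b_to_a = majority_map(labels_b, labels_a)
    a_to_b[noise] = noise  # noise in A must also be noise in B
    b_to_a[noise] = noise  # and vice versa
    return [
        i
        for i, (a, b) in enumerate(zip(labels_a, labels_b))
        if a_to_b[a] != b or b_to_a[b] != a
    ]
```

An empty result means the two labelings are the same partition with different cluster numbers. If the remaining mismatches are all border points, they are likely explained by the order dependence described above rather than by a bug.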
4 votes · 2 answers

Cluster center mean of DBSCAN in R?

Using dbscan in package fpc I am able to get an output of:

    dbscan Pts=322 MinPts=20 eps=0.005
             0   1
    seed     0 233
    border  87   2
    total   87 235

but I need to find the cluster center (mean of cluster with most seeds). Can anyone show me how to…
Florie
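Once each point has a cluster label, a cluster center in the sense of this question is simply the per-cluster mean of the coordinates. Below is a minimal Python sketch with a hypothetical `cluster_means` helper. It assumes noise is coded 0, as in fpc's `cluster` vector, and excludes those points. In R, base functions such as `aggregate()` can compute the same per-cluster means.

```python
from typing import Dict, List, Sequence, Tuple


def cluster_means(points: Sequence[Sequence[float]], labels: Sequence[int],
                  noise: int = 0) -> Dict[int, Tuple[float, ...]]:
    """Mean coordinates of each cluster; points labeled `noise` are ignored."""
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for p, lab in zip(points, labels):
        if lab == noise:
            continue
        acc = sums.setdefault(lab, [0.0] * len(p))  # running coordinate sums
        for k, v in enumerate(p):
            acc[k] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: tuple(s / counts[lab] for s in acc) for lab, acc in sums.items()}
```

To pick "the cluster with most seeds", count the seed (core) points per cluster first. fpc's result also records which points are seeds in its `isseed` component.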
4 votes · 3 answers

Graphing results of dbscan in R

Your comments, suggestions, or solutions will be greatly appreciated, thank you. I'm using the fpc package in R to do a DBSCAN analysis of some very dense data (3 sets of 40,000 points in the range -3 to 6). I've found some clusters, and I…
4 votes · 0 answers

How can I extract the point cloud after applying dbscan?

I am trying to extract the point cloud after applying the DBSCAN algorithm from Open3D. I had problems visualizing it in Spyder, so I am saving the point cloud to a CSV file and opening it in CloudCompare. I need help saving the file in CSV format. import…
Vims Rocz
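Below is a minimal Python sketch that writes a labeled point cloud to CSV with the standard `csv` module, producing x, y, z and label columns that CloudCompare's ASCII/CSV import should be able to read. It assumes the coordinates and labels have already been taken out of Open3D, for example with `numpy.asarray(pcd.points)` and `pcd.cluster_dbscan(eps=..., min_points=...)`, which labels noise as -1. The helper name `write_labeled_cloud` and the choice to skip noise by default are illustrative.

```python
import csv
from typing import Sequence, TextIO


def write_labeled_cloud(out: TextIO, points: Sequence[Sequence[float]],
                        labels: Sequence[int], skip_noise: bool = True) -> int:
    """Write an x,y,z,label CSV to `out`; return the number of points written."""
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    writer = csv.writer(out)
    writer.writerow(["x", "y", "z", "label"])
    written = 0
    for (x, y, z), label in zip(points, labels):
        if skip_noise and label < 0:
            continue  # cluster_dbscan marks noise points with -1
        writer.writerow([x, y, z, int(label)])  # int() also accepts NumPy integers
        written += 1
    return written
```

To write a file, open it with `newline=""` as the `csv` module recommends, e.g. `with open("clusters.csv", "w", newline="") as f: write_labeled_cloud(f, xyz, labels)`.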
4 votes · 0 answers

Kernel dies when computing DBSCAN in scikit-learn after dimensionality reduction

I have some data after using ColumnTransformer() like >>> X_trans <197431x6040 sparse matrix of type '' with 3553758 stored elements in Compressed Sparse Row format> I transform the data using TruncatedSVD() which seems to…
4 votes · 1 answer

DBSCAN or HDBSCAN: which is the better option, and why?

Which clustering method is considered better, DBSCAN or HDBSCAN, and what is the reason?
4 votes · 2 answers

How to explain text clustering result by feature importance? (DBSCAN)

There are similar questions and libraries like ELI5 and LIME. But I couldn't find a solution to my problem. I have a set of documents and I am trying to cluster them using scikit-learn's DBSCAN. First, I am using TfidfVectorizer to vectorize the…
MehmedB
4 votes · 1 answer

Why is DBSCAN.fit() faster with more features?

I'm playing around with DBSCAN. I'm wondering why the execution time decreases as I increase the number of features (see plot below). I would expect execution time to increase as the number of features increases... import timeit import…
4 votes · 2 answers

How to use multiple cores with sklearn dbscan?

I'm trying to process a large volume of data through dbscan and would love to use all cores available to me on the machine to speed up the computation. I'm using a custom distance metric, but the distance matrix is not precomputed. I have tried…
Lauren K
4 votes · 2 answers

Implementation of DBSCAN using R Trees

I am trying to implement DBSCAN using an R-tree. Data can be stored in the form of an R-tree. So my question is: how can I store real-time data in an R-tree, and how should I implement the region query that finds the neighborhood of a point with it?
rajsekhar
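DBSCAN touches the data only through region queries ("which points lie within ε of p?"). A spatial index helps by answering those queries without scanning every point: the index returns candidates from nearby regions, and an exact distance check filters them. A full R-tree is lengthy to implement, so the Python sketch below swaps in a simpler structure, a uniform grid with cell side ε, to show the same query pattern. It is not an R-tree. `GridIndex` is a hypothetical name, and points arriving in real time are simply appended to their cell.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, ...]


class GridIndex:
    """Uniform grid with cell side eps: a simple stand-in for an R-tree in eps-range queries."""

    def __init__(self, eps: float, points: Iterable[Point] = ()):
        self.eps = eps
        self.points: List[Point] = []
        self.cells: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
        for p in points:
            self.add(p)

    def _cell(self, p: Point) -> Tuple[int, ...]:
        return tuple(math.floor(c / self.eps) for c in p)

    def add(self, p: Point) -> int:
        """Insert a point (e.g. one arriving in real time) and return its index."""
        self.points.append(p)
        i = len(self.points) - 1
        self.cells[self._cell(p)].append(i)
        return i

    def region_query(self, i: int) -> List[int]:
        """Indices of all points within eps of points[i], itself included.

        Two points within eps of each other lie in the same or adjacent cells,
        so only the 3**d cells around points[i] need to be scanned.
        """
        p = self.points[i]
        home = self._cell(p)
        found = []
        for offset in product((-1, 0, 1), repeat=len(home)):
            cell = tuple(h + o for h, o in zip(home, offset))
            for j in self.cells.get(cell, ()):  # .get does not create empty cells
                if math.dist(p, self.points[j]) <= self.eps:  # exact check on candidates
                    found.append(j)
        return found
```

A DBSCAN loop would call `index.region_query(i)` wherever it would otherwise scan all points. The 3^d neighboring cells make this grid a poor fit for high-dimensional data, which is one reason tree indexes such as R-trees are used instead.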
4 votes · 0 answers

Clustering algorithms: HDBSCAN in R vs HDBSCAN in Python?

For exploratory data analysis, which clustering method would be best? Currently I use HDBSCAN. The problem is that the results I get from HDBSCAN in R differ from those I get from HDBSCAN in Python. R version: …
4 votes · 1 answer

DBSCAN sklearn memory issues

I am trying to use DBSCAN sklearn implementation for anomaly detection. It works fine for small datasets (500 x 6). However, it runs into memory issues when I try to use a large dataset (180000 x 24). Is there something I can do to overcome this…
Nira
4 votes · 2 answers

DBSCAN with R*-Tree - how it works

Can someone explain to me how the DBSCAN algorithm works with an R*-tree? I think I understand how DBSCAN works, and I understand how the R*-tree works, but I can't connect the two. Initially, I have data: feature vectors with 8 features, and I…
Vladislav