Questions tagged [dbscan]

DBSCAN stands for density-based spatial clustering of applications with noise. It is a popular density-based cluster analysis algorithm.

It is a density-based clustering algorithm because it finds a number of clusters starting from the estimated density distribution of the corresponding nodes. DBSCAN is one of the most common clustering algorithms and also one of the most cited in the scientific literature. OPTICS can be seen as a generalization of DBSCAN to multiple ranges, effectively replacing the ε parameter with a maximum search radius.
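To make the density idea concrete, here is a minimal pure-Python sketch of the usual DBSCAN procedure: a point with at least `min_pts` neighbours within radius `eps` is a core point, clusters grow outward from core points, and anything unreachable is labelled noise. This is an illustration only, not the scikit-learn or R implementation. The names `dbscan`, `region_query`, and `min_pts` are chosen for this example, and the brute-force neighbour search is O(n²), so it only suits small inputs.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label used for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may still become a border point later
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
    return labels


# Two small dense groups and one isolated point (made-up data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, -1]
```

Unlike k-means, the number of clusters is not given up front. It falls out of `eps` and `min_pts`, which is why choosing those two parameters comes up so often in questions under this tag.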

See also Wikipedia.

In R, the dbscan package implements this method.

563 questions
-3
votes
1 answer

Obtain the Clustered Documents of DBSCAN

I attempted to use DBSCAN (from scikit-learn) to cluster text documents. I use TF-IDF (TfidfVectorizer in sklearn) to create the features of each document. However, I have not found a way to obtain (print) the documents that are clustered by DBSCAN…
-3
votes
1 answer

DBSCAN in python - Running out of memory

My data has 1 million (lat, long) coordinate pairs. I am using the DBSCAN algorithm with the haversine distance measure. However, so far it only runs on a subset of 8,000 records, and if I try to run it on the entire dataset, it runs out of memory…
-3
votes
2 answers

DBSCAN Clustering with additional features

Can I apply DBSCAN with other features in addition to location? If so, how can it be done in R or Spark? I tried preparing an R table of 3 columns: latitude, longitude, and score (the feature I want to cluster on in…
Ahmed El-Gamal
-3
votes
1 answer

How to apply DBSCAN algorithm on grouping of similar url

How can I group similar URLs using the DBSCAN algorithm? I have seen many datasets, but none were on URLs. I want to take URLs of a similar type and group them together. Here I do not know the distance (eps), and minpoints can be the number of URLs to be…
-4
votes
1 answer

suggestion for clustering algorithm?

I have a dataset of 590,000 records after preprocessing, and I want to find clusters in it. It contains string data (for now, assume I have only one column with 590,000 unique values in the dataset). I am also using a custom-defined distance measure…
Vas
-4
votes
1 answer

DBSCAN cluster with metric='russellrao'

I ran into a problem using sklearn.cluster.DBSCAN. If I use DBSCAN(metric="russellrao"), what format should the data be in? I tried two ways and both return pred = [-1 -1 -1 ..., -1 -1 -1]. You can see the two data formats below. npy = df2.values y_pred =…
Ao.L
-5
votes
1 answer

Spark causes a memory error when running DBSCAN source using scala. How can we solve this?

We used 100,000 kits. The Spark version is 1.6.1 and the Scala version is 2.1.0. How can I fix the memory errors and get good results?