Questions tagged [hdbscan]

Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed by Martin Ester, Hans-Peter Kriegel, Jörg Sander and Xiaowei Xu in 1996. It is a density-based clustering algorithm: given a set of points in some space, it groups together points that are closely packed (points with many nearby neighbors) and marks as outliers points that lie alone in low-density regions (whose nearest neighbors are too far away). DBSCAN is one of the most common clustering algorithms and also one of the most cited in the scientific literature.
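To make the description above concrete, here is a minimal pure-Python sketch of the DBSCAN idea. It is a simplified illustration, not the reference implementation or any library's API: the function name `dbscan`, the parameter names `eps` (neighborhood radius) and `min_pts` (minimum neighbors for a dense region), the brute-force neighbor search, and the toy data are all assumptions chosen for this example.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # Brute-force region query: every point within eps of point i,
        # including i itself. Real implementations use a spatial index.
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            # Not dense enough to start a cluster. This label can be
            # overwritten later if the point borders a cluster.
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not dense
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is dense too, keep expanding
    return labels

# Toy data (assumed for illustration): two tight groups and one isolated point.
points = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.0), (0.5, 0.5),
          (5.0, 5.0), (5.0, 5.5), (5.5, 5.0),
          (10.0, 0.0)]
labels = dbscan(points, eps=1.0, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, -1]
```

The two tight groups become clusters 0 and 1, and the isolated point is labeled as noise because no other point lies within `eps` of it. In practice a library implementation is usually preferable, since the brute-force search here takes quadratic time.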

In 2014, the algorithm received the Test of Time Award (an award given to algorithms that have received substantial attention in theory and practice) at the leading data mining conference, KDD.

81 questions
0
votes
1 answer

Density and threshold based clustering in dbscan

I am working on some thermal temperature data of industrial parts. I have pixel-wise temperature values for each part. I want to use dbscan to identify parts that have clusters of pixels where all points in the cluster…
0
votes
1 answer

Snakemake rule is run only for one file

I have a rule in snakemake that runs HDBSCAN clustering. Previously it was regular DBSCAN and was working fine, but after I modified it, somehow the problem started (I also modified the Snakemake file for other reasons, so it is hard to say what is to blame).…
Nikita Vlasenko
0
votes
1 answer

Fine tuning hdbscan parameters for clustering text documents

I have text documents which I am clustering using hdbscan. When I have a smaller amount of data (around 35 documents) and the correct number of clusters is around 14, then using the following parameters I am getting the correct result. def cluster_texts(textdict,…
user2129623
-1
votes
1 answer

How to use GloVe to generate vector matrix?

I am using the HDBSCAN algorithm to create clusters from the documents I have. To create a vector matrix from the words, I am currently using the tf-idf algorithm, and I want to use GloVe. I have searched posts but could not understand how to use this algorithm. I…
Suhail Gupta
-2
votes
1 answer

HDBSCAN membership probability

I'm working on a comparison between clustering algorithms and I want to know how HDBSCAN in R calculates the so-called membership 'probability'?
-4
votes
1 answer

Suggestion for clustering algorithm?

I have a dataset of 590000 records after preprocessing and I want to find clusters in it. It contains string data (for now assume I have only one column with 590000 unique values in the dataset). Also, I am using a custom-defined distance measure…
Vas