Questions tagged [hdbscan]

Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed by Martin Ester, Hans-Peter Kriegel, Jörg Sander and Xiaowei Xu in 1996. It is a density-based clustering algorithm: given a set of points in some space, it groups together points that are closely packed (points with many nearby neighbors) and marks as outliers the points that lie alone in low-density regions (whose nearest neighbors are too far away). DBSCAN is one of the most common clustering algorithms and one of the most cited in the scientific literature.
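The grouping rule described above can be made concrete with a minimal, brute-force DBSCAN sketch in Python. It is an illustrative O(n²) version written for this page, not the code of scikit-learn or the `hdbscan` package; the names `eps` and `min_samples` are borrowed from scikit-learn, and, as in scikit-learn, a point counts itself toward `min_samples`. HDBSCAN, the subject of this tag, replaces the single `eps` with a hierarchy over all density levels, which this sketch does not attempt.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within Euclidean distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Brute-force O(n^2) DBSCAN: returns one cluster label per point, -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # provisional: a later cluster may claim it as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it
        labels[i] = cluster_id
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable, but not expanded further
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                seeds.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)  # [0, 0, 0, 1, 1, 1, -1]
```

The isolated point `(50, 50)` has no neighbors within `eps`, so it stays labeled `-1`, the same noise label that scikit-learn and `hdbscan` use.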

In 2014, the algorithm received the Test of Time Award (given to algorithms that have received substantial attention in theory and practice) at KDD, the leading data mining conference.

81 questions

0 answers

Measuring "single strongest peak" in a distribution

I'd like to automatically detect whether data have a very strongly discernable peak, with any particular distribution. The data can otherwise be quite noisy, or there might be several 'false' peaks. Here are a few examples of the performance I'd…

asked Apr 20 '21 at 22:48

L Fischman

1 answer

HDBSCAN Cluster choice

I have been working with HDBSCAN and have a few hundred clusters based on my data. I am trying to select some cluster groups for further analysis. I am looking for the clusters which have a high inter-cluster distance, as in more spread out and behave…

scikit-learn cluster-analysis unsupervised-learning hdbscan

asked Nov 16 '20 at 10:52

Jazz
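One simple way to rank clusters by how far apart they are, as asked above, is to compare cluster centroids. The sketch below is an assumption-laden illustration: `rank_by_separation` is a hypothetical helper, centroid distance is only one possible notion of separation, and because HDBSCAN clusters can be non-convex a centroid may not even lie inside its cluster. Noise points (label `-1`) are ignored, and at least two clusters are required.

```python
import numpy as np


def cluster_centroids(X, labels):
    """Return the non-noise cluster ids and the mean point (centroid) of each."""
    labels = np.asarray(labels)
    ids = [label for label in np.unique(labels) if label != -1]
    return ids, np.vstack([X[labels == label].mean(axis=0) for label in ids])


def rank_by_separation(X, labels):
    """Rank clusters by mean Euclidean distance from their centroid to every other centroid."""
    ids, C = cluster_centroids(np.asarray(X, dtype=float), labels)
    D = np.linalg.norm(C[:, None, :] - C[None, :, :], axis=-1)  # pairwise centroid distances
    mean_sep = D.sum(axis=1) / (len(ids) - 1)  # diagonal is 0, so divide by the other clusters only
    order = np.argsort(mean_sep)[::-1]  # most separated first
    return [(int(ids[i]), float(mean_sep[i])) for i in order]


X = [[0, 0], [0, 2], [10, 0], [10, 2], [0, 20], [0, 22], [50, 50]]
labels = [0, 0, 1, 1, 2, 2, -1]  # hypothetical stand-in for clusterer.labels_
ranking = rank_by_separation(X, labels)
```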

1 answer

Python HDBScan class always fails on second iteration before even entering first function

I am attempting to look at conglomerated outlier information, utilizing several different SKLearn, HDBScan, and custom outlier detection classes. However, for some reason I am consistently running into an error where any class utilizing HDBScan…

python class scikit-learn hdbscan

asked Jun 20 '20 at 00:02

WolVes

1 answer

Anomalies Detection by DBSCAN

I am using DBSCAN on my training dataset in order to find outliers and remove them from the dataset before training a model. I am running DBSCAN on my 7697 training rows with 8 columns. Here is my code: from sklearn.cluster import DBSCAN X =…

machine-learning cluster-analysis outliers anomaly-detection hdbscan

asked May 11 '20 at 12:48

user172500

1 answer

How to extract clusters from HDBSCAN algorithm

I'd like to extract the original points that form each cluster. I know that HDBSCAN doesn't have cluster centers, so I thought that, in case each label corresponds to the original point in the same order, I can do the following, but the results are really bad…

python cluster-analysis hdbscan

asked May 06 '20 at 18:51

user11936452
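For this question, the key fact is that in scikit-learn's clusterers and in the `hdbscan` package, `labels_[i]` is the label of the i-th row passed to `fit`, so a boolean mask recovers each cluster's original points. A minimal sketch, assuming `X` is the array that was clustered; `labels` is a hard-coded, hypothetical stand-in for `clusterer.labels_` so the snippet runs without the library, and `points_by_cluster` is a hypothetical helper name.

```python
import numpy as np


def points_by_cluster(X, labels):
    """Map each label to the rows of X that received it; noise points end up under key -1."""
    X = np.asarray(X)
    labels = np.asarray(labels)
    # labels[i] belongs to X[i], so a boolean mask selects each cluster's original rows
    return {int(label): X[labels == label] for label in np.unique(labels)}


X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [9.0, 0.0]])
labels = np.array([0, 0, 1, 1, -1])  # hypothetical stand-in for clusterer.labels_
clusters = points_by_cluster(X, labels)
```

The same positional alignment is why `df["cluster"] = clusterer.labels_` attaches the right label to each DataFrame row, provided the rows are in the order that was passed to `fit`.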

2 answers

Cluster a list of geographic points by distance and constraints

I have a delivery app, and I want to group orders (each order has lat and lng coordinates) by location proximity (linear distance) and constraints like max orders and max total products (each order has an amount of products) inside a group. For…

python cluster-analysis latitude-longitude hdbscan

asked Apr 07 '20 at 04:07

Alex

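For grouping orders by location, Euclidean distance on raw latitude/longitude degrees is distorted away from the equator; great-circle (haversine) distance is the usual replacement. The sketch below assumes a spherical Earth with a mean radius of 6371 km. scikit-learn's `"haversine"` metric expects `[latitude, longitude]` pairs in radians and returns distances in radians, so a radius in kilometers has to be divided by the Earth's radius before being used as `eps`. The 500 m radius is a hypothetical example value. Clustering alone does not enforce the max-orders or max-products constraints from the question; those would need a separate step.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; treating the Earth as a sphere is an approximation


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# For a library haversine metric (radians in, radians out), a km radius becomes eps like this:
eps_radians = 0.5 / EARTH_RADIUS_KM  # hypothetical 500 m neighborhood
```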

0 answers

How to find top terms in dbscan or hdbscan clusters?

I'm using dbscan from sklearn and HDBSCAN to cluster some documents. vectorizer = TfidfVectorizer(stop_words=mystopwords) X = vectorizer.fit_transform(y) dbscan = DBSCAN(eps=0.75, min_samples = 9) clusters = dbscan.fit_predict(X) Now how can I get…

cluster-analysis k-means dbscan hdbscan

asked Mar 16 '20 at 10:42

user3400567
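A common way to describe a text cluster, as this question asks, is to average the TF-IDF vectors of its documents and list the highest-weighted terms. The sketch below is one such approach under stated assumptions: `top_terms_per_cluster` is a hypothetical helper, the tiny matrix and vocabulary are made up for illustration, and noise documents (label `-1`) are skipped. With a real `TfidfVectorizer`, `terms` would typically come from `vectorizer.get_feature_names_out()` (scikit-learn 1.0 and later); the `np.asarray(...).ravel()` call also handles the `np.matrix` that a SciPy sparse `mean` returns.

```python
import numpy as np


def top_terms_per_cluster(X, labels, terms, n=3):
    """For each non-noise cluster, return the n terms with the highest mean TF-IDF weight."""
    labels = np.asarray(labels)
    terms = np.asarray(terms)
    result = {}
    for label in np.unique(labels):
        if label == -1:  # skip noise documents
            continue
        mask = labels == label
        # mean TF-IDF vector of the cluster, flattened to 1-D (works for dense and sparse X)
        centroid = np.asarray(X[mask].mean(axis=0)).ravel()
        top = np.argsort(centroid)[::-1][:n]  # indices of the largest weights first
        result[int(label)] = terms[top].tolist()
    return result


terms = ["ball", "goal", "vote", "party"]
X = np.array([
    [0.9, 0.4, 0.0, 0.0],
    [0.5, 0.8, 0.0, 0.1],
    [0.0, 0.0, 0.7, 0.6],
    [0.0, 0.1, 0.9, 0.3],
    [0.2, 0.2, 0.2, 0.2],
])
labels = [0, 0, 1, 1, -1]  # hypothetical stand-in for dbscan.fit_predict(X)
top_terms = top_terms_per_cluster(X, labels, terms, n=2)
```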

2 answers

dealing with noise in hdbscan

I have been testing hdbscan from the scikit learn package with a small instance of (x,y) points "point_coord" and the resulting clusters do not really make sense to me. Given the small size of the sample, I am allowing a single cluster. I would…

noise hdbscan

asked Aug 13 '19 at 13:14

Mike

0 answers

Printing a Python-generated plot in R

I am working on performing an HDBSCAN, and am performing the analysis using the hdbscan python module within R. I have the following code: library(reticulate) hdb <- import("hdbscan") # Import hdbscan Python library # Create dummy data. My actual…

python r reticulate hdbscan

asked Apr 02 '19 at 10:46

kneijenhuijs

1 answer

How to know to which matrix row corresponds each cluster label?

After doing clustering I end up with an object which stores all the cluster labels, something like this: clusterer.labels_ The above is typically a list or an array. Then I always assign the labels to the original pandas dataframe (dataset) like…

python pandas scikit-learn hdbscan

asked Jul 07 '18 at 18:12

tumbleweed

0 answers

Error with UMAP: "ufunc 'correct_alternative_cosine' did not contain a loop with signature matching types numpy.dtype[float32]"

I'm trying to use UMAP for dimensionality reduction on some embeddings. However, I encounter the following error when my dataset has more than 5k rows: ufunc 'correct_alternative_cosine' did not contain a loop with signature matching types…

python pandas numpy hdbscan

asked Aug 12 '23 at 09:52

Harvindar Singh Garcha

0 answers

scikit-learn HDBscan throws error when trying to compute medoids/centroids

I have a precomputed distance matrix that I want to find the medoids for. According to the scikit-learn docs, there's a parameter and attribute that you have to set and call in order to retrieve these medoids. When I set the parameter…

python scikit-learn centroid hdbscan

asked Aug 04 '23 at 18:38

sadboy_hdbscan

0 answers

Top2Vec model returning TypeError: 'numpy.float64' object cannot be interpreted as an integer

I'm trying to train a top2vec model and come up against either the issue of not having enough documents, which I rectify by concatenating the dataframe with itself etc. Then, upon training the model, the TypeError comes up. I can't find where the…

python hdbscan top2vec

asked Jul 25 '23 at 19:52

Magnetar

1 answer

HDBSCAN doesn't work anymore - 'float' object cannot be interpreted as an integer

I've been running HDBSCAN for weeks now on gene expression datasets and everything went perfectly well, but lately it refuses to run: clusterer = hdbscan.HDBSCAN(min_cluster_size=10, min_samples=1).fit(df) TypeError: 'float' object cannot…

python jupyter-notebook typeerror hdbscan

asked Jul 25 '23 at 10:27

Nozelar

0 answers

HDBSCAN clusters sentence embeddings in one cluster that are way too far apart

My task is to cluster utterances sent to a chatbot based on sentence similarity in order to find out which topics users ask about and how important those topics are. I am converting the utterances into sentence embeddings using the…

cluster-analysis sentence-similarity hdbscan runumap openaiembeddings

asked Jul 13 '23 at 07:10

FelixLangeCoach
