Questions tagged [hdbscan]

Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed by Martin Ester, Hans-Peter Kriegel, Jörg Sander and Xiaowei Xu in 1996.[1] It is a density-based clustering algorithm: given a set of points in some space, it groups together points that are closely packed (points with many nearby neighbors) and marks as outliers points that lie alone in low-density regions (whose nearest neighbors are too far away). DBSCAN is one of the most commonly used clustering algorithms and one of the most cited in the scientific literature.
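
The grouping rule described above can be made concrete with a minimal pure-Python sketch. It assumes Euclidean distance, a neighborhood radius `eps` and a minimum neighbor count `min_pts` (the point itself included) that decides whether a point is a "core" point. The function names `region_query` and `dbscan` are hypothetical, chosen for illustration only. The sketch is O(n²) and meant for intuition; for real data, a library implementation such as scikit-learn's `DBSCAN` or the `hdbscan` package is the usual choice.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
NOISE = -1  # label used for outliers (hypothetical convention for this sketch)


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Assign each point a cluster id (0, 1, ...) or NOISE.

    A point is a core point if at least min_pts points (itself included) lie
    within eps of it. Clusters grow from core points through their neighbors;
    a point reachable from no core point stays labelled NOISE.
    """
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        # i is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep growing
        cluster_id += 1
    return [NOISE if lab is None else lab for lab in labels]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0),  # dense group A
           (5.0, 5.0), (5.0, 5.1), (5.1, 5.0),  # dense group B
           (10.0, 0.0)]                         # isolated point
    print(dbscan(pts, eps=0.5, min_pts=3))  # → [0, 0, 0, 1, 1, 1, -1]
```

The `labels[j] == NOISE` branch handles a border point that was visited before any nearby core point: it is first marked as noise and only later absorbed into a cluster. Only such border points, when reachable from two clusters, can end up with a label that depends on visiting order.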

In 2014, the algorithm was awarded the test of time award (an award given to algorithms which have received substantial attention in theory and practice) at the leading data mining conference, KDD.

81 questions
0 votes, 0 answers

HDBScan Random Search Finetuning

Context: I am trying to fine-tune my HDBSCAN model from the hdbscan Python library using sklearn's RandomizedSearchCV. However, I am facing the following error: scores = scorer(estimator, X_test) TypeError:…
Mayow
0 votes, 1 answer

How to import hdbscan in VS Code (Anaconda installed)

Based on existing information, I've successfully installed the HDBSCAN package in my conda virtual environment using conda install -c conda-forge hdbscan. However, when I try to run import hdbscan, it says: (p.s. before that I typed in…
Gillian
0 votes, 0 answers

HDBSCAN to cluster locations for a vehicle routing problem

I am looking for resources on tuning HDBSCAN for vehicle routing problems with multiple depots. In this case, the points being clustered are locations with latitude and longitude. I'd like to identify at least K clusters (K being the number of depots).…
0 votes, 1 answer

Lower DBCV Scores for Cluster Analysis using Sklearn's GridSearchCV

I have a geographic dataset 'coordinates' in UTM coordinates that I am performing HDBSCAN on and would like to have sklearn's GridSearchCV validate various parameters using DBCV. While manually evaluating the parameters for HDBSCAN I got the…
MJM
0 votes, 0 answers

BERTopic and HDBSCAN: ModuleNotFoundError: No module named 'loky', error while finding module specification for 'loky.backend.popen_loky_posix'

I am running a BERTopic model on tweets; I have 140k tweets to analyze. So far, whenever I run it on more than 15k lines, I get the error below. I have Joblib version 1.2.0 and Loky version 3.3.0 installed, and I'm using Miniconda and Python 3.9.…
TSD
0 votes, 1 answer

Clustering text with the chatintents library in Python (HDBSCAN, UMAP)

I'm using chatintents (https://github.com/dborrelli/chat-intents) for automatic clustering. To embed sentences I use sentence transformers. The problem is when I set the maximum and minimum number of clusters and then run, the number of clusters…
0 votes, 0 answers

Clustering suggestions: I have an unlabelled dataset with 6 attributes (all numeric) and 100k data points. I want to cluster similar data points

As part of preprocessing, I removed highly correlated attributes (correlation > 0.8) and standardized the data (StandardScaler). `#To reduce it to lower dimensions I used umap…
0 votes, 0 answers

ValueError: cannot be used to seed a numpy.random.RandomState instance

I am trying to perform automatic clustering with UMAP. I am using the R wrapper for UMAP, with all the requirements satisfied, but unfortunately I cannot set the seed in the umapr function. I tried to run the code: `hspace =…
0 votes, 0 answers

How to manually import hdbscan

I am having a hard time manually importing hdbscan. For professional reasons I can't install it via pip, but I'd like to import it manually from its package file downloaded from pypi.org. I set the path to the file location: import…
Artashes
0 votes, 1 answer

Clustering a single time series

I have a single NumPy array (x) and I want to cluster it in an unsupervised way using DBSCAN and hierarchical clustering from scikit-learn. Is clustering possible for single-array data? Additionally, I need to plot the clusters and its…
pro
0 votes, 0 answers

HDBSCAN: automatic parameter selection for robust clustering of different inputs in Python

I am trying to use HDBSCAN (Hamming metric) for clustering unlabeled categorical (binary) data. I would like to implement it in my code, which provides HDBSCAN with an input dataframe of a different shape every time it runs, based on some external…
Mr.Slow
0 votes, 0 answers

Alternative to Silhouette scores to avoid incorrect conclusions

I am using silhouette scores as a post-hoc measure of cluster validity for clusters derived from DBSCAN, but the metric fails to accurately capture what is happening in a particular situation that occurs in my data, and I am looking for…
SamPassmore
0 votes, 1 answer

Normalizing Topic Vectors in Top2vec

I am trying to understand how Top2Vec works. I have some questions about the code that I could not find an answer for in the paper. A summary of what the algorithm does is that it: embeds words and vectors in the same semantic space and normalizes…
Ahmed Elashry
0 votes, 1 answer

How can I test HDBSCAN using RAPIDS without getting an error?

Good morning, I want to test HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) on a GPU, so I need to use the RAPIDS framework. When I tried to follow the steps described here…
aydi
0 votes, 2 answers

Trouble installing hdbscan package for python : "no module named 'hdbscan'" error

I want to run an algorithm written in Python on my Ubuntu virtual machine. It needs to import the hdbscan module, so I want to install it on my virtual machine. Following the documentation from PyPI.org for this library, I simply ran: pip…
Lalastro