Questions tagged [hdbscan]

Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed by Martin Ester, Hans-Peter Kriegel, Jörg Sander and Xiaowei Xu in 1996. It is a density-based clustering algorithm: given a set of points in some space, it groups together points that are closely packed (points with many nearby neighbors) and marks as outliers points that lie alone in low-density regions (whose nearest neighbors are too far away). DBSCAN is one of the most common clustering algorithms and among the most cited in the scientific literature.
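
The density-based grouping described above can be illustrated with a minimal pure-Python sketch. This is a simplified, quadratic-time illustration of plain DBSCAN, not HDBSCAN and not the code of any library; the function name `dbscan` and the parameters `eps` (neighborhood radius) and `min_pts` (minimum neighbors for a dense point) are chosen here for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label used here for outliers (an illustrative convention)


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        cluster += 1  # point i is dense enough to start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not dense
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is dense too, so keep expanding
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # two tight groups and one outlier
```

The isolated point at `(50, 50)` has no neighbors within `eps`, so it keeps the noise label, while the two tight groups each become a cluster.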

In 2014, the algorithm received the Test of Time Award (an award given to algorithms that have received substantial attention in theory and practice) at the leading data mining conference, KDD.

81 questions

votes

2 answers

HDBSCAN: ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

I am trying to initialize HDBSCAN for clustering in JupyterLab. I use Python 3.7.6. import numpy as np import pandas as pd from sklearn.datasets import load_digits from sklearn.manifold import TSNE import hdbscan There always appears the…

asked Aug 31 '21 at 15:26

Philipp

votes

0 answers

Clustering similar lines with HDBSCAN

The image above is a frame from a video. The ultimate goal is to detect the gate. What I want to do is cluster lines similarly to the circles, where the lines that are not circled are outliers. My findings tell me this is an HDBSCAN problem so I…

python opencv line-segment hdbscan

asked Apr 30 '21 at 19:09

Luka Jozić

votes

1 answer

Can we refit clustering algorithms or fit them in parts?

I want to cluster a big data set (more than 1M records). I want to use the dbscan or hdbscan algorithm for this clustering task. When I try to use one of those algorithms, I get a memory error. Is there a way to fit a big data set in parts? (go…

scikit-learn hierarchical-clustering unsupervised-learning dbscan hdbscan

asked Apr 07 '21 at 08:35

Boom

votes

1 answer

hdbscan error when inside rapids container

I am using rapids UMAP in conjunction with HDBSCAN inside a rapidsai docker container : rapidsai/rapidsai-core:0.18-cuda11.0-runtime-ubuntu18.04-py3.7 import cudf import cupy from cuml.manifold import UMAP import hdbscan from sklearn.datasets…

cupy rapids cudf hdbscan

asked Mar 12 '21 at 21:38

Igna

votes

1 answer

HDBSCAN: Shouldn't any object in a cluster have a probability value > 0? It also produces inconsistent results

I am using hdbscan to find clusters within a dataset in a Python Jupyter notebook. import pandas as pandas import numpy as np data = pandas.read_csv('data.csv') That data looks something like this: import hdbscan clusterSize = 6 clusterer =…

python jupyter-notebook hdbscan

asked Nov 05 '20 at 01:17

Glen Pierce

votes

1 answer

How can I cluster 5-dimensional data using HDBSCAN?

I am trying to cluster the NTU-RGB+D 120 skeleton dataset using HDBSCAN. The numpy array of the skeleton data has 5 dimensions **dataset.shape=[40091, 3, 300, 25, 2]** where No of data = 40091, Coordinates = 3 (x-y-z), No of frame = 300, No of joints =…

hdbscan

asked Sep 30 '20 at 15:14

Lp81194

votes

2 answers

HDBSCAN cluster caching and persistence

HDBSCAN has a flag to cache its cluster data as a param like mentioned below: prediction_data :boolean, optional Whether to generate extra cached data for predicting labels or membership vectors few new unseen points later. If you wish to persist…

python scipy pickle dbscan hdbscan

asked Jul 02 '20 at 20:58

Shan

votes

1 answer

The same results in DBSCAN and HDBSCAN?

DBSCAN(epsilon, minPts = 2) is related to single linkage clustering and HDBSCAN(minPts = 2) is also related to single linkage clustering. My question is: how can I obtain the same clustering results with these settings? Or need to set other…

hierarchical-clustering dbscan hdbscan

asked Jun 15 '20 at 11:12

run2you

votes

1 answer

Not able to predict the cluster membership of a new point under hdbscan function available under "dbscan" package

I am using the hdbscan function from the package called "dbscan" to perform clustering on some data. I am not able to predict the membership of a new data point after the clustering is built. The predict function works for the object built under dbscan…

cluster-analysis prediction predict hierarchical-clustering hdbscan

asked Jan 28 '20 at 07:29

Shouvik Sardar

votes

1 answer

Using callable metric for HDBSCAN*

I want to cluster some data with HDBSCAN*. The distance is calculated as a function of some parameters from both values so if the data look like: label1 | label2 | label3 0 32 18.5 3 1 34.5 11 12 2 .. .. …

python metrics hierarchical-clustering dbscan hdbscan

asked Dec 16 '19 at 06:29

Roy Ancri

votes

1 answer

Retrieving members of a cluster with HDBSCAN

So I have some string data that I do some manipulations to and then create a cluster with using HDBSCAN: textData = train['eudexHash'].apply(lambda x: str(x)) clusterer = hdbscan.HDBSCAN(min_cluster_size=5, …

python machine-learning cluster-analysis k-means hdbscan

asked Nov 19 '19 at 16:20

DavimusPrime

votes

0 answers

Bizarre HDBScan clustering result for cosine-similarity matrix

I'm trying to cluster similar messages within machine log files (where e.g. I can't ignore numbers). Debugging my code with a subset of messages which all have the same "degree of similarity" I came across a very strange finding: below a certain…

python cosine-similarity hdbscan

asked Sep 19 '19 at 06:19

MarkH

votes

1 answer

How to reconstruct an image after clustering with hdbscan?

I am trying to reconstruct a brain tumor image after clustering using hdbscan. However, hdbscan does not have cluster centers unlike kmeans so I am a bit confused on how to obtain the clustered image. I have tried obtaining the ref cluster center…

python cluster-analysis medical hdbscan

asked Aug 13 '19 at 01:53

an305692

votes

2 answers

How to visualise top terms on each HDBSCAN cluster

I'm currently trying to use HDBSCAN to cluster a bunch of movie data, in order to group similar content together and be able to come up with 'topics' that describe those clusters. I'm interested in HDBSCAN because I'm aware that it's considered soft…

python cluster-computing topic-modeling hdbscan

asked Aug 01 '19 at 09:06

J.Doe

votes

2 answers

How to print output results in HDBSCAN

I have ASCII data and I need to cluster the data using HDBSCAN. I got the labels but I don't know how to print the output cluster results, i.e. unique and segregated results from hdbscan. snippet: import hdbscan import numpy as np datafile =…

hdbscan

asked Apr 10 '19 at 10:02

vasu
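
For the question above about printing segregated cluster results, one possible approach is to group the input rows by their label. The sketch below uses only the standard library; `rows` and `labels` are made-up stand-ins (assumptions, not the asker's data) for the input records and for the per-point label array a clusterer returns, such as `clusterer.labels_` in the Python `hdbscan` package, where `-1` marks noise. The helper name `group_by_label` is hypothetical.

```python
from collections import defaultdict


def group_by_label(rows, labels):
    """Map each cluster label to the list of rows assigned to it."""
    clusters = defaultdict(list)
    for row, label in zip(rows, labels):
        clusters[label].append(row)
    return dict(clusters)


# Made-up example data; in practice labels would come from the clusterer.
rows = ["a", "b", "c", "d", "e"]
labels = [0, 1, 0, -1, 1]

grouped = group_by_label(rows, labels)
for label in sorted(grouped):
    name = "noise" if label == -1 else f"cluster {label}"
    print(name, grouped[label])
```

Sorting the labels puts the noise group first, since `-1` is smaller than any cluster id.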
