Questions tagged [cluster-analysis]

Cluster analysis is the process of grouping "similar" objects into groups known as "clusters", along with the analysis of these results.

Cluster analysis is the task of grouping objects into subsets (called clusters) so that observations in the same cluster are similar in some sense, while observations in different clusters are dissimilar.

In machine learning and data mining, clustering is an unsupervised learning method used to discover hidden structure in unlabeled data, and it is commonly used in exploratory data analysis. Popular algorithms include k-means, expectation maximization (EM), spectral clustering, correlation clustering and hierarchical clustering.
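As a rough illustration of one of the algorithms named above, here is a minimal k-means (Lloyd's algorithm) sketch in plain Python. The function name `kmeans`, its parameters, and the toy points are assumptions made for this example and are not part of the tag description; real projects would more likely rely on a library implementation.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, iterations=100, seed=0):
    """Minimal k-means sketch (hypothetical helper, not a library API).

    points: list of numeric tuples; k: number of clusters.
    Returns (labels, centroids).
    """
    rng = random.Random(seed)
    # Assumption: initialise centroids by sampling k distinct input points.
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # New centroid is the coordinate-wise mean of its members.
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return assign(points, centroids), centroids


if __name__ == "__main__":
    toy = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    labels, centroids = kmeans(toy, k=2)
    print(labels, centroids)
```

The result depends on the initial centroids, so in practice k-means is usually run several times with different seeds and the best run is kept.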

Related topics: classification, pattern-recognition, knowledge discovery, taxonomy. Not to be confused with cluster computing.

NOTE: If you want to use this tag for a question that does not directly concern implementation, consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?"

6244 questions


4 answers

Check if one regex covers another regex

I'm attempting to implement a text clustering algorithm. The algorithm clusters similar lines of raw text by replacing them with regexes, and aggregates the number of patterns matching each regex so as to provide a neat summary of the input text…

c++ regex data-mining cluster-analysis

asked Mar 27 '12 at 10:42

Kowshik


4 answers

given 10 functions y=a+bx and 1000's of (x,y) data points rounded to ints, how to derive 10 best (a,b) tuples?

We build software that audits fees charged by banks to merchants that accept credit and debit cards. Our customers want us to tell them if the card processor is overcharging them. Per-transaction credit card fees are calculated like this: fee =…

c# sql algorithm statistics cluster-analysis

asked Dec 22 '11 at 19:28

Justin Grant


1 answer

clustering and matlab

I'm trying to cluster some data from the KDD Cup 1999 dataset. The output from the file looks like…

matlab machine-learning cluster-analysis data-mining fuzzy

asked Oct 10 '11 at 16:34

G Gr


3 answers

How to cluster an instance with Weka's DBSCAN?

I've been trying to use the DBSCAN clusterer from Weka to cluster instances. From what I understand I should be using the clusterInstance() method for this, but to my surprise, when taking a look at the code of that method, it looks like the…

java cluster-analysis weka dbscan

asked Sep 17 '11 at 07:10

Oak


2 answers

How to pick the T1 and T2 threshold values for Canopy Clustering?

I am trying to implement the Canopy clustering algorithm along with K-Means. I've done some searching online that says to use Canopy clustering to get your initial starting points to feed into K-means, the problem is, in Canopy clustering, you need…

cluster-analysis subset k-means

asked Aug 28 '11 at 22:17

Jonathan


2 answers

Clustering geospatial data on coordinates AND non spatial feature

Say I have the following dataframe stored as a variable called coordinates, where the first few rows look like: business_lat business_lng business_rating 0 19.111841 72.910729 5. 1 19.111342 72.908387 5. 2 …

python scikit-learn cluster-analysis geospatial dbscan

asked Feb 28 '21 at 05:07

sometimesiwritecode


7 answers

How to compute precision and recall in clustering?

I am really confused about how to compute precision and recall in clustering applications. I have the following situation: Given two sets A and B. By using a unique key for each element I can determine which of the elements of A and B match. I want to…

cluster-analysis precision-recall

asked Mar 18 '09 at 11:40

Christian Stade-Schuldt


2 answers

Java text clustering library

Which of the Java data mining libraries can do text clustering?

java cluster-analysis data-mining

asked May 02 '11 at 11:12

bme


5 answers

Order of rows in heatmap?

Take the following code: heatmap(data.matrix(signals),col=colors,breaks=breaks,scale="none",Colv=NA,labRow=NA) How can I extract, pre-calculate or re-calculate the order of the rows in the heatmap produced? Is there a way to inject the output of…

r cluster-analysis heatmap

asked Mar 16 '11 at 03:42

Ron Gejman


3 answers

how do I cluster a list of geographic points by distance?

I have a list of points P=[p1,...pN] where pi=(latitudeI,longitudeI). Using Python 3, I would like to find a smallest set of clusters (disjoint subsets of P) such that every member of a cluster is within 20km of every other member in the…

python cluster-analysis latitude-longitude spatial-query

asked Oct 31 '18 at 02:34

Lars Ericson


1 answer

How to perform clustering on Word2Vec

I have a semi-structured dataset, where each row pertains to a single user: id, skills 0,"java, python, sql" 1,"java, python, spark, html" 2, "business management, communication" It is semi-structured because the following skills can only be selected…

python nlp cluster-analysis data-mining word2vec

asked Aug 28 '18 at 03:07

Ivan


1 answer

Why is Adjusted Rand Index (ARI) better than Rand Index (RI) and how to understand ARI intuitively from the formula

I read the Wikipedia article about Rand Index and Adjusted Rand Index. I can understand how they are calculated mathematically and can interpret Rand index as the ratio of agreements over disagreements. But I am failing to have the same intuition about…

machine-learning statistics cluster-analysis

asked May 08 '18 at 15:45

RTM


3 answers

Rand Index function (clustering performance evaluation)

As far as I know, there is no package available for Rand Index in Python, while for Adjusted Rand Index you have the option of using sklearn.metrics.adjusted_rand_score(labels_true, labels_pred). I wrote the code for Rand Score and I am going to…

python cluster-analysis precision unsupervised-learning

asked Mar 31 '18 at 10:28

Hadij

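For reference, the plain (unadjusted) Rand index discussed in this question can be computed by counting the pairs of items on which two labelings agree. The sketch below is an illustration only and is not the asker's code; the name `rand_index` is hypothetical. Recent scikit-learn releases also provide `sklearn.metrics.rand_score`, which is worth checking before writing your own.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of item pairs on which two labelings agree.

    A pair counts as an agreement when both labelings put the two items
    in the same cluster, or both put them in different clusters.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        # Assumption: with no pairs to compare, treat the labelings as
        # trivially identical.
        return 1.0
    agreements = 0
    total_pairs = 0
    for i, j in combinations(range(n), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true == same_pred:
            agreements += 1
        total_pairs += 1
    return agreements / total_pairs


if __name__ == "__main__":
    print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
    print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

This pairwise loop is quadratic in the number of items; for large inputs a contingency-table formulation is the usual choice.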


1 answer

Infomap community detection understanding

I need an understandable description of the Infomap community detection algorithm. I read the papers, but they were not clear to me. My questions: How does the algorithm basically work? What do random walks have to do with it? What is the map equation and…

algorithm cluster-analysis graph-theory

asked Jan 30 '18 at 18:57

Sully


3 answers

How to perform cluster with weights/density in python? Something like kmeans with weights?

My data is like this: powerplantname, latitude, longitude, powergenerated A, -92.3232, 100.99, 50 B, , , 10 C, , , 20 D, , , 40 E, , , 5 I want to be able to cluster the data into N number of clusters…

python algorithm scipy scikit-learn cluster-analysis

asked Jul 11 '17 at 03:51

Rolando

