Questions tagged [cluster-analysis]

Cluster analysis is the process of grouping "similar" objects into groups known as "clusters", along with the analysis of these results.

Cluster analysis is the task of grouping objects into subsets (called clusters) so that observations in the same cluster are similar in some sense, while observations in different clusters are dissimilar.

Clustering is a method of unsupervised learning used to discover hidden structure in unlabeled data, and is commonly used in exploratory data analysis. Popular algorithms include expectation maximization (EM), spectral clustering, and correlation clustering.

Related topics: knowledge discovery, taxonomy. Not to be confused with cluster computing.

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post to more than one. See "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?"

6244 questions
9
votes
2 answers

How to perform clustering without removing rows where NA is present in R

I have data that contain some NA values. What I want to do is to perform clustering without removing the rows where an NA is present. I understand that the Gower distance measure in daisy allows this. But why my code below…
neversaint
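For readers unfamiliar with how a Gower-style distance can tolerate missing values, here is a minimal Python sketch, not the `daisy` implementation. It assumes numeric features only, represents NA as `None`, and takes the per-feature ranges as an argument; the function name `gower_distance` is hypothetical. Features missing in either row are skipped, and the result is averaged over the features that remain.

```python
from typing import Optional, Sequence

Row = Sequence[Optional[float]]


def gower_distance(a: Row, b: Row, ranges: Sequence[float]) -> Optional[float]:
    """Gower-style distance for numeric features, ignoring missing values.

    Each feature contributes |x - y| / range. A feature is skipped when
    either value is None, and the sum is divided by the number of
    features actually compared.
    """
    total = 0.0
    compared = 0
    for x, y, feature_range in zip(a, b, ranges):
        if x is None or y is None:
            continue  # missing in at least one row: feature gets weight 0
        if feature_range > 0:
            total += abs(x - y) / feature_range
        compared += 1
    if compared == 0:
        return None  # no feature in common, distance is undefined
    return total / compared


if __name__ == "__main__":
    ranges = (4.0, 20.0)
    print(gower_distance((1.0, 10.0), (3.0, None), ranges))
    print(gower_distance((1.0, 10.0), (5.0, 30.0), ranges))
```

Because every row pair still gets a distance as long as the two rows share at least one observed feature, no row has to be dropped before clustering.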
9
votes
3 answers

Kmeans matlab "Empty cluster created at iteration 1" error

I'm using this script to cluster a set of 3D points using the kmeans matlab function but I always get this error "Empty cluster created at iteration 1". The script I'm using: [G,C] = kmeans(XX, K, 'distance','sqEuclidean', 'start','sample'); XX…
Tak
9
votes
4 answers

Correlation clustering in R

I'd like to use correlation clustering and I figure R is a good place to start. I can present the data to R as a set of large, sparse vectors or as a table with a pre-computed dissimilarity matrix. My questions are: are there existing R functions…
daveb
9
votes
3 answers

Clustering words into groups

This is a Homework question. I have a huge document full of words. My challenge is to classify these words into different groups/clusters that adequately represent the words. My strategy to deal with it is using the K-Means algorithm, which as you…
Parijat Kalia
9
votes
1 answer

Clustering and Bayes classifiers Matlab

So I am at a crossroads on what to do next. I set out to learn and apply some machine learning algorithms on a complicated dataset, and I have now done this. My plan from the very beginning was to combine two possible classifiers in an attempt to…
G Gr
8
votes
1 answer

Plotting output of kmeans (PyCluster impl)

How does one plot the output of kmeans clustering in Python? I am using the PyCluster package. allUserVector is an n by m dimensional vector, basically n users with m features. import Pycluster as pc import numpy as np clusterid,error,nfound =…
Maxwell
8
votes
2 answers

Markov Clustering Algorithm

I've been working through the following example of the details of the Markov Clustering algorithm: http://www.cs.ucsb.edu/~xyan/classes/CS595D-2009winter/MCL_Presentation2.pdf I feel like I have accurately represented the algorithm but I am not…
methodin
8
votes
2 answers

Combining different similarities to build one final similarity

I'm pretty new to data mining and recommendation systems, and am now trying to build some kind of recommendation system for users that have these parameters: city, education, interest. To calculate the similarity between them I'm going to apply cosine similarity and…
Leg0
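One common way to merge several per-attribute similarities into a single score is a weighted average. The sketch below is only an illustration of that idea, not the approach from the question or its answers. The function name `combined_similarity`, the weights, and the example scores are all hypothetical, and each per-attribute similarity is assumed to be already scaled to [0, 1].

```python
from typing import Dict


def combined_similarity(sims: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-attribute similarities.

    `sims` maps an attribute name to a similarity in [0, 1];
    `weights` maps the same names to non-negative importance weights.
    """
    total_weight = sum(weights[name] for name in sims)
    if total_weight == 0:
        raise ValueError("at least one attribute needs a positive weight")
    weighted_sum = sum(weights[name] * sims[name] for name in sims)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Hypothetical scores for one pair of users.
    sims = {"city": 1.0, "education": 0.5, "interest": 0.0}
    weights = {"city": 1.0, "education": 1.0, "interest": 2.0}
    print(combined_similarity(sims, weights))
```

Dividing by the total weight keeps the result in [0, 1], so the combined score can be used wherever a single similarity was expected.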
8
votes
5 answers

Clustering 2d integer coordinates into sets of at most N points

I have a number of points on a relatively small 2-dimensional grid, which wraps around in both dimensions. The coordinates can only be integers. I need to divide them into sets of at most N points that are close together, where N will be quite a…
Ben
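Whatever grouping strategy is used, a grid that wraps in both dimensions needs a distance that accounts for the wrap-around. A minimal sketch, assuming coordinates lie in `[0, width)` and `[0, height)`; the function name `torus_distance_sq` is hypothetical:

```python
from typing import Tuple

Point = Tuple[int, int]


def torus_distance_sq(a: Point, b: Point, width: int, height: int) -> int:
    """Squared Euclidean distance on a grid that wraps in both dimensions.

    Along each axis the shorter of the direct and the wrapped-around
    separation is used.
    """
    dx = abs(a[0] - b[0])
    dx = min(dx, width - dx)
    dy = abs(a[1] - b[1])
    dy = min(dy, height - dy)
    return dx * dx + dy * dy


if __name__ == "__main__":
    # Opposite corners of a 10 x 10 grid are neighbours once wrapping is applied.
    print(torus_distance_sq((0, 0), (9, 9), 10, 10))
```

Keeping the distance squared avoids floating-point arithmetic, which is enough when distances are only compared with each other.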
8
votes
2 answers

k-means: Same clusters for every execution

Is it possible to get the same kmeans clusters on every execution for a particular data set, just as a fixed seed can be used for a random value? Is it possible to stop the randomness in clustering?
Robin
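The randomness in k-means usually comes from the choice of the initial centers, so fixing the seed of the random generator used for that choice makes runs repeatable. The following is a minimal pure-Python sketch of that idea, not any particular library's implementation; the function name `kmeans` and its parameters are assumptions for illustration. Most libraries expose the same idea, for example the `random_state` argument of scikit-learn's `KMeans` or `set.seed` in R.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(p: Point, q: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points: Sequence[Point], k: int, seed: int,
           iterations: int = 20) -> Tuple[List[int], List[Point]]:
    """Plain Lloyd's algorithm with seeded random initial centers."""
    rng = random.Random(seed)  # private generator: same seed, same start
    centers = rng.sample(list(points), k)
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(pt, centers[c]))
            for pt in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
    print(kmeans(data, 2, seed=42) == kmeans(data, 2, seed=42))
```

Using a private `random.Random(seed)` instance rather than the global generator means other code that draws random numbers cannot change the result.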
8
votes
3 answers

Looking for collective intelligence .Net / C# resources

Firstly, I realise that this is a very similar question to this one: Which are the good open source libraries for Collective Intelligence in .net/java? ... but all the answers to that one were Java centric so I am asking again, this time looking…
Steve
8
votes
5 answers

How to summarize a list of combination

I have a list of two-element combinations like below. cbnl <- list( c("A", "B"), c("B", "A"), c("C", "D"), c("E", "D"), c("F", "G"), c("H", "I"), c("J", "K"), c("I", "H"), c("K", "J"), c("G", "F"), c("D", "C"), c("E", "C"), c("D", "E"), c("C",…
kabocha
8
votes
5 answers

How to cluster objects (without coordinates)

I have a list of opaque objects. I am only able to calculate the distance between them (not true, just setting the conditions for the problem): class Thing { public double DistanceTo(Thing other); } I would like to cluster these objects. I…
Frank Krueger
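When only a pairwise distance is available, algorithms that never need coordinates or centroids still apply. As one illustration (a swapped-in technique, not necessarily what the answers recommend), the sketch below links any two objects closer than a threshold and returns the connected groups, which is equivalent to cutting a single-linkage hierarchy at that threshold. The names `cluster_by_distance` and `threshold` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def cluster_by_distance(items: Sequence[T],
                        distance: Callable[[T, T], float],
                        threshold: float) -> List[List[T]]:
    """Group items so that any two items within `threshold` share a cluster.

    Uses a union-find structure over item indices; only the distance
    function is needed, never coordinates.
    """
    parent = list(range(len(items)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Compare every pair once and merge the groups of close pairs.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if distance(items[i], items[j]) <= threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[T]] = {}
    for i, item in enumerate(items):
        groups.setdefault(find(i), []).append(item)
    return list(groups.values())


if __name__ == "__main__":
    values = [1, 2, 3, 10, 11, 20]
    print(cluster_by_distance(values, lambda a, b: abs(a - b), 1.5))
```

This evaluates the distance for every pair, so it suits modest collection sizes; the threshold takes the place of a cluster count and has to be chosen for the data at hand.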
8
votes
2 answers

Kubernetes increase resources for all deployments

I am new to Kubernetes. I have a K8 cluster with multiple deployments (more than 150), each having more than 4 pods scaled. I have a requirement to increase resource limits for all deployments in the cluster; and I'm aware I can increase this…
8
votes
1 answer

HDBSCAN difference between parameters

I'm confused about the difference between the following parameters in HDBSCAN: min_cluster_size, min_samples, cluster_selection_epsilon. Correct me if I'm wrong. For min_samples, if it is set to 7, then clusters formed need to have 7 or more…