Questions tagged [cluster-analysis]

Cluster analysis is the process of grouping "similar" objects into groups known as "clusters", along with the analysis of these results.

Cluster analysis is the task of grouping objects into subsets (called clusters) so that observations in the same cluster are similar in some sense, while observations in different clusters are dissimilar.

Clustering is a method of unsupervised learning used to discover hidden structure in unlabeled data, and is commonly used in exploratory data analysis. Popular algorithms include expectation maximization (EM), spectral clustering, and correlation clustering.
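To make the idea of grouping unlabeled points concrete, here is a minimal sketch of k-means (Lloyd's algorithm), which several questions under this tag ask about. It is an illustration only, not a reference implementation: the function name `kmeans`, the explicit `initial_centers` argument, and the toy data are all assumptions made for this example, and real projects would normally use a library routine instead.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centers, iterations=100):
    """Hypothetical helper: Lloyd's algorithm on a list of numeric tuples.

    The starting centers are passed in explicitly (an assumption of this
    sketch) so that the result is deterministic.
    """
    centers = list(initial_centers)
    k = len(centers)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest center.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # An empty cluster keeps its previous center in this sketch.
                new_centers.append(centers[j])
        if new_centers == centers:  # no center moved, so the labels are stable
            break
        centers = new_centers
    return labels, centers


# Toy data (made up for illustration): two well-separated groups in the plane.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
labels, centers = kmeans(points, initial_centers=[points[0], points[3]])
print(labels)
print(centers)
```

The result depends on the starting centers: a poor choice can converge to a worse grouping, which is why library implementations typically run several initializations and keep the best one.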

Related topics: knowledge discovery, taxonomy. Not to be confused with cluster computing.

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post to more than one - see Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?

6244 questions
2
votes
1 answer

The difference between dist functions in R

I want to calculate dissimilarity indices on a binary matrix and have found several functions in R, but I can't get them to agree. I use the Jaccard coefficient as an example in the four functions: vegdist(), sim(), designdist(), and dist(). I'm…
2
votes
0 answers

Community detection of large graph in Java

I'm currently using the GraphStream library to represent a very large directed weighted graph (35000 nodes with about 200000 edges) in Java. My goal is to detect communities of nodes within the graph, and the library has some community detection…
Jay
  • 121
  • 1
  • 4
2
votes
2 answers

clustering on geo points using R

I have a set of lat, long points for a city. Now I want to cluster these points based on a 500m or 1km radius using R. Precisely, I want to find the centroids as well as all those points within a 500m radius for that particular…
Swetha K V
  • 43
  • 1
  • 6
2
votes
1 answer

Dimensionality reduction for high dimensional sparse data before clustering or spherical k-means?

I am trying to build my first recommender system, where I create a user feature space and then cluster the users into different groups. Then, for the recommendation to work for a particular user, I first find out the cluster to which the user belongs and…
2
votes
1 answer

How to display the row name in K means cluster plot in R?

I am trying to plot K-means clusters. Below is the code I use. library(cluster) library(fpc) data(iris) dat <- iris[, -5] # without known classification # K-means cluster analysis clus <- kmeans(dat, centers=3) clusplot(dat, clus$cluster,…
Arun
  • 625
  • 3
  • 10
  • 20
2
votes
1 answer

Understanding the Biclust class in R

I'm new to the R language, but I'm using the biclust package for bicluster analysis. After searching the web for information, I could run some biclustering algorithms, but I could not access the resulting information. For example, after running >…
henryr
  • 169
  • 1
  • 15
2
votes
1 answer

Spectral clustering on sparse dataset

I am applying spectral clustering (sklearn.cluster.SpectralClustering) to a dataset with quite a few features that are relatively sparse. When doing spectral clustering in Python, I get the following warning: UserWarning: Graph is not fully…
Guido
  • 6,182
  • 1
  • 29
  • 50
2
votes
2 answers

Density Based Clustering with Representatives

I'm looking for a method to perform density based clustering. The resulting clusters should have a representative unlike DBSCAN. Mean-Shift seems to fit those needs but doesn't scale enough for my needs. I have looked into some subspace clustering…
Milan
  • 929
  • 2
  • 13
  • 25
2
votes
1 answer

Empty clusters in K-means clustering

When applying K-means clustering we are picking k initial clusters and then iterating through all the points and assigning them to some cluster and also updating the centers of the clusters. Eventually we do not do any other update. Yet I noticed…
stryker
  • 41
  • 5
2
votes
1 answer

Find the most similar set of samples – A function that finds a cluster of a given size

I need to find a cluster with a specific number of members. Given distance data for any number of samples, I want to find the first incidence in which three locations become clustered during agglomerative clustering. In other words, I want to find…
Dylan S.
  • 359
  • 4
  • 15
2
votes
2 answers

ELKI OPTICS pre-computed distance matrix

I can't seem to get this algorithm to work on my dataset, so I took a very small subset of my data and tried to get it to work, but that didn't work either. I want to input a precomputed distance matrix into ELKI, and then have it find the…
2
votes
1 answer

latitude and longitude clustering in python

I am working with a dataframe that has lat and long data, and I need to cluster points that are nearest to each other, let's say within 200 meters. This is what I am doing in Python. order_lat order_long 0 19.111841 72.910729 1 19.111342 …
Neil
  • 7,937
  • 22
  • 87
  • 145
2
votes
3 answers

Calculating similarity between and centroid of Lucene documents

To perform a simple clustering algorithm on results that I get from Lucene, I have to calculate the cosine similarity between two documents in Lucene. I also need to be able to make a centroid document to represent the centroid of each cluster.…
Mark
  • 312
  • 4
  • 17
2
votes
1 answer

Include the spatial context of pixels during image clustering

How can the spatial context (or neighbourhood) of a pixel be taken into account (besides the pixel intensity) when clustering an image? For the time being, I'm using K-means, GMM and Fuzzy C-means which cluster the image based only on the…
Hakim
  • 3,225
  • 5
  • 37
  • 75
2
votes
0 answers

Finding k for kmeans in python

So I have a dataset consisting of 130000 points, in the format (x,y). My final goal is to cluster this data using kmeans. But to apply that, I need to find the optimum number of clusters to pass to the kmeans algorithm. How should I apply something…