Questions tagged [cluster-analysis]

Cluster analysis is the process of grouping "similar" objects into groups known as "clusters", along with the analysis of these results.

Cluster analysis is the task of grouping objects into subsets (called clusters) so that observations in the same cluster are similar in some sense, while observations in different clusters are dissimilar.

In machine-learning and data-mining, clustering is a method of unsupervised learning used to discover hidden structure in unlabeled data, and is commonly used in exploratory data analysis. Popular algorithms include k-means, expectation maximization (EM), spectral clustering, correlation clustering and hierarchical-clustering.
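
As a rough illustration of the first algorithm named above, here is a minimal k-means sketch (Lloyd's algorithm) in plain Python. The function name `kmeans`, its parameters, and the sample points are assumptions made for this example and are not taken from any particular library; real projects would normally rely on an established implementation.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Hypothetical helper: Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Initialise the centers with k distinct points drawn from the data.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center
        # (squared Euclidean distance).
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Illustrative data: two well-separated groups of three points each.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(sample, k=2)
```

The result depends on the random initialisation in general, which is why library implementations usually restart several times and keep the best run.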

Related topics: classification, pattern-recognition, knowledge discovery, taxonomy. Not to be confused with cluster computing.

NOTE: If your question is not directly about implementation, consider posting on Cross Validated, Data Science, or Artificial Intelligence instead, because it is probably off-topic here. Please choose one site only and do not cross-post; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?"

6244 questions

1 answer

The difference between dist functions in R

I want to calculate the dissimilarity indices on a binary matrix and have found several functions in R, but I can't get them to agree. I use the Jaccard coefficient as an example in the four functions: vegdist(), sim(), designdist(), and dist(). I'm…

r cluster-analysis vegan

asked Mar 08 '16 at 15:16

Magnus Hallas

0 answers

Community detection of large graph in Java

I'm currently using the GraphStream library to represent a very large directed weighted graph (35000 nodes with about 200000 edges) in Java. My goal is to detect communities of nodes within the graph, and the library has some community detection…

algorithm graph cluster-analysis graph-algorithm

asked Mar 08 '16 at 02:09

Jay

2 answers

clustering on geo points using R

I have a set of lat, long points for a city. Now I want to cluster these points based on a 500m or 1km radius using R. Precisely, I want to find out the centroids as well as all those points within a 500m radius for that particular…

r cluster-analysis latitude-longitude k-means geo

asked Feb 22 '16 at 12:34

Swetha K V

1 answer

Dimensionality reduction for high dimensional sparse data before clustering or spherical k-means?

I am trying to build my first recommender system, where I create a user feature space and then cluster the users into different groups. Then, for the recommendation to work for a particular user, I first find out the cluster to which the user belongs and…

cluster-analysis sparse-matrix recommendation-engine euclidean-distance dimensionality-reduction

asked Feb 18 '16 at 12:37

rehan ali

1 answer

How to display the row name in K means cluster plot in R?

I am trying to plot the K-means clusters. Below is the code I use. library(cluster) library(fpc) data(iris) dat <- iris[, -5] # without known classification # Kmeans cluster analysis clus <- kmeans(dat, centers=3) clusplot(dat, clus$cluster,…

r cluster-analysis k-means

asked Jan 25 '16 at 06:25

Arun

1 answer

Understanding the Biclust class in R

I'm new to the R language, but I'm using the biclust package for bicluster analysis. After searching the web for information, I could run some biclustering algorithms, but I could not access the resulting information. For example, after running >…

r cluster-analysis

asked Jan 23 '16 at 18:08

henryr

1 answer

Spectral clustering on sparse dataset

I am applying spectral clustering (sklearn.cluster.SpectralClustering) on a dataset with quite some features that are relatively sparse. When doing spectral clustering in Python, I get the following warning: UserWarning: Graph is not fully…

python scipy scikit-learn cluster-analysis spectral

asked Jan 19 '16 at 09:31

Guido

2 answers

Density Based Clustering with Representatives

I'm looking for a method to perform density based clustering. The resulting clusters should have a representative unlike DBSCAN. Mean-Shift seems to fit those needs but doesn't scale enough for my needs. I have looked into some subspace clustering…

cluster-analysis dbscan elki mean-shift

asked Jan 12 '16 at 20:35

Milan

1 answer

Empty clusters in K-means clustering

When applying K-means clustering we are picking k initial clusters and then iterating through all the points and assigning them to some cluster and also updating the centers of the clusters. Eventually we do not do any other update. Yet I noticed…

cluster-analysis bioinformatics k-means

asked Jan 09 '16 at 13:32

stryker

1 answer

Find the most similar set of samples – A function that finds a cluster of a given size

I need to find a cluster with a specific number of members. Given distance data for any number of samples, I want to find the first instance in which three locations become clustered during agglomerative clustering. In other words, I want to find…

r cluster-analysis hierarchical-clustering

asked Jan 09 '16 at 01:25

Dylan S.

2 answers

ELKI OPTICS pre-computed distance matrix

I can't seem to get this algorithm to work on my dataset, so I took a very small subset of my data and tried to get it to work, but that didn't work either. I want to input a precomputed distance matrix into ELKI, and then have it find the…

machine-learning cluster-analysis data-mining elki optics-algorithm

asked Jan 05 '16 at 15:43

Froblinkin

1 answer

latitude and longitude clustering in python

I am working with a dataframe which has lat and long data. I need to cluster points which are nearest to each other, let's say within 200 meters. This is what I am doing in Python. order_lat order_long 0 19.111841 72.910729 1 19.111342 …

python cluster-analysis geospatial

asked Jan 02 '16 at 08:07

Neil

3 answers

Calculating similarity between and centroid of Lucene documents

In order to perform a simple clustering algorithm on results that I get from Lucene, I have to calculate the cosine similarity between 2 documents in Lucene. I also need to be able to make a centroid document to represent the centroid of each cluster.…

java lucene cluster-analysis similarity tf-idf

asked Aug 10 '10 at 08:24

Mark

1 answer

Include the spatial context of pixels during image clustering

How can the spatial context (or neighbourhood) of a pixel be taken into account (besides the pixel intensity) when clustering an image? For the time being, I'm using K-means, GMM and Fuzzy C-means which cluster the image based only on the…

image-processing cluster-analysis k-means gaussian noise

asked Nov 20 '15 at 19:45

Hakim

0 answers

Finding k for kmeans in python

So I have a dataset consisting of 130000 points, in the format (x,y). My final goal is to cluster this data using kmeans. But before applying that, I need to find the optimum number of clusters to pass to the kmeans algorithm. How should I apply something…

python machine-learning cluster-analysis k-means data-science

asked Nov 19 '15 at 19:50

Siddharth Shah
