Questions tagged [cluster-analysis]

Cluster analysis is the process of grouping "similar" objects into groups known as "clusters", along with the analysis of these results.

Cluster analysis is the task of grouping objects into subsets (called clusters) so that observations in the same cluster are similar in some sense, while observations in different clusters are dissimilar.

In machine-learning and data-mining, clustering is a method of unsupervised learning used to discover hidden structure in unlabeled data, and is commonly used in exploratory data analysis. Popular algorithms include k-means, expectation maximization (EM), spectral clustering, correlation clustering and hierarchical-clustering.
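
As a minimal illustration of the first algorithm in that list, here is a pure-Python sketch of the standard Lloyd iteration for k-means: random initial centroids, then alternating assignment and mean-update steps until the assignments stop changing. It only shows the mechanics. Real work would normally use a library implementation such as scikit-learn's KMeans. The sample points are made up.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd iteration: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    # Initialise centroids with k input points drawn at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: every point joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


# Two made-up blobs: the three points near the origin end up sharing one label.
labels, centroids = kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], k=2)
```

Like any k-means run, the result depends on the initial centroids, so several seeds are usually tried.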

Related topics: classification, pattern-recognition, knowledge discovery, taxonomy. Not to be confused with cluster computing.

NOTE: If you want to use this tag for a question that does not directly concern implementation, consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise your question is probably off-topic here. Please choose one site only and do not cross-post; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?"

6244 questions

0 answers

Latent class clustering

I have data that contains continuous and categorical variables, and I have to cluster that data using latent class analysis (LCA). I know that LCA sometimes means that the manifest variables are categorical, but I read that programs like LatentGold know…

cluster-analysis

asked Aug 19 '15 at 12:26

ArtemisEntr3ri

1 answer

How do I identify the correct clustering algorithm for the available data?

I have sample data on flight routes: the number of searches for each route, the gross profit for the route, and the number of transactions for the route. I want to bucket flight routes that show similar characteristics based on the variables mentioned above. What…

cluster-analysis data-mining k-means

asked Aug 12 '15 at 13:11

Pratik409
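
Whichever algorithm is chosen, searches, gross profit and transaction counts live on very different scales, so a distance-based method such as k-means would be dominated by the largest-valued column. A common first step is to standardize each column. The sketch below assumes each route is one row of numeric features, and the sample values are made up.

```python
import statistics


def standardize(rows):
    """Z-score each column so features on different scales contribute comparably."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # A constant column has pstdev 0; divide by 1.0 instead to avoid ZeroDivisionError.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stdevs)]
        for row in rows
    ]


# Hypothetical routes: [searches, gross profit, transactions]
routes = [[1200, 35000.0, 40], [300, 8000.0, 9], [5000, 150000.0, 210]]
scaled = standardize(routes)
```

The scaled rows can then be fed to k-means, hierarchical clustering or DBSCAN. Trying more than one and inspecting the resulting buckets is often the practical way to choose.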

0 answers

How to evaluate the best K for LDA using Mallet?

I am using the Mallet API to extract topics from Twitter data, and the topics I have extracted so far seem good. But I am having trouble estimating K. For example, I set K to values from 10 to 100. So, I have taken different numbers of topics…

cluster-analysis lda topic-modeling mallet

asked Jul 30 '15 at 16:26

Khaled
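
One common heuristic for comparing K values is to train a model for each K and compare an average topic-coherence score across its topics, alongside held-out likelihood and manual inspection. The sketch below computes UMass coherence for one topic's top words from tokenized documents. Exporting the top words from Mallet is left out, and representing each tweet as a set of tokens is an assumption.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """UMass coherence of one topic (higher, i.e. closer to 0, is better).

    top_words: the topic's top words, ordered from most to least probable.
    documents: iterable of token collections, one per document.
    Every top word must occur in at least one document (else division by zero).
    """
    docs = [set(d) for d in documents]

    def doc_freq(*words):
        # Number of documents containing all the given words.
        return sum(all(w in d for w in words) for d in docs)

    score = 0.0
    # combinations keeps order, so w_l always ranks above w_m:
    # sum of log((D(w_m, w_l) + 1) / D(w_l)).
    for w_l, w_m in combinations(top_words, 2):
        score += math.log((doc_freq(w_m, w_l) + 1) / doc_freq(w_l))
    return score
```

Averaging this over all topics of each model gives one number per K to plot. Coherence often plateaus rather than peaking sharply, so it informs the choice of K rather than settling it.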

1 answer

Clustering algorithm for unweighted graphs

I have an unweighted, undirected graph as my network, which is basically a network of proteins. I want to cluster this graph and divide it into disjoint clusters. Can anyone suggest clustering algorithms which I can apply on the…

graph cluster-analysis

asked Jul 29 '15 at 07:39

seema aswani
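
One simple option for unweighted, undirected graphs is label propagation. Each node repeatedly adopts the label that is most common among its neighbours, until every node already holds one of its neighbourhood's most frequent labels. The sketch below is a minimal pure-Python version, and the adjacency-dict input format is an assumption. Because the visiting order is random, results can vary between runs, so comparing several runs or trying alternatives such as Markov clustering (MCL) or modularity-based methods is advisable.

```python
import random
from collections import Counter


def label_propagation(adjacency, max_iter=100, seed=0):
    """Asynchronous label propagation on an unweighted, undirected graph.

    adjacency: dict mapping each node to a set of neighbouring nodes.
    Returns a dict node -> community label (labels are original node ids).
    """
    rng = random.Random(seed)
    labels = {node: node for node in adjacency}  # every node starts alone
    nodes = list(adjacency)
    for _ in range(max_iter):
        rng.shuffle(nodes)  # visit nodes in a random order each pass
        changed = False
        for node in nodes:
            if not adjacency[node]:
                continue  # an isolated node keeps its own label
            counts = Counter(labels[nb] for nb in adjacency[node])
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            # Only switch if the current label is not already among the most frequent.
            if labels[node] not in candidates:
                labels[node] = rng.choice(candidates)
                changed = True
        if not changed:
            break
    return labels
```

Nodes that share a label form one disjoint cluster. Disconnected parts of the graph can never share a label.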

0 answers

How can I use prediction over clusters? [R]

I have telemetry data that consists of the position and the activity of a bird. The dataset is in CSV format, which I am uploading: Lat. Long. Act Date Time 12 17 Eat 5-1-08 13:10 14 18 Rest 5-1-08 13:30 19 14 Walk …

r ggplot2 pattern-matching cluster-analysis prediction

asked Jul 26 '15 at 11:00

user4993868
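
The language-independent idea is easy to sketch, here in Python. Once clusters have been fitted, for example by k-means on the position columns, a new observation can be "predicted" by assigning it to the nearest cluster centroid. The centroids below are hypothetical values in the same (Lat., Long.) units as the sample rows. Plain Euclidean distance on latitude/longitude is only a rough approximation over small areas.

```python
import math


def assign_to_clusters(new_points, centroids):
    """Label each new observation with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in new_points
    ]


# Hypothetical centroids from an earlier clustering of (Lat., Long.) positions.
centroids = [(12, 17), (19, 14)]
print(assign_to_clusters([(13, 18), (18, 13)], centroids))  # [0, 1]
```

If the goal is instead to predict the activity (Eat, Rest, Walk) from position, that is a supervised classification problem rather than clustering.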

1 answer

How to interpret k-medoids output

I have found this implementation of K-Medoids and I decided to try it in my code. My original dataset is a 21x6 matrix. To generate the distance matrix I'm using: import scipy.spatial.distance as ssd distanceMatrix = ssd.squareform(ssd.pdist(matr,…

python cluster-analysis

asked Jul 24 '15 at 12:40

Vektor88
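
Whatever the specific implementation returns, k-medoids output usually boils down to a set of medoid indices plus an assignment of every row to one of them. The medoids are rows of the original 21x6 matrix that act as cluster centres. Given the distance matrix and the medoid indices, the per-row labels and the total cost can be recovered independently, which is a handy way to check what a library's output means. A minimal sketch, assuming the distance matrix is available as a square list of lists:

```python
def medoid_labels(distance_matrix, medoids):
    """Map every row of the data to its closest medoid.

    distance_matrix: square n x n list of lists of pairwise distances.
    medoids: row indices chosen as medoids (the cluster representatives).
    Returns (labels, total_cost): labels[i] is the position in `medoids`
    of the medoid nearest to row i; total_cost sums those distances.
    """
    labels = []
    total_cost = 0.0
    for row in distance_matrix:
        j = min(range(len(medoids)), key=lambda m: row[medoids[m]])
        labels.append(j)
        total_cost += row[medoids[j]]
    return labels, total_cost


# Four made-up 1-D points 0, 1, 10, 11 with rows 0 and 2 as medoids.
xs = [0, 1, 10, 11]
D = [[abs(a - b) for b in xs] for a in xs]
print(medoid_labels(D, [0, 2]))  # ([0, 0, 1, 1], 2.0)
```

A lower total cost for the same number of medoids means a tighter clustering. This gives a way to compare runs with different initialisations.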

2 answers

Clustering points based on their linear proximity

I have data that I want to cluster into two groups based on their linear proximity (i.e., points that are almost collinear get grouped together). Here is a sample of my data: data <- data.frame(Y=c(seq(0,10,1), seq(0,4,0.5)), X=…

r filtering classification cluster-analysis linear-regression

asked Jul 16 '15 at 17:34

Filly
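
One way to cluster by "linear proximity" is a k-lines style alternation, which works like k-means but uses lines as the cluster prototypes. Fit a least-squares line per group, reassign each point to the line with the smallest residual, and repeat. The sketch below handles two groups and measures vertical residuals, which is an assumption (perpendicular distance is another reasonable choice). It uses made-up data rather than the truncated sample in the question.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0  # guard against all-equal x values
    return my - b * mx, b


def two_line_clusters(xs, ys, n_iter=50):
    """Split points into two groups, each well described by its own line."""
    # Initialise: fit one line to everything and split on the residual's sign.
    a, b = fit_line(xs, ys)
    labels = [0 if y - (a + b * x) < 0 else 1 for x, y in zip(xs, ys)]
    for _ in range(n_iter):
        lines = []
        for k in (0, 1):
            idx = [i for i, lab in enumerate(labels) if lab == k]
            if len(idx) < 2:
                return labels  # a group collapsed; stop with the current split
            lines.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
        # Reassign each point to the line with the smaller vertical residual.
        new_labels = [
            min((0, 1), key=lambda k: abs(y - (lines[k][0] + lines[k][1] * x)))
            for x, y in zip(xs, ys)
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Made-up points from y = x and y = 20 - x.
xs = list(range(6)) * 2
ys = list(range(6)) + [20 - x for x in range(6)]
print(two_line_clusters(xs, ys))  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
```

Like k-means, this can converge to a poor local split when the lines cross or the initial split is bad, so the initialisation matters.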

1 answer

Matching trajectories of whiskers

I am performing whisker-tracking experiments. I have high-speed videos (500 fps) of rats whisking against objects. In each video I tracked the shape of the rat's snout and whiskers. Since tracking is noisy, the number of whiskers in each frame…

image matlab cluster-analysis

asked Jul 16 '15 at 12:02

BestBoyCoop

1 answer

OpenRefine: cross-cluster two datasets

I've got two datasets with titles and other information: dataset A has titles, and dataset B has titles and URLs. I have to put the URLs from dataset B into dataset A. Some titles are the same in A and B, some others are not, some others…

cluster-analysis openrefine

asked Jul 10 '15 at 13:08

Lara M.
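
Outside OpenRefine, the same idea can be expressed as a key-collision join. Normalise each title to a "fingerprint" key, then match A to B on equal keys. The function below is only similar in spirit to OpenRefine's fingerprint method, not a reimplementation of it. It lowercases, strips accents and punctuation, and sorts unique tokens. The titles and example.com URLs are hypothetical. Titles that differ by more than this normalisation will still need fuzzy matching or manual review.

```python
import string
import unicodedata


def fingerprint(title):
    """Normalise a title into a key: lowercase, ASCII-fold, no punctuation, sorted unique tokens."""
    text = unicodedata.normalize("NFKD", title.strip().lower())
    text = text.encode("ascii", "ignore").decode("ascii")  # drop accents and other non-ASCII
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def attach_urls(titles_a, rows_b):
    """For each title in A, look up a URL from B whose title has the same key.

    rows_b: list of (title, url) pairs. Titles with no match map to None.
    """
    url_by_key = {}
    for title, url in rows_b:
        url_by_key.setdefault(fingerprint(title), url)  # keep the first URL seen per key
    return {title: url_by_key.get(fingerprint(title)) for title in titles_a}


a = ["The Old Man and the Sea", "Café Society", "Unknown Title"]
b = [("old man and the sea, the", "http://example.com/1"), ("Cafe society", "http://example.com/2")]
print(attach_urls(a, b))
```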

1 answer

Python: computing pairwise distances causes memory error

I want to compute the pairwise distances of 57832 vectors. Each vector has 200 dimensions. I am using pdist to compute the distances. from scipy.spatial.distance import pdist pairwise_distances = pdist(X, 'cosine') # pdist is supposed to return a…

python memory numpy scipy cluster-analysis

asked Jul 09 '15 at 14:25

Munichong
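
A quick calculation shows why the MemoryError happens. pdist returns a condensed vector with one float64 (8 bytes) per unordered pair, i.e. n(n-1)/2 values. The helper below does just that arithmetic. Converting the result to a full square matrix with squareform would roughly double the memory again.

```python
def pdist_memory_bytes(n_vectors, bytes_per_value=8):
    """Size of a condensed distance vector: n*(n-1)/2 values of 8 bytes (float64)."""
    n_pairs = n_vectors * (n_vectors - 1) // 2
    return n_pairs, n_pairs * bytes_per_value


n_pairs, n_bytes = pdist_memory_bytes(57832)
print(n_pairs)              # 1672241196
print(n_bytes / 1e9, "GB")  # about 13.4 GB
```

So the 200 dimensions are not the problem; the number of pairs is. A common workaround is to compute distances in row blocks, or to use a clustering method that does not need all pairs at once.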

2 answers

OpenCV 1.1 K-Means Clustering in High Dimensional Spaces

I am trying to write a bag-of-features image recognition system. One step in the algorithm is to take a large number of small image patches (say 7x7 or 11x11 pixels) and try to cluster them into groups that look similar. I get my patches…

c++ opencv cluster-analysis vision

asked Jun 27 '10 at 17:07

kscottz

1 answer

Hierarchical clustering a pairwise distance matrix of precomputed distances

I have a pairwise distance dataframe that I've made with pandas: #Get files import glob import itertools one_dimension = glob.glob('*.pdb') dataframe = [] for combo in itertools.combinations(one_dimension,2): pdb_1 = combo[0] pdb_2 =…

python pandas scipy cluster-analysis hierarchical-clustering

asked Jun 27 '15 at 05:12

jwillis0720
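
With scipy, the usual route is to convert the square distance matrix to condensed form with scipy.spatial.distance.squareform and pass that to scipy.cluster.hierarchy.linkage. The pure-Python sketch below shows what a flat single-linkage cut means without any dependencies. It works directly on (file_1, file_2, distance) rows, and both that record layout and the file names are assumptions. Cutting a single-linkage tree at a height is the same as taking connected components of the "distance <= threshold" graph. Other linkage methods (average, complete, Ward) have no such shortcut.

```python
def single_linkage_clusters(pairs, threshold):
    """Flat single-linkage clusters from precomputed pairwise distances.

    pairs: iterable of (item_a, item_b, distance) records, e.g. rows of a
    long-format pairwise DataFrame. Items end up in the same cluster when a
    chain of pairs with distance <= threshold connects them.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, dist in pairs:
        root_a, root_b = find(a), find(b)  # registers both items even if unlinked
        if dist <= threshold:
            parent[root_a] = root_b  # union the two components
    groups = {}
    for item in parent:
        groups.setdefault(find(item), []).append(item)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical pairwise records between four structure files.
rows = [("a.pdb", "b.pdb", 0.5), ("a.pdb", "c.pdb", 3.0), ("b.pdb", "c.pdb", 2.5),
        ("c.pdb", "d.pdb", 0.8), ("a.pdb", "d.pdb", 4.0), ("b.pdb", "d.pdb", 3.5)]
print(single_linkage_clusters(rows, threshold=1.0))
# [['a.pdb', 'b.pdb'], ['c.pdb', 'd.pdb']]
```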

2 answers

Clustering algorithm with different epsilons on different axes

I am looking for a clustering algorithm such as DBSCAN to deal with 3D data, in which it is possible to set different epsilons depending on the axis. So, for instance, an epsilon of 10 m on the x-y plane and an epsilon of 0.2 m on the z axis. Essentially, I…

cluster-analysis data-mining dbscan elki

asked Jun 26 '15 at 12:49

yamayama
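
In DBSCAN, only the neighbourhood test needs to change. One option is to rescale the axes: divide x and y by 10 and z by 0.2, then run ordinary DBSCAN with eps = 1, which gives an ellipsoidal neighbourhood. The other is to define the neighbourhood as a cylinder directly, which is the route the brute-force sketch below takes. It is an illustrative reimplementation, not ELKI's or any library's API, and the coordinates in the usage example are made up.

```python
import math


def dbscan_anisotropic(points, eps_xy, eps_z, min_pts):
    """DBSCAN where 'neighbour' means within eps_xy horizontally AND eps_z vertically.

    points: list of (x, y, z) tuples. Returns one label per point; -1 = noise.
    Brute-force O(n^2) neighbour search, suitable for small inputs only.
    """
    def neighbours(i):
        xi, yi, zi = points[i]
        return [
            j for j, (x, y, z) in enumerate(points)
            if math.hypot(x - xi, y - yi) <= eps_xy and abs(z - zi) <= eps_z
        ]

    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:  # the neighbourhood includes the point itself
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point previously marked as noise
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:  # j is a core point: keep expanding
                queue.extend(more)
        cluster += 1
    return labels


# Two made-up layers 1 m apart vertically, plus one far-away point.
pts = [(0, 0, 0.0), (5, 0, 0.1), (0, 5, 0.1),
       (0, 0, 1.0), (5, 0, 1.1), (0, 5, 1.0), (100, 100, 0.0)]
print(dbscan_anisotropic(pts, eps_xy=10, eps_z=0.2, min_pts=3))
# [0, 0, 0, 1, 1, 1, -1]
```

With a single isotropic eps of 10 the two layers would merge into one cluster, which is exactly what the per-axis thresholds prevent.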

1 answer

What is the Haversine equation measured in for DBSCAN analysis in RapidMiner?

When I am using the DBSCAN clustering algorithm in RapidMiner, I am not sure what value the Haversine equation uses as epsilon. The dataset I am currently working with is coded in latitude and longitude degrees. I want the measurement to…

parameters cluster-analysis distance rapidminer dbscan

asked Jun 02 '15 at 17:57

Lou Klein
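
Which unit RapidMiner's DBSCAN expects for epsilon with its Haversine measure should be checked in its own documentation. How the formula itself works is certain, though. With coordinates converted from degrees to radians, it yields a central angle in radians, which becomes a distance only after multiplying by an Earth radius. Below is a sketch of the formula and of converting a kilometre threshold into a radian epsilon. The radius value is an assumption; other values are also in common use.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed; 6371 km is also common)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    central_angle = 2 * math.asin(math.sqrt(a))  # radians on the unit sphere
    return EARTH_RADIUS_KM * central_angle


def km_to_radians(km):
    """Convert a distance threshold in km into an angular epsilon in radians."""
    return km / EARTH_RADIUS_KM


print(haversine_km(0, 0, 1, 0))  # one degree of latitude, roughly 111.2 km
print(km_to_radians(1.0))        # epsilon for a 1 km radius, if the tool expects radians
```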

1 answer

Data Clustering approach

I am writing a program in C# in which I have a set of 200 points displayed on an image. However, the points tend to cluster in various regions, and I am looking to find a way to "cluster." In other words, maybe draw a circle/ellipse around the…

c# cluster-analysis k-means data-processing

asked Jun 15 '10 at 15:29

Brett
