Questions tagged [cluster-analysis]

Cluster analysis is the process of grouping "similar" objects into groups known as "clusters", along with the analysis of these results.

Cluster analysis is the task of grouping objects into subsets (called clusters) so that observations in the same cluster are similar in some sense, while observations in different clusters are dissimilar.

In machine learning and data mining, clustering is a method of unsupervised learning used to discover hidden structure in unlabeled data, and is commonly used in exploratory data analysis. Popular algorithms include k-means, expectation maximization (EM), spectral clustering, correlation clustering and hierarchical clustering.
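As a rough illustration of what such an algorithm does, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. The function name `kmeans`, the toy `points`, and the choice to seed the centroids with the first `k` points are assumptions made for this example only; real libraries use more careful initialization.

```python
from math import dist


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means. Seeds centroids with the first k points
    (an illustrative simplification, not a recommended initialization)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.8, 8.2), (0.9, 1.1), (8.1, 7.9)]
labels, centroids = kmeans(points, k=2)
```

With this toy data, the points near (1, 1) end up in one cluster and the points near (8, 8) in the other, which matches the definition above: similar within a cluster, dissimilar across clusters.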

Related topics: knowledge discovery, taxonomy. Not to be confused with cluster computing.

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post to more than one - see Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?

6244 questions
2
votes
0 answers

Latent class clustering

I have data that contains continuous and categorical variables and I have to cluster that data using latent class analysis (LCA). I know that LCA sometimes means that manifest variables are categorical but I read that programs like LatentGold know…
2
votes
1 answer

How do I identify the correct clustering algorithm for the available data?

I have sample data of flight routes: the number of searches for each route, the gross profit for the route, and the number of transactions for the route. I want to bucket flight routes which show similar characteristics based on the above-mentioned variables. What…
Pratik409
  • 301
  • 2
  • 10
2
votes
0 answers

How to evaluate the best K for LDA using Mallet?

I am using the Mallet API to extract topics from Twitter data, and the topics I have extracted so far seem good. But I am having trouble estimating K. For example, I varied the K value from 10 to 100. So, I have taken different numbers of topics…
Khaled
  • 255
  • 4
  • 16
2
votes
1 answer

Clustering algorithm for unweighted graphs

I have an unweighted, undirected graph as my network, which is basically a network of proteins. I want to cluster this graph and divide it into disjoint clusters. Can anyone suggest clustering algorithms which I can apply on the…
seema aswani
  • 177
  • 1
  • 14
2
votes
0 answers

How can I use Prediction over clusters [R]

I have telemetry data which consists of the position and the activity of a bird. The dataset is in CSV format, which I am uploading: Lat. Long. Act Date Time 12 17 Eat 5-1-08 13:10 14 18 Rest 5-1-08 13:30 19 14 Walk …
user4993868
2
votes
1 answer

How to interpret k Medoids output

I have found this implementation of K-Medoids and I decided to try it in my code. My original dataset is a 21x6 matrix. To generate the distance matrix I'm using: import scipy.spatial.distance as ssd distanceMatrix = ssd.squareform(ssd.pdist(matr,…
Vektor88
  • 4,841
  • 11
  • 59
  • 111
2
votes
2 answers

Clustering points based on their linear proximity

I have data that I want to cluster into two groups based on their linear proximity (i.e., points that are almost collinear get grouped together). Here is a sample of my data: data <- data.frame(Y=c(seq(0,10,1), seq(0,4,0.5)), X=…
Filly
  • 713
  • 12
  • 23
2
votes
1 answer

Matching trajectories of whiskers

I am performing whisker-tracking experiments. I have high-speed videos (500fps) of rats whisking against objects. In each such video I tracked the shape of the rat's snout and whiskers. Since tracking is noisy, the number of whiskers in each frame…
2
votes
1 answer

OpenRefine: cross cluster two datasets

I've got two datasets with titles and other information, but in dataset A I have only titles, while in dataset B I have titles and URLs. I have to put the URLs from dataset B into dataset A. Some titles are the same in A and B, some others are not, some others…
Lara M.
  • 855
  • 2
  • 10
  • 23
2
votes
1 answer

Python: computing pairwise distances causes memory error

I want to compute the pairwise distances of 57832 vectors. Each vector has 200 dimensions. I am using pdist to compute the distances. from scipy.spatial.distance import pdist pairwise_distances = pdist(X, 'cosine') # pdist is supposed to return a…
Munichong
  • 3,861
  • 14
  • 48
  • 69
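A back-of-the-envelope check makes the memory error in the question above plausible. `pdist` returns a condensed distance matrix with one double-precision value per unordered pair, so the result alone needs n(n-1)/2 × 8 bytes. The sketch below only does that arithmetic; the helper name `condensed_entries` is made up for this example, and temporary buffers that SciPy may allocate are not counted.

```python
def condensed_entries(n):
    """Number of distances in a condensed pairwise matrix for n vectors."""
    return n * (n - 1) // 2


n_vectors = 57832            # from the question excerpt
bytes_per_distance = 8       # one float64 per pair

entries = condensed_entries(n_vectors)
total_bytes = entries * bytes_per_distance
total_gib = total_bytes / 2**30

print(f"{entries:,} distances, about {total_gib:.1f} GiB")
```

Under these assumptions the output array alone is on the order of 12 GiB, independent of the 200 dimensions of each vector, so computing the distances in blocks or avoiding the full matrix is usually needed at this scale.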
2
votes
2 answers

OpenCV 1.1 K-Means Clustering in High Dimensional Spaces

I am trying to write a bag-of-features image recognition system. One step in the algorithm is to take a large number of small image patches (say 7x7 or 11x11 pixels) and try to cluster them into groups that look similar. I get my patches…
kscottz
  • 21
  • 1
  • 2
2
votes
1 answer

Hierarchical clustering a pairwise distance matrix of precomputed distances

I have a pairwise distance dataframe that I've made with pandas: #Get files import glob import itertools one_dimension = glob.glob('*.pdb') dataframe = [] for combo in itertools.combinations(one_dimension,2): pdb_1 = combo[0] pdb_2 =…
jwillis0720
  • 4,329
  • 8
  • 41
  • 74
2
votes
2 answers

Clustering algorithm with different epsilons on different axes

I am looking for a clustering algorithm such as DBSCAN to deal with 3D data, in which it is possible to set different epsilons depending on the axis. So for instance an epsilon of 10m on the x-y plane, and an epsilon of 0.2m on the z axis. Essentially, I…
yamayama
  • 49
  • 9
2
votes
1 answer

What is the Haversine equation measured in for DBSCAN analysis in RapidMiner?

When I use the DBSCAN clustering algorithm in RapidMiner, I am not sure what unit the Haversine equation uses for epsilon. The dataset I am currently working with is coded in latitude and longitude degrees. I want the measurement to…
2
votes
1 answer

Data Clustering approach

I am writing a program in C# in which I have a set of 200 points displayed on an image. However, the points tend to cluster in various regions, and I am looking to find a way to "cluster." In other words, maybe draw a circle/ellipse around the…
Brett
  • 11,637
  • 34
  • 127
  • 213