Questions tagged [cluster-analysis]

Cluster analysis is the process of grouping "similar" objects into groups known as "clusters", along with the analysis of these results.

Cluster analysis is the task of grouping objects into subsets (called clusters) so that observations in the same cluster are similar in some sense, while observations in different clusters are dissimilar.

Clustering is a method of unsupervised learning used to discover hidden structure in unlabeled data, and is commonly used in exploratory data analysis. Popular algorithms include expectation maximization (EM), spectral clustering, and correlation clustering.
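As a rough illustration of grouping unlabeled points so that members of the same cluster are close together, here is a minimal sketch of Lloyd's algorithm (k-means) in plain Python. The function name `kmeans`, the choice of the first `k` points as initial centroids, and the sample data are assumptions made for this sketch, not part of the tag description.

```python
import math

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm on a list of equal-length tuples.

    Initial centroids are simply the first k points (an assumption made
    to keep the sketch deterministic; real code usually randomizes).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

# Hypothetical data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels, centroids = kmeans(points, k=2)
```

With this data the first three points should end up sharing one label and the last three the other. The result of k-means generally depends on the initial centroids, so treat this as a sketch rather than a robust implementation.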

Related topics: knowledge discovery, taxonomy. Not to be confused with cluster computing.

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?"

6244 questions
2 votes
2 answers

How do I check if a cost function is concave or convex?

How do I check if this cost function is concave or convex? I also want to find out whether it has a single minimum or multiple minima. Effort made: function [w,pi,costvalue] = main_cost(inputdata, tmax, alpha_ini,somrow,somcol) %main cost function; To…
2 votes
1 answer

How to check whether a new point is inside the existing clusters (Python)

I am a bit confused about clustering, e.g. k-means clustering. I have already created clusters for the training set, and in the testing part I want to know whether new points already fall inside the clusters or could belong to one of them. My idea…
sws
  • 39
  • 1
  • 6
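One possible way to approach the question above, sketched under assumptions: assign the new point to its nearest centroid, then accept it only if it lies no farther from that centroid than the most distant training point of the same cluster. The helper names `fit_radii` and `assign_or_reject`, the radius rule, and the sample data are all hypothetical; the asker's actual idea is truncated in the excerpt.

```python
import math

def fit_radii(points, labels, centroids):
    """Largest training distance from each centroid to its own members."""
    radii = [0.0] * len(centroids)
    for p, lab in zip(points, labels):
        radii[lab] = max(radii[lab], math.dist(p, centroids[lab]))
    return radii

def assign_or_reject(point, centroids, radii):
    """Return the nearest cluster index, or None if the point lies farther
    from that centroid than any training point of the cluster did."""
    nearest = min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))
    if math.dist(point, centroids[nearest]) <= radii[nearest]:
        return nearest
    return None

# Hypothetical training result: two clusters with known centroids.
train = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0)]
train_labels = [0, 0, 1, 1]
centroids = [(0.5, 0.0), (10.5, 10.0)]
radii = fit_radii(train, train_labels, centroids)

inside = assign_or_reject((0.6, 0.1), centroids, radii)   # close to cluster 0
outside = assign_or_reject((5.0, 5.0), centroids, radii)  # far from both clusters
```

The maximum-radius rule is only one of several plausible cut-offs; a percentile of the training distances would be less sensitive to outliers.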
2 votes
1 answer

Clustering for Categorical and Numerical data

I have a collection of alerts and I want to group them based on similarity/distance. As we have non-numeric data, how can I perform clustering for this kind of data? set.seed(42) data.frame(Host1 = rep("del",10), Host2 = c(rep("cpp",4),…
Navin Manaswi
  • 964
  • 7
  • 19
2 votes
2 answers

Given two sets of vectors, how do I find the closest vector in the second set for each vector in the first set?

Given: two sets {S1, S2} of vectors of dimension D. S1 is represented by an N*D matrix and S2, accordingly, by an M*D matrix. I am looking for an elegant way to get, for every vector s1 in S1, the nearest neighbour s2 in S2 and the…
Simon
  • 706
  • 6
  • 23
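A brute-force sketch of the problem stated above, assuming Euclidean distance and plain Python lists of tuples in place of the N*D and M*D matrices. The function name `nearest_neighbours` and the sample vectors are hypothetical.

```python
import math

def nearest_neighbours(S1, S2):
    """For each vector in S1, return (index into S2, distance) of its
    closest vector in S2 under Euclidean distance. Cost is O(N * M * D)."""
    result = []
    for s1 in S1:
        j = min(range(len(S2)), key=lambda idx: math.dist(s1, S2[idx]))
        result.append((j, math.dist(s1, S2[j])))
    return result

# Hypothetical data: N = 2 query vectors, M = 3 candidate vectors, D = 2.
S1 = [(0.0, 0.0), (3.0, 4.0)]
S2 = [(10.0, 10.0), (3.0, 3.0), (0.0, 1.0)]
pairs = nearest_neighbours(S1, S2)
```

For large N and M, a spatial index such as a k-d tree would usually replace the inner linear scan; the brute-force version is shown only because it is easy to verify.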
2 votes
2 answers

Clustering of variables in Python

I have hundreds of variables with binary values, i.e. 1 and 0, and I want to see how these variables fall into different clusters. I don't see any Python methods to apply, but I can see one in R: http://arxiv.org/pdf/1112.0295.pdf For example, I have…
Sanoj
  • 1,347
  • 3
  • 15
  • 21
2 votes
2 answers

Best way to validate DBSCAN Clusters

I have used the ELKI implementation of DBSCAN to identify fire hot spot clusters from a fire data set and the results look quite good. The data set is spatial and the clusters are based on latitude and longitude. Basically, the DBSCAN parameters…
2 votes
0 answers

Implementing a fast DBSCAN in C#

I tried to implement a DBSCAN in C# using kd-trees. I followed the implementation from: http://www.yzuzun.com/2015/07/dbscan-clustering-algorithm-and-c-implementation/ public class DBSCANAlgorithm { private readonly Func
John Tan
  • 1,331
  • 1
  • 19
  • 35
2 votes
1 answer

Using Hibernate between different threads, JVMs and servers

I'm working on a system which has 4 modules, each running on its own server, and each should be able to be clustered. What I basically need is the ability to work on the same entities in the different modules and have them update appropriately…
Ittai
  • 5,625
  • 14
  • 60
  • 97
2 votes
1 answer

How to identify my objects in ELKI DBSCAN results?

I'm using the ELKI GUI to run the DBSCAN algorithm. My input is a CSV file. I create a projection as feature selection: -dbc.filter transform.ProjectionFilter -projection NumericalFeatureSelection -projectionfilter.selectedattributes 1,2 ELKI gives me…
Omid Ebrahimi
  • 1,150
  • 2
  • 20
  • 38
2 votes
1 answer

How to define a custom similarity measure

I need some help defining a custom similarity measure. I have a dataset whose elements are defined by 4 attributes. As an example, consider the following two items: Element 1: A1: "R1", "R3", "R4", "R7"; A2: "H1"; A3: "F1", "F2"; A4: "aaa"…
betto86
  • 694
  • 1
  • 8
  • 23
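One common starting point for set-valued nominal attributes like these is a per-attribute Jaccard index combined into a weighted average. The sketch below assumes that approach; the function names, the equal default weights, and every value of `element2` are hypothetical (the second item is cut off in the excerpt), and the asker's real requirements may differ.

```python
def jaccard(a, b):
    """Jaccard index of two collections treated as sets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def similarity(x, y, weights=None):
    """Weighted mean of per-attribute Jaccard indices.

    x and y map attribute names to sets of nominal values and are assumed
    to share the same keys; weights defaults to equal weighting.
    """
    keys = sorted(x)
    weights = weights or {k: 1.0 for k in keys}
    total = sum(weights[k] for k in keys)
    return sum(weights[k] * jaccard(x[k], y[k]) for k in keys) / total

# Element 1 follows the excerpt; element 2 is invented for illustration.
element1 = {"A1": {"R1", "R3", "R4", "R7"}, "A2": {"H1"}, "A3": {"F1", "F2"}, "A4": {"aaa"}}
element2 = {"A1": {"R1", "R3"}, "A2": {"H1"}, "A3": {"F2", "F3"}, "A4": {"bbb"}}
score = similarity(element1, element2)
```

The result lies between 0 and 1, and `1 - similarity(x, y)` can serve as a distance for algorithms that accept a precomputed distance matrix.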
2 votes
0 answers

Discriminant Analysis of Principal Components and how to graphically show the distances of data points to their multivariate centroid

I have been attempting to graphically produce a scatterplot (similar to figure 1) showing the distance of data points to their multivariate centroid. The data contains two categorical grouping factors (V4 or G8) under the column family(response…
Alice Hobbs
  • 1,021
  • 1
  • 15
  • 31
2 votes
1 answer

Multichannel sequence analysis through WeightedCluster package

I would like to apply the functions available in the WeightedCluster package to analyze multichannel sequences I obtained through TraMineR. I have tried to do so, but because multichannel sequences are lists composed of each channel…
Gina Zetkin
  • 333
  • 1
  • 5
  • 12
2 votes
0 answers

Clustering multivalue nominal attributes with different measures

I have to apply a clustering algorithm to my dataset, whose elements are made up of attributes of different kinds: A1 -> multivalued, nominal values A2 -> multivalued, nominal values A3 -> multivalued, nominal values A4 -> single nominal…
betto86
  • 694
  • 1
  • 8
  • 23
2 votes
1 answer

In R, Train/update model with multiple datasets

In R, I'm trying to train a neural network on multiple files. I have performed the multinom function on a single dataset but I cannot find how to train my model with another dataset. So I want to apply a model from a previous call to new data…
2 votes
5 answers

Visualize data and clustering

I am currently writing a Python script to find the similarity between documents. I have already calculated the similarity score for each document pair and stored them in a dictionary. It looks something like this: {(8328, 8327): 1.0, (8313, 8306):…
Jacky
  • 275
  • 1
  • 2
  • 6
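Given a dictionary of pairwise scores shaped like the one in the excerpt, one simple way to get groups is to link every pair whose similarity reaches a cut-off and take the connected components, which amounts to single-linkage clustering cut at a fixed threshold. The sketch below assumes that approach; the function name, the threshold, and the document ids and scores are all hypothetical.

```python
def group_by_threshold(similarities, threshold):
    """Group ids connected by pairs with similarity >= threshold.

    Uses a small union-find structure; returns sorted lists of ids.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), score in similarities.items():
        root_a, root_b = find(a), find(b)  # also registers both ids
        if score >= threshold and root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for doc in list(parent):
        groups.setdefault(find(doc), set()).add(doc)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical pairwise similarity scores keyed by (doc_id, doc_id).
similarities = {(1, 2): 1.0, (3, 4): 0.9, (2, 3): 0.2, (5, 4): 0.1}
clusters = group_by_threshold(similarities, threshold=0.5)
```

Here documents 1 and 2 form one group, 3 and 4 another, and 5 stays alone. The groups could then be drawn as a graph or heat map with a plotting library of choice.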