Questions tagged [cluster-analysis]

Cluster analysis is the process of grouping "similar" objects into groups known as "clusters", along with the analysis of these results.

Cluster analysis is the task of grouping objects into subsets (called clusters) so that observations in the same cluster are similar in some sense, while observations in different clusters are dissimilar.

In machine-learning and data-mining, clustering is a method of unsupervised learning used to discover hidden structure in unlabeled data, and is commonly used in exploratory data analysis. Popular algorithms include k-means, expectation maximization (EM), spectral clustering, correlation clustering and hierarchical-clustering.
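The k-means idea named above can be sketched in a few lines. This is a minimal, pure-Python version of Lloyd's algorithm, assuming points are equal-length numeric tuples. The function names `kmeans` and `nearest` and the `iters`/`seed` parameters are hypothetical, chosen for illustration; they are not the API of any particular library. The `nearest` helper also shows one common way to place a new, unseen point into existing k-means clusters: assign it to the closest centroid.

```python
import math
import random


def nearest(centroids, p):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(centroids[j], p))


def kmeans(points, k, iters=100, seed=0):
    """Hypothetical minimal Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise by sampling k distinct input points as starting centroids.
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [nearest(centroids, p) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    # Final labels consistent with the returned centroids.
    labels = [nearest(centroids, p) for p in points]
    return centroids, labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, labels = kmeans(pts, k=2)
    print(labels)
    # Assign an unseen point to an existing cluster.
    print(nearest(centroids, (0.5, 0.5)))
```

Because initialisation is random, different seeds can converge to different local optima; libraries such as scikit-learn address this with k-means++ seeding and several restarts.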

Related topics: classification, pattern-recognition, knowledge discovery, taxonomy. Not to be confused with cluster computing.

NOTE: If you want to use this tag for a question not directly concerning implementation, consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post to more than one; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?"

6244 questions

2 answers

How do I check whether a cost function is concave or convex?

How do I check whether this cost function is concave or convex? I also want to find out whether it has a single minimum or multiple minima. Effort made: function [w,pi,costvalue] = main_cost(inputdata, tmax, alpha_ini,somrow,somcol) %main cost function; To…

matlab cluster-analysis som convex-optimization concave

asked Nov 19 '15 at 08:10

Young_DataAnalyst

1 answer

How to check whether a new point is inside the existing clusters (Python)

I am a bit confused about clustering, e.g. k-means clustering. I have already created clusters for the training data, and in the testing part I want to know whether the new points are already in the clusters, or whether they can belong to a cluster at all. My idea…

python testing cluster-analysis k-means training-data

asked Nov 17 '15 at 09:01

sws

1 answer

Clustering for Categorical and Numerical data

I have a collection of alerts and I want to group them based on similarity/distance. As we have non-numeric data, how can I perform clustering for this kind of data? set.seed(42) data.frame(Host1 = rep("del",10), Host2 = c(rep("cpp",4),…

r cluster-analysis data-manipulation

asked Nov 13 '15 at 03:11

Navin Manaswi

2 answers

Given two sets of vectors, how do I find the closest vector in the second set for each vector in the first set?

Given: two sets {S1, S2} of vectors of dimension D. S1 is represented by an N*D matrix and, accordingly, S2 is represented by an M*D matrix. I am looking for an elegant way to get, for every vector s1 in S1, the nearest neighbour s2 in S2 and the…

matlab vector cluster-analysis nearest-neighbor

asked Nov 11 '15 at 21:48

Simon

2 answers

Clustering of variables in Python

I have hundreds of variables with binary values, i.e. 1 and 0, and I want to see how these variables fall into different clusters. I don't see any Python methods to apply, but I can see one in R: http://arxiv.org/pdf/1112.0295.pdf For example, I have…

python-3.x cluster-analysis

asked Nov 10 '15 at 17:48

Sanoj

2 answers

Best way to validate DBSCAN Clusters

I have used the ELKI implementation of DBSCAN to identify fire hotspot clusters from a fire data set, and the results look quite good. The data set is spatial and the clusters are based on latitude and longitude. Basically, the DBSCAN parameters…

cluster-analysis data-mining dbscan

asked Nov 03 '15 at 15:07

Stephen K. Karanja

0 answers

Implementing a fast DBSCAN in C#

I tried to implement DBSCAN in C# using k-d trees. I followed the implementation from: http://www.yzuzun.com/2015/07/dbscan-clustering-algorithm-and-c-implementation/ public class DBSCANAlgorithm { private readonly Func…

c# algorithm scikit-learn cluster-analysis dbscan

asked Oct 28 '15 at 03:37

John Tan

1 answer

Using Hibernate between different threads, JVMs and servers

I'm working on a system which has 4 modules, each running on its own server, and each should be able to be clustered. What I basically need is the ability to work on the same entities in the different modules and have them update appropriately…

java hibernate cluster-analysis multithreading terracotta

asked Jul 26 '10 at 16:10

Ittai

1 answer

How to identify my objects in ELKI DBSCAN results?

I'm using the ELKI GUI to run the DBSCAN algorithm. My input is a CSV file. I create a projection as feature selection: -dbc.filter transform.ProjectionFilter -projection NumericalFeatureSelection -projectionfilter.selectedattributes 1,2 ELKI gives me…

cluster-analysis dbscan elki

asked Oct 14 '15 at 07:30

Omid Ebrahimi

1 answer

How to define a custom similarity measure

I need some help defining a custom similarity measure. I have a dataset whose elements are defined by 4 attributes. As an example, consider the following two items: Element 1: A1: "R1", "R3", "R4", "R7" A2: "H1" A3 "F1", "F2" A4 "aaa"…

machine-learning cluster-analysis data-mining similarity

asked Sep 29 '15 at 15:37

betto86

0 answers

Discriminant Analysis of Principal Components and how to graphically show the distances of data points to their multivariate centroid

I have been attempting to graphically produce a scatterplot (similar to figure 1) showing the distance of data points to their multivariate centroid. The data contains two categorical grouping factors (V4 or G8) under the column family(response…

r graphics cluster-analysis pca lda

asked Sep 21 '15 at 21:34

Alice Hobbs

1 answer

Multichannel sequence analysis through WeightedCluster package

I would like to apply the functions available in the WeightedCluster package to analyze multichannel sequences I obtained through TraMineR. I am trying to do so, but because multichannel sequences are lists composed of each channel…

r cluster-analysis sequences traminer

asked Sep 16 '15 at 08:47

Gina Zetkin

0 answers

Clustering multivalue nominal attributes with different measures

I have to apply a clustering algorithm to my dataset, whose elements are composed of attributes of different natures: A1 -> multivalued, nominal values A2 -> multivalued, nominal values A3 -> multivalued, nominal values A4 -> single nominal…

r cluster-analysis data-mining

asked Sep 10 '15 at 14:52

betto86

1 answer

In R, train/update a model with multiple datasets

In R, I'm trying to train a neural network on multiple files. I have applied the multinom function to a single dataset, but I cannot find out how to train my model with another dataset. So I want to apply a model from a previous call to new data…

regex r neural-network cluster-analysis pattern-recognition

asked Sep 05 '15 at 12:46

AdamA3

5 answers

Visualize data and clustering

I am currently writing a Python script to find the similarity between documents. I have already calculated the similarity score for each document pair and stored them in dictionaries. It looks something like this: {(8328, 8327): 1.0, (8313, 8306):…

python cluster-analysis visualization

asked Jul 13 '10 at 19:22

Jacky
