Questions tagged [cluster-analysis]

Cluster analysis is the process of grouping "similar" objects into groups known as "clusters", along with the analysis of these results.

Cluster analysis is the task of grouping objects into subsets (called clusters) so that observations in the same cluster are similar in some sense, while observations in different clusters are dissimilar.

In machine-learning and data-mining, clustering is a method of unsupervised learning used to discover hidden structure in unlabeled data, and is commonly used in exploratory data analysis. Popular algorithms include k-means, expectation maximization (EM), spectral clustering, correlation clustering and hierarchical-clustering.

Related topics: classification, pattern-recognition, knowledge discovery, taxonomy. Not to be confused with cluster computing.

NOTE: If you want to use this tag for a question not directly concerning implementation, consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post to more than one; see "Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?"

6244 questions

2 answers

How do I view the data points that are added to a cluster after applying the K-means algorithm?

I have implemented the k-means algorithm in Scala as follows. def clustering(clustnum:Int,iternum:Int,parsedData: RDD[org.apache.spark.mllib.linalg.Vector]): Unit= { val clusters = KMeans.train(parsedData, clustnum, iternum) println("The Cluster…

scala apache-spark cluster-analysis ibm-cloud k-means

asked Apr 27 '16 at 11:00

cumberdame
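Spark MLlib's `KMeansModel` (returned by `KMeans.train`) has a `predict` method that gives the index of the nearest cluster center for a vector or an RDD of vectors, so pairing each point with its predicted index and grouping by that key lists each cluster's members. The same nearest-center assignment is sketched below in plain Python rather than Spark; the function names and sample data are hypothetical.

```python
from collections import defaultdict

def nearest_center(point, centers):
    """Index of the center closest to `point` (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centers[c])))

def members_by_cluster(points, centers):
    """Map each cluster index to the list of points assigned to it."""
    groups = defaultdict(list)
    for point in points:
        groups[nearest_center(point, centers)].append(point)
    return dict(groups)

# Hypothetical centers and points, standing in for the trained model and parsedData.
centers = [(0.0, 0.0), (5.0, 5.0)]
points = [(0.1, 0.2), (4.9, 5.1), (0.3, -0.1), (5.2, 4.8)]
for cluster, members in sorted(members_by_cluster(points, centers).items()):
    print(cluster, members)
```

In Spark, the analogous step would be something like mapping each vector `v` to `(clusters.predict(v), v)` and then grouping by key.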

1 answer

Adjusted Mutual Information (scikit-learn)

I have implemented a clustering algorithm for summarizing log files, and am currently testing it against ground-truth data with the Adjusted Rand index and the Adjusted Mutual Information index. Input to my algorithm is a list of log entries, and…

python-2.7 machine-learning scikit-learn cluster-analysis

asked Apr 26 '16 at 12:52

logfiler
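Both scores are available in scikit-learn as `sklearn.metrics.adjusted_rand_score` and `sklearn.metrics.adjusted_mutual_info_score`; each takes the ground-truth labels and the predicted labels and depends only on how the items are partitioned, not on the label values themselves. As a sanity check of what the adjusted Rand index measures, here is a minimal pure-Python sketch computed from the contingency table of the two labelings (Python 3.8+ for `math.comb`); AMI's expected-mutual-information correction is more involved and is not reproduced here.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat clusterings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    # Contingency table: number of items in each (true class, predicted cluster) cell.
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    total_pairs = comb(n, 2)
    expected = sum_rows * sum_cols / total_pairs if total_pairs else 0.0
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case, e.g. both clusterings put everything in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: same partition, renamed labels
print(adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1]))  # 4/7, about 0.571
```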

2 answers

Clustering longitude and latitude GPS data

I have the GPS locations of more than 400 thousand cars, like: [25.41452217, 37.94879532], [25.33231735, 37.93455887], [25.44327736, 37.96868896], ... I need to perform spatial clustering with a distance between points of <= 3 meters. I tried to use…

python scikit-learn cluster-analysis

asked Apr 23 '16 at 20:33

M. Smith
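One way to read "distance between points <= 3 meters" is single-linkage clustering: two cars share a cluster if they are within 3 m of each other, directly or through a chain of intermediate cars (this chaining can produce clusters much wider than 3 m). The sketch below assumes each pair is (longitude, latitude) in degrees, uses the haversine great-circle distance, and buckets points into a grid so that 400k points do not require comparing all pairs. In scikit-learn, `DBSCAN` with `metric='haversine'`, `algorithm='ball_tree'`, `min_samples=1`, `eps` set to 3 m divided by the Earth's radius, and inputs given as [lat, lon] in radians should give essentially the same grouping.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))

def cluster_within(points, eps_m=3.0):
    """Single-linkage clustering: points at most eps_m meters apart (directly or
    through a chain of such points) share a label. `points` is a list of (lon, lat)."""
    if not points:
        return []
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Bucket points into a grid so only neighboring cells are compared (not all n^2 pairs).
    # Latitude cells are exactly eps_m tall; longitude cells are widened for the highest
    # latitude present, with a 2x safety margin (fine for eps_m of a few meters;
    # wrap-around at +/-180 degrees longitude is not handled).
    lat_step = math.degrees(eps_m / EARTH_RADIUS_M)
    max_abs_lat = max(abs(lat) for _, lat in points)
    lon_step = 2 * lat_step / max(math.cos(math.radians(max_abs_lat)), 1e-9)

    grid = defaultdict(list)
    for idx, (lon, lat) in enumerate(points):
        grid[(math.floor(lon / lon_step), math.floor(lat / lat_step))].append(idx)

    for (cx, cy), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    for i in members:
                        if i < j and haversine_m(*points[i], *points[j]) <= eps_m:
                            parent[find(i)] = find(j)  # merge the two clusters

    # Relabel union-find roots as 0, 1, 2, ... in order of first appearance.
    root_to_label = {}
    return [root_to_label.setdefault(find(i), len(root_to_label)) for i in range(len(points))]

# Hypothetical points: two pairs about 0.9 m and 1.8 m apart, the pairs ~900 m apart.
cars = [(25.0, 38.0), (25.00001, 38.0), (25.01, 38.0), (25.01002, 38.0)]
print(cluster_within(cars))  # [0, 0, 1, 1]
```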

1 answer

Python K-means clustering

I am trying to implement the code on this website to estimate which value of K I should use for my K-means clustering: https://datasciencelab.wordpress.com/2014/01/21/selection-of-k-in-k-means-clustering-reloaded/ However, I am not having any success…

python machine-learning cluster-analysis k-means

asked Apr 19 '16 at 21:29

piccolo (reputation 2,093; 3 gold, 24 silver, 56 bronze badges)
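The linked post's method is not reproduced here. As a simpler point of comparison, the sketch below computes the within-cluster sum of squared distances (SSE) for several values of K, the quantity plotted in the "elbow" heuristic; scikit-learn's `KMeans` exposes the same value as `inertia_`. It uses plain-Python Lloyd iterations with a deterministic farthest-point initialization (chosen here for reproducibility, not what scikit-learn does by default) and hypothetical data with three well-separated groups, so the SSE should drop sharply up to K = 3 and level off afterwards. Requires Python 3.8+ for `math.dist`.

```python
import math

def kmeans(points, k, iterations=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialization.
    Returns (centers, labels)."""
    # Start from the first point, then repeatedly add the point farthest from
    # every center chosen so far.
    centers = [tuple(points[0])]
    while len(centers) < k:
        centers.append(tuple(max(points, key=lambda p: min(math.dist(p, c) for c in centers))))
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

def sse(points, centers, labels):
    """Within-cluster sum of squared distances, the quantity an elbow plot shows."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))

# Hypothetical data: three tight groups of four points around (0,0), (10,0), (0,10).
offsets = [(-0.1, 0.0), (0.1, 0.0), (0.0, -0.1), (0.0, 0.1)]
data = [(bx + dx, by + dy) for bx, by in [(0, 0), (10, 0), (0, 10)] for dx, dy in offsets]
for k in range(1, 6):
    centers, labels = kmeans(data, k)
    print(k, round(sse(data, centers, labels), 3))
```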

1 answer

Finding minimum number of required 'central points'

I have a set of n nodes. A function returns a kind of distance between two nodes that is not a true metric: dist(a,c) is not necessarily bounded by dist(a,b)+dist(b,c). Based on a threshold, I connect certain nodes via edges. I wish to select the minimum number of nodes such that the…

algorithm graph cluster-analysis

asked Apr 18 '16 at 17:59

BlissfulSavant
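The requirement is cut off above, so this assumes it is "every node is either selected or connected by an edge to a selected node". Under that assumption the task is the minimum dominating set problem on the thresholded graph, which is NP-hard in general; since only the thresholded edges matter, the distance violating the triangle inequality does not change the formulation. A common approach is the greedy approximation sketched below, which repeatedly picks the node covering the most still-uncovered nodes and has a logarithmic approximation guarantee. The `threshold_graph` helper, the "connect if dist <= threshold" rule, and the assumption that `dist` is symmetric are all hypothetical.

```python
def threshold_graph(n, dist, threshold):
    """Hypothetical helper: edge (a, b) whenever dist(a, b) <= threshold.
    Assumes dist is symmetric; it does not need to satisfy the triangle inequality."""
    return [(a, b) for a in range(n) for b in range(a + 1, n) if dist(a, b) <= threshold]

def greedy_dominating_set(n, edges):
    """Greedy approximation of a minimum dominating set: every node ends up
    either chosen or adjacent to a chosen node."""
    closed = [{v} for v in range(n)]  # each node covers itself and its neighbors
    for a, b in edges:
        closed[a].add(b)
        closed[b].add(a)
    uncovered = set(range(n))
    chosen = []
    while uncovered:
        # Pick the node covering the most still-uncovered nodes (lowest index on ties).
        best = max(range(n), key=lambda v: len(closed[v] & uncovered))
        chosen.append(best)
        uncovered -= closed[best]
    return chosen

# Hypothetical example: nodes 0..4 with distance |a - b| and threshold 1 form a path.
edges = threshold_graph(5, lambda a, b: abs(a - b), 1)
print(greedy_dominating_set(5, edges))  # [1, 3]
```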

1 answer

Printing principal features in clusters (python)

I have an m x n matrix, with m features and n samples, called term_individual. The clustering is done using scikit-learn: from sklearn.cluster import KMeans kmeans = KMeans(n_clusters=n_clusters) kmeans.fit(term_individual.T) centroids =…

python scikit-learn cluster-analysis

asked Apr 07 '16 at 18:18

Vladimir Vargas (reputation 1,744; 4 gold, 24 silver, 48 bronze badges)
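Because `kmeans.fit(term_individual.T)` treats the n samples as rows and the m features as columns, `kmeans.cluster_centers_` has shape (n_clusters, m): one row per cluster, one value per feature. One common reading of "principal features" is the features with the largest centroid values; with NumPy this is usually done via `kmeans.cluster_centers_.argsort()[:, ::-1]`. The plain-Python sketch below does the same ranking; the feature names and centroid values are hypothetical.

```python
def top_features(centroids, feature_names, top_n=3):
    """For each centroid, the names of the features with the largest values."""
    result = []
    for centroid in centroids:
        # Feature indices sorted by centroid value, largest first.
        order = sorted(range(len(centroid)), key=lambda j: centroid[j], reverse=True)
        result.append([feature_names[j] for j in order[:top_n]])
    return result

terms = ["cluster", "graph", "image", "pixel", "vertex"]  # hypothetical feature names
centroids = [[0.9, 0.1, 0.0, 0.0, 0.3],                   # hypothetical cluster centers
             [0.0, 0.0, 0.8, 0.7, 0.1]]
for i, words in enumerate(top_features(centroids, terms, top_n=2)):
    print(f"cluster {i}: {', '.join(words)}")
```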

1 answer

Cluster groups of face images

I have extracted faces from a video and clustered them into large groups (each group contains faces of the same person; I did this using background-change detection). Now I want to cluster those groups into a smaller number of groups and to have,…

python opencv cluster-analysis

asked Apr 06 '16 at 19:28

N. Ruchers

1 answer

How to calculate BCubed precision and recall

According to this publication, BCubed precision and recall, and thus the F1-measure calculated from them, are the best technique for evaluating clustering performance. See Amigó, Enrique, et al. "A comparison of extrinsic clustering evaluation metrics based on…

machine-learning cluster-analysis data-mining precision-recall

asked Apr 06 '16 at 10:29

Furkan Gözükara (reputation 22,964; 77 gold, 205 silver, 342 bronze badges)
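For the common case where every item has exactly one gold category and belongs to exactly one cluster, BCubed is computed per item and then averaged: an item's precision is the fraction of items in its cluster that share its category, and its recall is the fraction of items in its category that share its cluster (each count includes the item itself). A minimal sketch, combining the averages into F1 with the usual harmonic mean; the extended BCubed for overlapping clusters discussed in the cited paper is not covered, and the labels below are hypothetical.

```python
from collections import Counter

def bcubed(categories, clusters):
    """BCubed precision, recall and F1 for a flat, non-overlapping clustering.
    categories[i] is item i's gold class; clusters[i] is its assigned cluster."""
    n = len(categories)
    both = Counter(zip(categories, clusters))  # items sharing category AND cluster
    cluster_sizes = Counter(clusters)
    category_sizes = Counter(categories)
    items = list(zip(categories, clusters))
    precision = sum(both[(g, c)] / cluster_sizes[c] for g, c in items) / n
    recall = sum(both[(g, c)] / category_sizes[g] for g, c in items) / n
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical example: cluster 1 mixes one "b" item in with the two "a" items.
p, r, f = bcubed(["a", "a", "b", "b"], [1, 1, 1, 2])
print(p, r, f)  # precision 2/3, recall 3/4, F1 12/17
```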

2 answers

Clustering 2D points using sklearn KDTree

I have an array of shape (n_sample x 2) and I want to cluster the points using sklearn.neighbors.KDTree. I have this sample piece of code: from sklearn.neighbors import KDTree import numpy as np np.random.seed(0) X = np.random.random((10, 2)) tree =…

python-2.7 scikit-learn cluster-analysis kdtree

asked Apr 01 '16 at 02:47

Ash (reputation 3,428; 1 gold, 34 silver, 44 bronze badges)

2 answers

How can I calculate accuracy for NLTK KMeans clustering?

I am trying to use NLTK's KMeans clustering algorithm, and it is generally going fine. I want to use NLTK's metrics package to determine precision, recall and F-measure. I searched for some examples on the web and in other references but may be…

python machine-learning nltk cluster-analysis k-means

asked Mar 29 '16 at 17:42

Coeus2016
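Cluster ids produced by a clusterer (for example by `nltk.cluster.KMeansClusterer.cluster(vectors, assign_clusters=True)`) are arbitrary integers, so they cannot be compared with gold labels directly. One common workaround, sketched below, maps each cluster to the gold label most frequent among its members and then measures accuracy (this is the "purity" measure); the relabeled predictions could then be passed to `nltk.metrics.scores` functions such as `accuracy(reference, test)`. Note that purity rewards using many clusters and lets several clusters map to the same label. The sample data is hypothetical.

```python
from collections import Counter, defaultdict

def majority_label_mapping(clusters, gold):
    """Map each cluster id to the gold label most common among its members."""
    votes = defaultdict(Counter)
    for c, g in zip(clusters, gold):
        votes[c][g] += 1
    return {c: counter.most_common(1)[0][0] for c, counter in votes.items()}

def cluster_accuracy(clusters, gold):
    """Accuracy after relabeling each cluster by majority vote (purity)."""
    mapping = majority_label_mapping(clusters, gold)
    correct = sum(mapping[c] == g for c, g in zip(clusters, gold))
    return correct / len(gold)

assigned = [0, 0, 1, 1, 1, 0]  # hypothetical cluster ids from a clusterer
gold = ["sport", "sport", "tech", "tech", "sport", "sport"]
print(cluster_accuracy(assigned, gold))  # 5 of 6 items match their cluster's majority label
```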

1 answer

How to cluster time series using DBSCAN in Python

My data is in the form X = [[T1],[T2], ...], where Tn is the time series of the nth user. I want to cluster these time series with DBSCAN using the scikit-learn library in Python. When I try to fit the data directly, I get the output…

python cluster-analysis dbscan

asked Mar 27 '16 at 23:51

Siddharth Shah
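scikit-learn's `DBSCAN` expects either a 2-D feature array of shape (n_samples, n_features) or, with `metric='precomputed'`, an (n_samples, n_samples) matrix of pairwise distances, so X has to be turned into one of those two forms first. One option is to choose a distance between whole series and cluster on it. The sketch below is a plain-Python DBSCAN that accepts any distance function; the Euclidean distance used here assumes equal-length, aligned series (dynamic time warping is a common alternative for unequal lengths), and the series and parameter values are hypothetical.

```python
import math

def euclidean(a, b):
    """Distance between two equal-length, aligned series (an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dbscan(items, eps, min_samples, dist=euclidean):
    """Textbook DBSCAN over arbitrary items with a user-supplied distance.
    Returns one label per item; -1 marks noise."""
    n = len(items)
    # Precompute each item's eps-neighborhood (O(n^2) distance calls).
    neighbors = [[j for j in range(n) if dist(items[i], items[j]) <= eps] for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:  # the count includes the item itself
            labels[i] = -1
            continue
        cluster += 1  # i is a core point: start a new cluster and expand it
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:  # border point previously marked as noise
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:  # j is also core: keep expanding
                queue.extend(neighbors[j])
    return labels

series = [
    [1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.0], [0.9, 2.0, 3.1, 3.9],  # rising
    [9.0, 8.0, 7.0, 6.0], [9.1, 8.0, 6.9, 6.1], [9.0, 7.9, 7.1, 5.9],  # falling
    [5.0, 0.0, 5.0, 0.0],                                              # outlier
]
print(dbscan(series, eps=0.5, min_samples=2))  # [0, 0, 0, 1, 1, 1, -1]
```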

0 answers

Multiple Regression - cannot allocate vector of size 4.7gb

First of all, I want to say that I have no clue about R or coding in general. I just have to run a regression with clustered standard errors for my bachelor's thesis, and I can't do that in Excel. I managed to do the linear regression with clustered standard…

r memory-management cluster-analysis multiple-regression

asked Mar 18 '16 at 10:17

Copiloc

1 answer

Clustering 1000 images to find group of images with greater similarity

I have 1000 2D grayscale images and would like to cluster them in Python so that images with more similarities end up in the same group. The images represent simple geometric shapes, including circles, triangles, etc. If I want to flatten each…

image image-processing cluster-analysis

asked Mar 10 '16 at 18:57

S PA

3 answers

Issue with nested calls with psexec (access denied)

First of all, sorry for my poor English. I will try to explain my problem. I am using psexec within a script to restart a cluster as follows: script1 on node1 performs a lot of tasks (shuts down services, checks status, etc.) on node1, and after…

scripting batch-file cluster-analysis psexec

asked Dec 11 '08 at 12:27

user41931

2 answers

How to cluster evolving data streams

I want to cluster text documents incrementally, reading them as data streams, but there seems to be a problem. Most term-weighting options are based on the vector space model, using TF-IDF as the weight of a feature. However, in our case the IDF of an…

algorithm machine-learning cluster-analysis information-retrieval tf-idf

asked Aug 28 '10 at 08:09

user352951
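The difficulty described is that IDF depends on the whole corpus, which keeps growing in a stream. One straightforward option is to keep running counts (documents seen so far and per-term document frequency) and compute weights from those counts when they are needed, accepting that previously computed vectors go stale and may need periodic re-weighting. A minimal sketch; the class name is hypothetical, and it uses the smoothed IDF `ln((1 + N) / (1 + df)) + 1`, the variant scikit-learn's `TfidfTransformer` applies by default (other variants would work too).

```python
import math
from collections import Counter

class IncrementalTfIdf:
    """TF-IDF weights whose IDF part is recomputed from running counts as documents arrive."""

    def __init__(self):
        self.num_docs = 0
        self.doc_freq = Counter()  # term -> number of documents seen that contain it

    def add(self, tokens):
        """Register one new document (a list of tokens) and return its current weights."""
        self.num_docs += 1
        self.doc_freq.update(set(tokens))
        return self.weights(tokens)

    def idf(self, term):
        # Smoothed IDF based only on the documents seen so far.
        return math.log((1 + self.num_docs) / (1 + self.doc_freq[term])) + 1

    def weights(self, tokens):
        """Raw term frequency times current IDF, as a {term: weight} dict."""
        return {term: count * self.idf(term) for term, count in Counter(tokens).items()}

model = IncrementalTfIdf()
model.add(["cluster", "stream"])
first = model.weights(["cluster"])["cluster"]   # 1.0: "cluster" is in every document so far
model.add(["text", "mining"])
second = model.weights(["cluster"])["cluster"]  # higher: "cluster" is now rarer
model.add(["cluster", "text"])
third = model.weights(["cluster"])["cluster"]   # lower again
print(first, second, third)
```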
