Questions tagged [unsupervised-learning]

Unsupervised learning refers to machine learning contexts in which there is no prior 'training' period in which the learning agent is trained on objects of known type. As such, unsupervised learning includes such disciplines as mathematical clustering, whereby data is segmented into clusters based on the minimisation or maximisation of mathematical properties rather than on an attempt to classify by understanding the right context.

Unsupervised learning (for example, clustering) refers to machine learning algorithms in which no labels are available for the training data and the model tries to learn the underlying structure (manifold) of the data. As such, unsupervised learning includes such disciplines as mathematical clustering, whereby data is segmented into clusters based on the minimization or maximization of mathematical properties rather than on an attempt to classify by understanding the right context.

618 questions
10
votes
2 answers

What is the relation between topic modeling and document clustering?

Topic modeling identifies the distribution of topics in a document collection, which effectively identifies the clusters in the collection. So is it right to say that topic modeling is a technique for document clustering?
afs
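One way to make the link in the question above concrete: if a topic model gives each document a distribution over topics, taking the most probable topic per document yields a hard clustering. The sketch below assumes a plain list-of-lists document-topic matrix; the function name `hard_cluster_assignments` is hypothetical and not part of any library.

```python
def hard_cluster_assignments(doc_topic):
    """Assign each document to the index of its highest-probability topic.

    doc_topic: list of rows, one per document, each row a list of
    topic weights (assumed format, e.g. the output of a topic model).
    Ties are resolved in favour of the lowest topic index.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in doc_topic]


# Three documents, three topics (made-up illustrative weights).
doc_topic = [
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.3, 0.4, 0.3],
]
print(hard_cluster_assignments(doc_topic))
```

This discards the mixed-membership information that distinguishes topic models from ordinary clustering, so it is only one possible reading of the relation.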
9
votes
3 answers

Rand Index function (clustering performance evaluation)

As far as I know, there is no package available for Rand Index in Python, while for Adjusted Rand Index you have the option of using sklearn.metrics.adjusted_rand_score(labels_true, labels_pred). I wrote the code for Rand Score and I am going to…
Hadij
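For reference, the unadjusted Rand Index is the fraction of sample pairs on which two labelings agree (both put the pair in the same cluster, or both put it in different clusters). A minimal pair-counting sketch follows; it is not the asker's code, and the function name `rand_index` is an assumption. It is quadratic in the number of samples, so it suits small inputs only.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Unadjusted Rand Index: share of sample pairs the two labelings agree on."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two samples to form a pair")

    agree = 0
    total = 0
    for i, j in combinations(range(n), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        # A pair counts as an agreement when both labelings group it
        # together, or both keep it apart.
        if same_true == same_pred:
            agree += 1
        total += 1
    return agree / total


# Identical partitions with renamed labels agree on every pair.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

Newer scikit-learn releases also provide `sklearn.metrics.rand_score`, which may be preferable when it is available.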
8
votes
3 answers

Clustering images using unsupervised Machine Learning

I have a database of images that contains identity cards, bills and passports. I want to classify these images into different groups (i.e., identity cards, bills and passports). From what I have read, one of the ways to do this task is clustering…
8
votes
2 answers

Choosing the number of clusters in hierarchical agglomerative clustering with scikit

The Wikipedia article on determining the number of clusters in a dataset indicated that I do not need to worry about such a problem when using hierarchical clustering. However, when I tried to use scikit-learn's agglomerative clustering I see that I…
8
votes
2 answers

Why isn't DropOut used in Unsupervised Learning?

All or nearly all of the papers using dropout are using it for supervised learning. It seems that it could just as easily be used to regularize deep autoencoders, RBMs and DBNs. So why isn't dropout used in unsupervised learning?
7
votes
1 answer

Online clustering of news articles

Is there a common online algorithm to classify news dynamically? I have a huge data set of news classified by topics. I consider each of those topics a cluster. Now I need to classify breaking news. Probably, I will need to generate new topics, or…
7
votes
1 answer

Pattern Detection in Time Series Data

I have a data frame representing a time series like for example: timestamp: 1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28... value: 0|0|3|6|3|3|6|3|3|6 |3 |0 |0 |0 |1 |3 |7 |0 |0 |1 |3 |7 |1 |3 |7 |3 |6 |3 ... The goal…
7
votes
1 answer

Hidden Markov Model: Is it possible that the accuracy decreases as the number of states increases?

I constructed a couple of Hidden Markov Models using the Baum-Welch algorithm for an increasing number of states. I noticed that the validation score goes down once there are more than 8 states. So I wondered whether it's possible that the…
7
votes
1 answer

How can I speed up a topic model in R?

Background: I am trying to fit a topic model with the following data and specification: documents = 140 000, words = 3000, and topics = 15. I am using the package topicmodels in R (3.1.2) on a Windows 7 machine (24 GB RAM, 8 cores). My problem is that…
7
votes
2 answers

How can we use unsupervised learning techniques on a data-set, and then label the clusters?

First up, this is most certainly homework (so no full code samples please). That said... I need to test an unsupervised algorithm next to a supervised algorithm, using the Neural Network toolbox in Matlab. The data set is the UCI Artificial…
6
votes
2 answers

Kmeans using categorical variables

I have a large data set, 45421 * 12 (rows * columns), which contains only categorical variables. There are no numerical variables in my dataset. I would like to use this dataset to build an unsupervised clustering model, but before modeling I would like…
6
votes
3 answers

Which algorithm and what combination of hyper-parameters will be the best to cluster this data?

I was learning about non-linear clustering algorithms and I came across this 2-D graph. I was wondering which clustering algorithm and combination of hyper-parameters will cluster this data well, just as a human would cluster those 5 spikes. I…
6
votes
1 answer

How to build an unsupervised CNN model with keras/tensorflow?

I'm trying to build a CNN for an image-to-image translation application. The input of the model is an image, and the output is a confidence map. There are no labeled confidence maps to serve as ground truth during training, but a loss function is designed to…
Jemma
6
votes
2 answers

Custom Hebbian Layer Implementation in Keras - input/output dims and lateral node connections

I'm trying to implement an unsupervised ANN using Hebbian updating in Keras. I found a custom Hebbian layer made by Dan Saunders here - https://github.com/djsaunde/rinns_python/blob/master/hebbian/hebbian.py (I hope it is not poor form to ask…
6
votes
1 answer

Find length of cluster (how many points are associated with each cluster) after KMeans clustering (scikit-learn)

I have done clustering with KMeans in sklearn. While it has a method to print the centroids, I am finding it rather bizarre that scikit-learn doesn't have a method to find out the cluster length (or that I have not seen it so far). Is there a…
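A fitted scikit-learn `KMeans` model exposes the per-sample cluster index in its `labels_` attribute, so the size of each cluster can be obtained by counting those labels. The sketch below uses only the standard library and takes the label sequence as input; the helper name `cluster_sizes` is hypothetical, and the example labels are made up rather than produced by a real fit.

```python
from collections import Counter


def cluster_sizes(labels, n_clusters=None):
    """Return a list whose k-th entry is the number of points in cluster k.

    labels: iterable of integer cluster indices, e.g. `kmeans.labels_`
    from a fitted sklearn.cluster.KMeans model.
    n_clusters: optional total number of clusters, so that empty
    clusters are reported with a count of 0.
    """
    counts = Counter(int(label) for label in labels)
    if n_clusters is None:
        n_clusters = max(counts) + 1 if counts else 0
    return [counts.get(k, 0) for k in range(n_clusters)]


# Illustrative labels standing in for `kmeans.labels_`.
labels = [0, 1, 1, 2, 2, 2]
print(cluster_sizes(labels))
```

With NumPy available, `numpy.bincount(kmeans.labels_)` gives the same counts in one call.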