Questions tagged [unsupervised-learning]

Unsupervised learning refers to machine learning contexts in which there is no prior 'training' period in which the learning agent is trained on objects of known type: no 'label' is available for the training data, and the model instead tries to learn the underlying structure (for example, a manifold) of the data. As such, unsupervised learning includes such disciplines as mathematical clustering, whereby data is segmented into clusters based on the minimisation or maximisation of mathematical properties rather than on an attempt to classify by understanding the right context.
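As a minimal sketch of the kind of clustering described above, the following pure-Python version of Lloyd's k-means algorithm segments unlabelled points by repeatedly reducing one mathematical property, the within-cluster sum of squared distances (often called inertia). The function name `kmeans` and its parameters are illustrative only and do not come from any library.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternately assign each point to its nearest centroid
    and move each centroid to the mean of its assigned points. Neither step
    increases the within-cluster sum of squared distances (inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments no longer change the centroids: converged
        centroids = new_centroids
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids, inertia = kmeans(points, 2)
print(labels, inertia)
```

No labels are supplied anywhere: the grouping comes entirely from minimising the distance-based objective.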

618 questions
6 votes, 3 answers

When to use supervised or unsupervised learning?

What are the fundamental criteria for choosing between supervised and unsupervised learning? When is one better than the other? Are there specific cases where you can only use one of them?
6 votes, 1 answer

Affinity propagation preference parameter

I've had encouraging results clustering a set of entity names using scikit-learn's affinity propagation implementation, with a modified Jaro-Winkler distance as the similarity metric, but my clusters are still too numerous (i.e. too many false…
nitrl
  • 2,185
  • 2
  • 15
  • 15
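A hedged sketch of running scikit-learn's `AffinityPropagation` on a precomputed similarity matrix while varying `preference`. scikit-learn's documentation describes `preference` as controlling how likely each point is to be chosen as an exemplar (with the median of the input similarities as the default), so lowering it usually yields fewer clusters. The entity names are invented, `difflib.SequenceMatcher` stands in for the question's modified Jaro-Winkler distance, and `similarity_matrix` and `n_clusters` are hypothetical helpers.

```python
from difflib import SequenceMatcher

import numpy as np
from sklearn.cluster import AffinityPropagation

# Invented example names; the question's own data and metric are not shown.
NAMES = [
    "Acme Corp", "Acme Corporation", "ACME Corp.",
    "Globex Inc", "Globex Incorporated", "Globex",
]


def similarity_matrix(items):
    """Negated string distance: 0 for identical strings, more negative for dissimilar ones."""
    n = len(items)
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ratio = SequenceMatcher(None, items[i].lower(), items[j].lower()).ratio()
            S[i, j] = -(1.0 - ratio)  # stand-in for a negated Jaro-Winkler distance
    return S


def n_clusters(S, preference):
    """Fit affinity propagation on precomputed similarities and count the exemplars."""
    ap = AffinityPropagation(
        affinity="precomputed", preference=preference,
        damping=0.9, max_iter=1000, random_state=0,
    )
    ap.fit(S)
    return len(ap.cluster_centers_indices_)


S = similarity_matrix(NAMES)
off_diag = S[~np.eye(len(NAMES), dtype=bool)]
n_median = n_clusters(S, np.median(off_diag))  # median of off-diagonal similarities
n_low = n_clusters(S, off_diag.min())           # lowest similarity: fewer exemplars expected
print(n_median, n_low)
```

Sweeping `preference` between the minimum and the median similarity is one way to trade cluster count against cluster purity; the best value for real data has to be found empirically.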
6 votes, 1 answer

unsupervised semantic clustering of phrases

I have about a thousand potential survey items as a vector of strings that I want to reduce to a few hundred. Normally when we talk about data reduction, we have actual data. I administer the items to participants and use factor analysis, PCA, or…
Eric Green
  • 7,385
  • 11
  • 56
  • 102
6 votes, 1 answer

Drawing clustered graphs in Python

I already have a way of clustering my graph, so the process of clustering isn't the issue here. What I want to do is, once all the nodes are clustered, to draw the clustered graph in Python, something like this: I looked into networkx, igraph…
6 votes, 1 answer

principal component analysis (PCA) in R: which function to use?

Can anyone explain what the major differences between the prcomp and princomp functions are? Is there any particular reason why I should choose one over the other? In case this is relevant, the type of application I am looking at is a quality…
AndraD
  • 2,830
  • 6
  • 38
  • 48
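R's documentation states that `prcomp` works from a singular value decomposition of the centred (and optionally scaled) data matrix, while `princomp` uses `eigen` on the covariance (or correlation) matrix with divisor N rather than N - 1. The sketch below mimics both routes in NumPy (not R) to show that, on the same centred data, they give the same component directions up to sign and variances that differ only by the factor (N - 1)/N. The data and the function name `pca_two_ways` are made up for illustration.

```python
import numpy as np


def pca_two_ways(X):
    """Return (variances, components) computed prcomp-style and princomp-style."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)  # centre the columns, as both R functions do here

    # prcomp-style: SVD of the centred data; variances use divisor n - 1.
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    var_svd = s**2 / (n - 1)

    # princomp-style: eigendecomposition of the covariance matrix with divisor n.
    eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc / n)
    order = np.argsort(eigvals)[::-1]  # eigh returns eigenvalues in ascending order
    var_eig = eigvals[order]
    comps_eig = eigvecs[:, order].T  # one component per row, like vt

    return (var_svd, vt), (var_eig, comps_eig)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) * np.array([3.0, 1.0, 0.3])  # three clearly different spreads
(var_svd, comps_svd), (var_eig, comps_eig) = pca_two_ways(X)
n = X.shape[0]
print(np.allclose(var_svd * (n - 1) / n, var_eig))        # variances differ only by the divisor
print(np.allclose(np.abs(comps_svd), np.abs(comps_eig)))  # same directions up to sign
```

The R documentation also notes that the SVD route is generally preferred for numerical accuracy, which is one practical reason to favour `prcomp`.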
6 votes, 1 answer

Semi-supervised Naive Bayes with NLTK

I have built a semi-supervised version of NLTK's Naive Bayes in Python based on the EM (expectation-maximization) algorithm. However, in some iterations of EM I am getting negative log-likelihoods (the log-likelihoods of EM must be positive in every…
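Independently of NLTK, a small worked example of EM on a mixture of two biased coins (a textbook illustration, not the asker's semi-supervised Naive Bayes) shows what to expect from the log-likelihood. For discrete models each term is the log of a probability no greater than 1, so the log-likelihood is at most 0; what EM guarantees is that it does not decrease from one iteration to the next. The function `em_two_coins` and the head counts are hypothetical.

```python
import math


def em_two_coins(heads, n_flips, theta_a=0.6, theta_b=0.5, pi=0.5, n_iter=20):
    """EM for a two-component binomial mixture. Returns the final parameters and
    the observed-data log-likelihood recorded before each M-step."""
    history = []
    for _ in range(n_iter):
        # E-step: responsibility of coin A for each trial, plus the log-likelihood.
        resp, ll = [], 0.0
        for h in heads:
            pa = pi * math.comb(n_flips, h) * theta_a**h * (1 - theta_a) ** (n_flips - h)
            pb = (1 - pi) * math.comb(n_flips, h) * theta_b**h * (1 - theta_b) ** (n_flips - h)
            ll += math.log(pa + pb)  # log of a probability <= 1, so each term is <= 0
            resp.append(pa / (pa + pb))
        history.append(ll)
        # M-step: weighted maximum-likelihood updates of both coins and the mixing weight.
        ra = sum(resp)
        rb = len(heads) - ra
        theta_a = sum(r * h for r, h in zip(resp, heads)) / (ra * n_flips)
        theta_b = sum((1 - r) * h for r, h in zip(resp, heads)) / (rb * n_flips)
        pi = ra / len(heads)
    return theta_a, theta_b, pi, history


heads = [5, 9, 8, 4, 7]  # heads observed in five hypothetical trials of 10 flips
theta_a, theta_b, pi, history = em_two_coins(heads, n_flips=10)
print(history[0], history[-1])
```

If an implementation's log-likelihood ever goes down between iterations, that usually points to a bug in the E- or M-step rather than to the sign of the value.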
5 votes, 1 answer

Replace the silhouette score with inertia

I am working with k-means and would like to find the optimal number of clusters. Unfortunately, my data set is too large to apply the silhouette score. Is there a way to adapt this code to use inertia instead of the silhouette score? MVC from…
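Inertia (the sum of squared distances from each point to its nearest centroid) costs roughly O(n·k) per evaluation, whereas the silhouette score needs pairwise distances, O(n²), which is one reason inertia is often used instead on large data sets. scikit-learn's `KMeans` already exposes this value as the fitted `inertia_` attribute; the NumPy sketch below computes it directly so the elbow idea is visible without any library. The helper `kmeans_inertia`, the blob data and the range of k are assumptions for illustration.

```python
import numpy as np


def kmeans_inertia(X, k, n_init=10, n_iter=100, seed=0):
    """Best (lowest) inertia over several random Lloyd's-algorithm restarts."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]  # random data points
        for _ in range(n_iter):
            # Squared distance from every point to every center, shape (n, k).
            # For very large n this array should be built in chunks instead.
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        best = min(best, float(d2.min(axis=1).sum()))
    return best


rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc, 0.3, size=(30, 2)) for loc in [(0, 0), (5, 5), (0, 5)]])
inertias = [kmeans_inertia(X, k) for k in range(1, 7)]
for k, value in enumerate(inertias, start=1):
    print(k, round(value, 2))  # look for the "elbow" where the drop flattens out
```

Inertia always tends to fall as k grows, so it cannot be maximised like the silhouette score; the usual reading is to pick the k after which further drops become small.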
5 votes, 1 answer

Why do the grpreg and gglasso libraries in R give different results for group LASSO?

I have been trying to do unsupervised feature selection using LASSO (by removing the class column). The dataset includes categorical (factor) and continuous (numeric) variables. Here is the link. I built a design matrix using model.matrix() which…
5 votes, 2 answers

Clustering images based on their similarity

I am facing the problem of clustering images based on their similarity, without knowing the number of clusters. Ideally I would like to achieve something that resembles this http://cs231n.github.io/assets/cnnvis/tsne.jpeg…
5 votes, 1 answer

Passing Target/Label data to scikit-learn GridSearchCV's fit method for OneClassSVM

From my understanding, One-Class SVMs are trained without target/label data. One answer at Use of OneClassSVM with GridSearchCV suggests passing Target/Label data to GridSearchCV's fit method when the classifier is the OneClassSVM. How does the…
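scikit-learn documents the `y` argument of `OneClassSVM.fit` as ignored, so labels passed to `GridSearchCV.fit` are not used for training; they are only handed to the scoring function on each validation fold. The sketch below checks the first point and uses an F1 scorer on the outlier class for the second. The synthetic data, the parameter grid and the choice of scorer are assumptions, not the linked answer's code.

```python
import numpy as np
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, size=(95, 2))
outliers = rng.uniform(-6.0, 6.0, size=(5, 2))
X = np.vstack([inliers, outliers])
# Labels follow OneClassSVM.predict's convention: +1 inlier, -1 outlier.
y = np.r_[np.ones(95, dtype=int), -np.ones(5, dtype=int)]

# fit() ignores y, so fitting with and without labels yields identical predictions.
a = OneClassSVM(nu=0.05, gamma=0.5).fit(X).predict(X)
b = OneClassSVM(nu=0.05, gamma=0.5).fit(X, y).predict(X)

search = GridSearchCV(
    OneClassSVM(),
    param_grid={"nu": [0.01, 0.05, 0.1], "gamma": [0.1, 0.5, 1.0]},
    scoring=make_scorer(f1_score, pos_label=-1),  # y is used only here, on held-out folds
    cv=KFold(n_splits=3, shuffle=True, random_state=0),
)
search.fit(X, y)
print(search.best_params_)
```

With only a handful of outliers some folds may contain none, in which case scikit-learn warns that F1 is undefined; a larger labelled validation set makes the search more meaningful.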
5 votes, 1 answer

How to get nearest neighbours in fasttext for unsupervised learning models (cbow, skipgram)?

The examples (related to word representations) on the fastText official web site (fasttext.cc) suggest that it is possible to calculate nearest neighbors on vectors derived with the cbow (or skip-gram) model (in short, on unsupervised learning models).…
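fastText's command-line tool has an `nn` command and its Python bindings provide `get_nearest_neighbors` for this. If you already hold the vectors (for example from `get_word_vector`), the lookup itself is a ranking by cosine similarity, sketched below in NumPy. The four words and their 3-dimensional vectors are invented; real cbow or skip-gram vectors have many more dimensions, and `nearest_neighbors` is a hypothetical helper.

```python
import numpy as np


def nearest_neighbors(query, words, vectors, k=3):
    """Return the k words whose vectors have the highest cosine similarity
    to the query word's vector, excluding the query word itself."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)  # L2-normalise rows
    sims = unit @ unit[words.index(query)]  # cosine similarity to every word
    order = np.argsort(-sims)  # most similar first
    ranked = [(words[i], float(sims[i])) for i in order if words[i] != query]
    return ranked[:k]


words = ["king", "queen", "apple", "banana"]
vectors = np.array([
    [0.90, 0.80, 0.10],
    [0.85, 0.82, 0.12],
    [0.10, 0.20, 0.90],
    [0.12, 0.15, 0.95],
])
print(nearest_neighbors("apple", words, vectors, k=2))
```

Normalising once and taking dot products keeps the lookup to a single matrix-vector product, which matters when the vocabulary is large.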
5 votes, 1 answer

BERT performing worse than word2vec

I am trying to use BERT for a document ranking problem. My task is pretty straightforward. I have to do a similarity ranking for an input document. The only issue here is that I don’t have labels, so it’s more of a qualitative analysis. I am on my…
5 votes, 2 answers

How to programmatically determine the column indices of principal components using FactoMineR package?

Given a data frame containing mixed variables (i.e. both categorical and continuous) like,
digits = 0:9
# set seed for reproducibility
set.seed(17)
# function to create random string
createRandString <- function(n = 5000) {
a <- do.call(paste0,…
mnm
  • 1,962
  • 4
  • 19
  • 46
5 votes, 1 answer

Implementation of Excess-Mass or Mass-Volume curves

I am looking for an implementation of Excess-Mass or Mass-Volume curves which are used for the evaluation of unsupervised anomaly detection algorithms. I'd prefer an implementation in Python but I could re-write it from any other language. Thank…
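As a starting point, here is a hedged NumPy sketch of Monte Carlo estimates for both curves. It assumes the definitions MV(α) = inf over levels u of Leb({s ≥ u}) subject to P(s(X) ≥ u) ≥ α, and EM(t) = sup over u of P(s(X) ≥ u) − t·Leb({s ≥ u}), where s is a scoring function for which higher means more normal (scikit-learn's `IsolationForest.score_samples`, for instance, is documented as "the lower, the more abnormal"). Masses are estimated on the data and volumes from points drawn uniformly in a bounding box of known volume; `mv_em_curves` and its arguments are hypothetical names.

```python
import numpy as np


def mv_em_curves(scores_data, scores_uniform, volume, alphas, ts):
    """Monte Carlo estimates of Mass-Volume and Excess-Mass curves.

    scores_data:    detector scores on the observed data (higher = more normal).
    scores_uniform: scores on points drawn uniformly from a box containing the data.
    volume:         Lebesgue measure (volume) of that box.
    """
    levels = np.unique(scores_data)[::-1]  # candidate thresholds u, high to low
    # Empirical mass P(s(X) >= u) and estimated volume Leb({s >= u}) per level.
    mass = np.array([(scores_data >= u).mean() for u in levels])
    vol = np.array([volume * (scores_uniform >= u).mean() for u in levels])
    # MV(alpha): smallest estimated volume among levels holding at least alpha mass.
    mv = np.array([vol[mass >= a].min() for a in alphas])
    # EM(t): largest mass - t * volume; the empty set (0, 0) keeps EM >= 0.
    em = np.array([max(0.0, float((mass - t * vol).max())) for t in ts])
    return mv, em


scores_data = np.array([4.0, 3.0, 2.0, 1.0])
scores_uniform = np.array([4.0, 3.0, 1.0] + [0.0] * 7)
mv, em = mv_em_curves(scores_data, scores_uniform, 10.0, alphas=[0.5, 1.0], ts=[0.1, 1.0])
print(mv, em)
```

Lower MV and higher EM values indicate a better scoring function. With real detectors the uniform sample has to be large, since the volume estimate becomes noisy as the dimension grows.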
5 votes, 2 answers

Unsupervised loss function in Keras

Is there any way in Keras to specify a loss function which does not need to be passed target data? I attempted to specify a loss function which omitted the y_true parameter like so:
def custom_loss(y_pred):
But I got the following error:
Traceback…
Nick Bishop
  • 391
  • 1
  • 4
  • 13