Questions tagged [unsupervised-learning]

Unsupervised learning refers to machine learning settings in which there is no prior 'training' period during which the learning agent is trained on objects of known type. As such, unsupervised learning includes disciplines such as mathematical clustering, whereby data is segmented into clusters based on the minimisation or maximisation of mathematical properties rather than on an attempt to classify by understanding the right context.

Unsupervised learning (of which clustering is one example) refers to machine learning algorithms for which no 'label' is available for the training data, so the model tries to learn the underlying manifold. As such, unsupervised learning includes disciplines such as mathematical clustering, whereby data is segmented into clusters based on the minimization or maximization of mathematical properties rather than on an attempt to classify by understanding the right context.
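As an illustration of clustering by minimizing a mathematical property, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not taken from any question on this page: the toy points, the function names, and the choice to pass explicit starting centroids are all assumptions made for the example. The property being minimized is the within-cluster sum of squared distances. No labels are used at any point.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centroids, iterations=10):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    centroids = list(initial_centroids)
    k = len(centroids)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


def within_cluster_ss(points, labels, centroids):
    """Objective k-means minimizes: total squared distance to the assigned centroid."""
    return sum(squared_distance(p, centroids[l]) for p, l in zip(points, labels))


# Hypothetical toy data: two well-separated groups, with no labels supplied.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]

# Explicit starting centroids keep this sketch deterministic.
labels, centroids = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels)
print(within_cluster_ss(points, labels, centroids))
```

In practice the starting centroids are usually chosen at random, which is one reason cluster indices and even cluster memberships can differ between runs unless the random seed is fixed.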

618 questions
1 vote, 0 answers

Comparison between different unsupervised learning algorithms

I am working on a project on binary classification, where I have to test several unsupervised learning algorithms, like: Isolation Forest, OneClassSVM, Local Outlier Factor and Elliptic Envelope. After fitting and predicting with those models I get…
1 vote, 2 answers

PCA in R - Do we need to reassign the elements of "prcomp" by multiplying by a negative sign?

A trainer did this in a video. He just gave a quick explanation that he does this because of R's default behaviour. However, I have never seen this done before. Is it correct, and why does he do this? pca <- prcomp(data, scale=TRUE) pca$rotation <-…
1 vote, 1 answer

Multilingual free-text-items Text Classification for improving a recommender system

To improve the recommender system for Buyer Material Groups, our company wants to train a model using customers' historical spend data. The model should be trained on historical "Short text descriptions" to predict the appropriate BMG. The dataset…
1 vote, 1 answer

sklearn matching results become misaligned when the data sets increase

I've been using sklearn NearestNeighbors to do name matching and at a certain point the results become misaligned. My standardized list of names is 100s of millions. My list of names coming in to be matched is considerably smaller but still could…
1 vote, 2 answers

How to measure the accuracy of a Doc2vec model?

I have a dataset of reviews for different hotels. I'm trying to find similar hotels based on their reviews, so I'm using the Doc2vec algorithm. Is there any way to measure the accuracy of a Doc2Vec model using Gensim, rather…
swetha
1 vote, 1 answer

Best way to cluster long/lat hotspot points in one city in R?

I am new to R and (unsupervised) machine learning. I'm trying to find out the best cluster solution for my data in R. What is my data about? I have a dataset with +/- 800 long / lat WGS84 coordinates in one city. Long is in the range 6.90 -…
1 vote, 0 answers

Improving accuracy of nearest neighbours algorithm - unsupervised learning problem

I have a situation where I am trying to find the 3 nearest neighbours for a given ID in my dataframe. I am using the NN algorithm (not KNN) to achieve this. The code below gives me the three nearest neighbours; for the top node the results are fine…
1 vote, 2 answers

Is Gradient Descent used during unsupervised training also?

Is the gradient descent algorithm ever used when training unsupervised methods such as clustering, collaborative filtering, etc.?
Aman
1 vote, 0 answers

How to retrain Inception V4 model by unsupervised learning?

I was trying to retrain Inception-V4 on an image set using unsupervised learning. First, I read the pre-trained weights file for Inception-V4. out = Dense(output_dim=nb_classes, activation='softmax')(x) model = Model(init, out,…
1 vote, 0 answers

Unsupervised learning: Anomaly detection on discrete time series

I am working on a final year project on an unlabelled dataset consisting of vibration data from multiple components inside a wind turbine. Datasets: I have data from 4 wind turbines each consisting of 415 10-second intervals. About the 10 second…
1 vote, 0 answers

2-dimensional clustering for segmentation and variance minimization

I have a dataset with two cardinal attributes of comparable scale. I wish to divide the data points into 4 clusters, so as to have complete segmentation by attribute 1, while minimizing the variance within attribute 2. E.g.: If plotting attribute 1…
1 vote, 2 answers

Is correlation an important factor in unsupervised learning (clustering)?

I am working with a dataset of size (500, 33). In particular, the dataset contains 9 features, say [X_High, X_medium, X_low, Y_High, Y_medium, Y_low, Z_High, Z_medium, Z_low]. Both visually and after computing the correlation matrix, I observed that…
1 vote, 0 answers

DBSCAN: number of true labels and predicted labels don't match

I would like to cluster a dataset into 2 parts, fraud and non-fraud. To do that I used DBSCAN; however, I received the following error: "labels_true and labels_pred must have same size, got 7200 and 28789 " I would be very pleased if you could help…
1 vote, 0 answers

Unsupervised high dimension clustering

I have a dataset of records where each record has 5 labels, and the importance of each label is different. I know the order of the labels by importance but not how much they differ, so the difference between two records looks like: adist of…
Roy Ancri
1 vote, 1 answer

K-Means Result-Index differs in second run

I am running K-Means on some statistical data. My matrix size is [192x31634]. K-Means performs well and creates the 7 centroids that I want, so my result is [192x7]. As a self-check, I store the index values I obtain in the…
hoglimo