Questions tagged [unsupervised-learning]

Unsupervised learning refers to machine learning contexts in which there is no prior 'training' period in which the learning agent is trained on objects of known type. As such, unsupervised learning includes such disciplines as mathematical clustering, whereby data is segmented into clusters based on the minimisation or maximisation of mathematical properties and not on an attempt to classify by understanding the right context.

Unsupervised learning refers to machine learning algorithms in which no labels are available for the training data and the model tries to learn the underlying structure of the data (for example, the manifold on which the data lies). As such, unsupervised learning includes disciplines such as clustering, whereby data is segmented into clusters based on the minimization or maximization of mathematical properties rather than on an attempt to classify each point into a known class.
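As a rough illustration of segmenting data by minimizing a mathematical property, the sketch below scores two candidate segmentations of the same points by their within-cluster sum of squares, the quantity that k-means tries to minimize. The data, the label lists, and the function name `within_cluster_ss` are made up for this example.

```python
def within_cluster_ss(points, labels):
    """Sum of squared distances from each point to the mean of its cluster."""
    clusters = {}
    for x, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(x)
    total = 0.0
    for members in clusters.values():
        mean = sum(members) / len(members)
        total += sum((x - mean) ** 2 for x in members)
    return total


# Hypothetical 1-D data with two obvious groups.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
compact = [0, 0, 0, 1, 1, 1]    # groups neighbouring points together
scattered = [0, 1, 0, 1, 0, 1]  # alternates labels regardless of position

print(within_cluster_ss(points, compact))    # lower score: tighter clusters
print(within_cluster_ss(points, scattered))  # higher score: looser clusters
```

No labels are involved: the segmentation with the lower score is preferred purely because of the objective.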

618 questions
1
vote
1 answer

How to generate a new column based on some other column after clustering the data?

I have a dataframe like this with columns - ["A","B","C","D"] A --> Categorical feature with 2 values, say Yes or No B --> Categorical feature with 10 unique values, like "AAXX-10","BBYY-20" etc C --> A date-time field D --> Text-based column,…
1
vote
1 answer

Making bar plot of different clusters

I am currently learning K-means, so now I am writing a program in Python to determine different clusters of text that are similar to each other. So now I got the results for two different clusters (using some fictional words but everything else is…
1
vote
1 answer

Can agglomerative clustering and divisive clustering get the same result in the end?

Now, I know agglomerative is a bottom-up method, whereas divisive is a top-down method. However, I don't know what the difference between them is in the specific process. For example, do they both use a proximity matrix to calculate any pairwise distance between…
1
vote
2 answers

Dendrogram: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

I am trying to plot a dendrogram to cluster data but this error is stopping me. My data is here "https://assets.datacamp.com/production/repositories/655/datasets/2a1f3ab7bcc76eef1b8e1eb29afbd54c4ebf86f2/eurovision-2016.csv" I first chose columns to…
1
vote
1 answer

Categorical Embeddings in an Unsupervised Setting for Anomaly Detection

Context: I am working on an unsupervised use case. The dataset I have has the following fields: TimeStamp, UserName and eventName, e.g. User A has done Event B at Timestamp C. My objective is to perform an anomaly detection task, i.e. if UserA performs…
1
vote
1 answer

Reinforcement learning for home automation

I have a problem where I have to automate some task, say switching an appliance on and off based on the user's interaction with it on different days of the week. Let's say we have a bulb B1 and a user U1. In the beginning U1 will switch on and…
1
vote
1 answer

HDBSCAN Cluster choice

I have been working with HDBSCAN and have a few hundred clusters based on my data. I am trying to select some cluster groups for further analysis. Looking for the clusters which have high inter-cluster-distance, as in more spread out and behave…
Jazz
1
vote
1 answer

Unsupervised ML classifier for car price prediction

I have a dataset from a car company. The dataset consists of multiple features. I want to predict the price of cars through unsupervised ML classifiers. I am not sure which classifier to use to predict the class label.
1
vote
1 answer

Labels obtained from clustering seem visually incorrect

I have the following distance matrix based on 10 datapoints: import numpy as np distance_matrix = np.array([[0. , 0.00981376, 0.0698306 , 0.01313118, 0.05344448, 0.0085152 , 0.01996724, 0.14019663, 0.03702411,…
1
vote
2 answers

outlier detection using 2D spatial information

I have a list of sensor measurements for air quality with geo-coordinates, and I would like to implement outlier detection. The list of sensors is relatively small (~50). The air quality can gradually change with the distance, but abrupt local…
krokodil
1
vote
0 answers

How can I use PyMC3 to estimate a discrete value A over which a for loop must be constructed?

This is a properly articulated version of an old question. I'm working on condensing some code written for a neuroscience paper: https://doi.org/10.1371/journal.pcbi.1007481. Without delving into unnecessary detail, here are some…
1
vote
0 answers

Why does k-means clustering give me different answers when initialized with different centroids?

I followed the pseudocode for k-means clustering to write this code. This code gives different answers when the clusters' centroids are initialized with different values, and none of those answers are correct. Can you help me please? I tested with 15…
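Sensitivity to the starting centroids is a general property of the standard (Lloyd's) k-means iteration, which only converges to a local optimum. The sketch below is not the asker's code: the 1-D data, the two starting points, and the function names `kmeans_1d` and `sse` are invented for illustration.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Plain Lloyd's iteration on 1-D data; returns (centroids, clusters)."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for x in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: (x - centroids[i]) ** 2)
            clusters[nearest].append(x)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


def sse(centroids, clusters):
    """Sum of squared errors of a clustering (lower is better)."""
    return sum((x - m) ** 2 for m, c in zip(centroids, clusters) for x in c)


# Hypothetical data: three well-separated pairs of points.
points = [0.0, 1.0, 10.0, 11.0, 20.0, 21.0]

good_c, good_k = kmeans_1d(points, [0.0, 10.0, 20.0])
bad_c, bad_k = kmeans_1d(points, [0.0, 1.0, 15.0])

print(good_c, sse(good_c, good_k))
print(bad_c, sse(bad_c, bad_k))
```

Both runs stop because the centroids no longer move, yet the second start leaves two centroids inside the first pair and ends with a much higher error. A common remedy is to run the algorithm from several initializations and keep the result with the lowest error.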
1
vote
2 answers

How to use silhouette_score in Sklearn with mixed (categorical and numerical) data?

I have come to a situation where I have a mixed data set as mentioned and am trying unsupervised clustering. I am trying many different experiments including Gower's distance and K-prototype. I wanna try some of the sklearn metrics to see how they will give me…
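For reference, a minimal pure-Python sketch of a Gower-style distance on mixed data is given below: numerical features contribute a range-normalized absolute difference, categorical features contribute 0 or 1, and the contributions are averaged. The rows, the `is_categorical` flags, and the function name `gower_matrix` are assumptions made for this example. A matrix built this way could be converted to a NumPy array and passed to `sklearn.metrics.silhouette_score` with `metric="precomputed"`.

```python
def gower_matrix(rows, is_categorical):
    """Pairwise Gower-style distances for rows of mixed features."""
    n = len(rows)
    n_feat = len(is_categorical)
    # Range of each numerical feature, used for normalization.
    ranges = []
    for f in range(n_feat):
        if is_categorical[f]:
            ranges.append(None)
        else:
            column = [row[f] for row in rows]
            ranges.append(max(column) - min(column))
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = 0.0
            for f in range(n_feat):
                if is_categorical[f]:
                    # Categorical: simple mismatch (0 if equal, 1 otherwise).
                    total += 0.0 if rows[i][f] == rows[j][f] else 1.0
                elif ranges[f] > 0:
                    # Numerical: absolute difference scaled to [0, 1].
                    total += abs(rows[i][f] - rows[j][f]) / ranges[f]
            dist[i][j] = dist[j][i] = total / n_feat
    return dist


# Hypothetical mixed data: (numerical value, categorical colour).
rows = [(1.0, "red"), (2.0, "red"), (9.0, "blue"), (11.0, "blue")]
D = gower_matrix(rows, is_categorical=[False, True])
for line in D:
    print(line)
```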
1
vote
1 answer

Understanding ConvNet Prediction on Text Classification

I'm trying to debug a model that uses 1D convolutions to classify text that was labeled by humans as being "appropriate" vs "not appropriate" to be posted on some website. Looking at false positives (wrongly predicted "appropriate"), I see that the…
1
vote
1 answer

Market Basket Analysis taking quantity into account

I am using apriori algorithm to extract frequent itemsets and then performing a Market Basket Analysis. As an input for the apriori, I have to perform a one_hot encoding on my dataset: the quantities are ignored. Is there a way to perform Market…
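To make the loss of quantity information concrete, here is a small sketch of the one-hot step on made-up baskets. The transactions, item names, and the helper `support` are hypothetical and do not use any particular apriori library.

```python
# Hypothetical baskets: item -> quantity bought.
transactions = [
    {"bread": 2, "milk": 1},
    {"bread": 1, "butter": 3},
    {"milk": 4, "bread": 1, "butter": 1},
]

# One-hot encoding: one boolean column per item, quantities discarded.
items = sorted({item for t in transactions for item in t})
one_hot = [[item in t for item in items] for t in transactions]


def support(itemset):
    """Fraction of baskets that contain every item in `itemset`."""
    hits = sum(all(item in t for item in itemset) for t in transactions)
    return hits / len(transactions)


print(items)
for row in one_hot:
    print(row)
print(support(("bread", "milk")))
```

After encoding, a basket with four units of milk is indistinguishable from a basket with one, which is exactly the information the question wants to keep.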