Questions tagged [unsupervised-learning]

Unsupervised learning refers to machine learning contexts in which there is no prior 'training' period in which the learning agent is trained on objects of known type. As such, unsupervised learning includes such disciplines as mathematical clustering, whereby data is segmented into clusters based on the minimisation or maximisation of mathematical properties rather than on an attempt to classify it according to a known correct category.

Unsupervised learning refers to machine learning algorithms in which no labels are available for the training data and the model tries to learn the underlying structure of the data. Clustering is the most common example: data is segmented into clusters based on the minimization or maximization of mathematical properties rather than on an attempt to classify it according to a known correct category.
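
As a rough illustration of clustering by minimising a mathematical property, the sketch below runs plain k-means (Lloyd's algorithm) on unlabeled points and reports the within-cluster sum of squared distances it tries to minimise. The function name `kmeans`, the toy data, and the deterministic farthest-point initialisation are all assumptions made for this example; random or k-means++ initialisation is more common in practice.

```python
import numpy as np

def kmeans(points, k, n_iter=100):
    """Minimal k-means sketch: alternate assignment and centroid update."""
    points = np.asarray(points, dtype=float)

    # Assumed initialisation: start from the first point, then repeatedly
    # add the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    for _ in range(1, k):
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(points[d.argmax()])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points;
        # keep the old centroid if a cluster ends up empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    # Final assignment and the objective being minimised:
    # the within-cluster sum of squared distances.
    dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    inertia = dists.min(axis=1).sum()
    return labels, centroids, inertia

# Toy unlabeled data: two well-separated groups of four points each.
points = [[0, 0], [0, 1], [1, 0], [1, 1],
          [10, 10], [10, 11], [11, 10], [11, 11]]
labels, centroids, inertia = kmeans(points, k=2)
print(labels, centroids, inertia)
```

No labels are supplied at any point; the grouping comes only from the distance-based objective. The cluster indices themselves are arbitrary, so they carry no meaning beyond "same group" or "different group".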

618 questions
-1
votes
1 answer

How to come up with questions on supervised and unsupervised learning?

I am new to data mining concepts and am trying to learn the differences between supervised and unsupervised learning. So far, what I know is that supervised means getting the information from labeled datasets and unsupervised means clustering the data…
-1
votes
1 answer

Python unsupervised learning DBSCAN scikit application example

I have the following list that I would like to perform unsupervised learning on, and then use the knowledge to predict a value for each item in the test list: #Format [real_runtime, processors, requested_time, score, more_to_be_added] #some entries from the…
-1
votes
1 answer

How to do clustering when the input is a 3D matrix, MATLAB

I have a 3D matrix in which most of the values are zeros, but there are some nonzero values. When I plot this 3D matrix in MATLAB, I get a plot like the one below. Here you can see there are two groups of points that are near each other (that's…
-1
votes
1 answer

Which is the right learning algorithm, k-means?

I am working on a basic decision-making algorithm, i.e. based on the time of a parallel loop iteration, a decision is made to either increase or decrease the number of threads assigned to a process. My initial approach was to take the average time…
-1
votes
2 answers

Robust Clustering Algorithm

Say I have items i1, ..., iN. I would like to cluster them in such a way that: if I ran the clustering many, many times, the probability that items iJ and iK would end up in the same cluster is high. The number of clusters and cluster memberships are…
user1172468
-1
votes
1 answer

EM soft clustering in LingPipe

LingPipe's EM tutorial says that it is possible to run the algorithm with no supervised data: It is possible to train a classifier in a completely unsupervised fashion by having the initial classifier assign categories at random. Only the…
Tuan Anh Hoang-Vu
-1
votes
2 answers

Should I use Mahout for this?

I want to recommend items that are tagged and are categorized into three price categories (cheap, regular and expensive). I know that recommendation can be achieved with Mahout, but here's why I don't know how to use it. Mahout is based on the…
Javier Manzano
-2
votes
0 answers

How to choose K in K-means clustering using the AIC and BIC methods from this graph?

I have to identify k to determine the groups in Mall_Customers.csv, which has two variables, Spending Score (1-100) and Annual Income (k$), by using the AIC and BIC score methods. I want a theory that explains why to choose that K and the principle to…
-2
votes
1 answer

Unsupervised Learning for regression analysis

I am a geophysics student and I am trying to predict shear wave velocity, which is numerical data. Since it is numerical data, I feel it should be a regression analysis, but the problem I have now is that I don't have a shear wave log I can use as a target…
-2
votes
1 answer

Find similarity between rows of a dataframe in Python

For example, in one classification problem's dataset we have 50 categories, so it will be difficult for the model to predict this many classes. To avoid this, I want to combine the target variable's rows that have a similar kind of feature…
-2
votes
1 answer

Which unsupervised algorithm can we use to detect anomalies in transaction data?

I was trying to look for an algorithm that can be used to find anomalies in transaction data which also contains a timestamp as one of the columns. I tried using Isolation Forest, but I think it's not possible to use it with the DateTime column. Or is…
-2
votes
1 answer

HDBSCAN membership probability

I'm working on a comparison between clustering algorithms and I want to know how HDBSCAN in R calculates the so-called membership 'probability'.
-2
votes
1 answer

How to check the accuracy of k-means clustering in Python? How to know what the predicted variables represent in the k-means algorithm?

The above dataframe represents the attributes used to determine whether I have cancer or not. The class represents whether the person has cancer or not: class 2 means the person does not have cancer, and 4 means the person has cancer. When I try K-means on…
-2
votes
1 answer

Unlabeled text data containing messages

I am working on a text dataset containing messages from users on a website. Please check the image in the link (the first five rows of the dataframe), as Stack is not allowing me to post this image directly. Reading those messages, I want to find out the…
-2
votes
1 answer

How to do clustering when the shape of the data is (x,y,z)?

Suppose I have 10 individual observations, each of size (125,59). I want to group these 10 observations based on their 2D feature matrices ((125,59)). Is this possible without flattening every observation to a 125*59 1D matrix? I can't even implement…