ML classification: how to handle a cell that contains two pieces of information?

Asked Mar 01 '19 at 19:18

Active Mar 01 '19 at 19:43

Viewed 53 times

I'm stuck on a problem I can't work out. I'm pretty new to machine learning and its community. I'm trying to build a classification model, but here is my problem:

Let's say I have two X columns (variables; text or integers) and one Y column (the one I'm trying to predict).

One of these X columns comes from a dataset that has duplicate rows, but some of the information in the duplicates differs and is important for my work.

Let me give an example:

Product No   Variable 1      Y
1            apple           result1
2            orange          result2
3            banana, apple   result1
4            blueberry       result3
5            banana          result5

As you can see, row 3 contains two pieces of information that both matter to me. How can I handle this in a classification model? Sorry if it's obvious, I'm new to ML :)

Edit note: the Variable 1 column holds a lot of data, with approximately a thousand distinct values. Of course, my model doesn't have just one variable; the real model is already very high-dimensional.

edited Mar 01 '19 at 19:43

asked Mar 01 '19 at 19:18

Oğuzhan Alptekin

This is a multi-label classification situation where it's possible to have one observation with several output classes. Try encoding variable 1 as columns of unique values (apple, orange, banana, blueberry), and product no.3 would be (1,0,1,0) in this case. – Xiaoyu Lu Mar 01 '19 at 19:30
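For illustration, the encoding suggested in this comment could be sketched as follows. This is a minimal sketch using only the Python standard library; the `rows` list is copied from the question's toy table, and the helper names (`split_cell`, `multi_hot`) are hypothetical, not from any library.

```python
# Toy rows copied from the question's example table: (Product No, Variable 1, Y).
rows = [
    (1, "apple", "result1"),
    (2, "orange", "result2"),
    (3, "banana, apple", "result1"),
    (4, "blueberry", "result3"),
    (5, "banana", "result5"),
]


def split_cell(cell):
    """Split a comma-separated cell into a list of stripped, non-empty values."""
    return [part.strip() for part in cell.split(",") if part.strip()]


# Build the vocabulary in order of first appearance: one column per unique value.
vocabulary = []
for _, cell, _ in rows:
    for value in split_cell(cell):
        if value not in vocabulary:
            vocabulary.append(value)


def multi_hot(cell):
    """Return a 0/1 vector with a 1 for every vocabulary value present in the cell."""
    present = set(split_cell(cell))
    return [1 if name in present else 0 for name in vocabulary]


encoded = {product_no: multi_hot(cell) for product_no, cell, _ in rows}

print(vocabulary)   # ['apple', 'orange', 'banana', 'blueberry']
print(encoded[3])   # [1, 0, 1, 0]
```

With real data, a library encoder such as scikit-learn's `MultiLabelBinarizer` would probably be a more practical choice than hand-rolled code; the sketch only shows what the resulting columns look like.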
Yeah, I had the same idea, but I have thousands of different values in that Variable 1 column. So I'd have to add another thousand columns, which isn't really a solution for my model, which already has too many dimensions :) Thanks though. – Oğuzhan Alptekin Mar 01 '19 at 19:33
You need to apply a dimensionality reduction technique. – singhV Mar 01 '19 at 22:50
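The comment does not name a specific technique. One option that keeps the column count fixed regardless of how many distinct values appear is the hashing trick, sketched below under stated assumptions: the bucket count `N_BUCKETS` and the helper names `bucket_of` and `hashed_features` are hypothetical, and MD5 is used only to get a hash that is stable across runs (Python's built-in `hash()` is randomized for strings). This may not be what the commenter had in mind; techniques such as PCA or truncated SVD on the encoded columns are other possibilities.

```python
import hashlib

# Hypothetical bucket count; in practice it would be tuned, and it is the
# number of feature columns produced no matter how many distinct values exist.
N_BUCKETS = 8


def bucket_of(value, n_buckets=N_BUCKETS):
    """Map a string value to a stable bucket index in [0, n_buckets)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets


def hashed_features(cell, n_buckets=N_BUCKETS):
    """Turn a comma-separated cell into a fixed-length count vector."""
    vector = [0] * n_buckets
    for part in cell.split(","):
        value = part.strip()
        if value:
            # Different values may collide in one bucket; that is the price
            # paid for a fixed, small number of columns.
            vector[bucket_of(value, n_buckets)] += 1
    return vector


print(hashed_features("banana, apple"))
print(hashed_features("apple"))
```

The trade-off is that hash collisions merge unrelated values into one column, so the bucket count trades accuracy against dimensionality. scikit-learn's `FeatureHasher` offers a library version of the same idea.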

0 Answers