I have a problem that I can't find a way around. I'm pretty new to machine learning and its community. I'm trying to build a classification model, but here is my problem:
Let's say I have two X columns (features; text or integers) and one Y column (the target I'm trying to predict).
One of these X columns comes from a dataset that has duplicate rows, but some of the information in the duplicates differs and is important for my work.
Let me try to give an example:
Product No   Variable 1       Y
1            apple            result1
2            orange           result2
3            banana, apple    result1
4            blueberry        result3
5            banana           result5
As you can see, row 3 contains two values, and both of them matter to me. How can I handle this situation in a classification model? Sorry if it's obvious. I'm new to ML :)
Edit note: the Variable 1 column holds a lot of data, with approximately a thousand distinct values. Of course, my real model doesn't have just one variable; it is already very high-dimensional.
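One possible way to represent a cell like row 3 is a multi-hot encoding: each distinct value gets its own 0/1 column, and a row sets every column whose value it contains. Below is a minimal sketch in plain Python, assuming the values inside a cell are comma-separated. The sample rows and the helper names (`split_values`, `multi_hot`) are hypothetical and only for illustration.

```python
# Hypothetical rows modeled on the example table: (product_no, variable_1, y)
rows = [
    (1, "apple", "result1"),
    (2, "orange", "result2"),
    (3, "banana, apple", "result1"),
    (4, "blueberry", "result3"),
    (5, "banana", "result5"),
]


def split_values(cell):
    """Split a comma-separated cell into trimmed, non-empty values."""
    return [v.strip() for v in cell.split(",") if v.strip()]


# Vocabulary: every distinct value seen in the column, in sorted order.
vocab = sorted({v for _, cell, _ in rows for v in split_values(cell)})
index = {value: i for i, value in enumerate(vocab)}


def multi_hot(cell):
    """Return a 0/1 vector with a 1 for each value present in the cell."""
    vec = [0] * len(vocab)
    for v in split_values(cell):
        vec[index[v]] = 1
    return vec


# One multi-hot vector per row; row 3 ends up with two columns set.
encoded = [multi_hot(cell) for _, cell, _ in rows]

print(vocab)
for (product_no, cell, _), vec in zip(rows, encoded):
    print(product_no, cell, vec)
```

With roughly a thousand distinct values this produces roughly a thousand mostly-zero columns, so a sparse representation is probably worth considering. If scikit-learn is available, `MultiLabelBinarizer(sparse_output=True)` builds the same kind of encoding as a sparse matrix.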