I am trying to build a classification model for a dataset that includes both numeric and text features.
Using TfidfVectorizer, I have managed to convert the text columns into vectors,
so each cell in a text column is now a list of floats such as
[0.0 0.3567 0.0 0.0]
(without commas).
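
For context, here is a minimal sketch of how the numeric and text features could be combined into one feature matrix rather than stored as lists inside cells. It assumes scikit-learn and SciPy; the sample texts, numeric values, and variable names are made up for illustration and are not my real data.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up example data: one text column and two numeric columns
texts = ["red apple", "green apple", "red car"]
numeric = np.array([[1.0, 20.0], [2.0, 30.0], [3.0, 10.0]])

# One TF-IDF row per document; the vocabulary here has 4 terms
vectorizer = TfidfVectorizer()
X_text = vectorizer.fit_transform(texts)

# Stack the numeric columns next to the TF-IDF columns (kept sparse)
X = hstack([csr_matrix(numeric), X_text]).tocsr()
print(X.shape)
```

Keeping everything in one matrix like this would give the model one column per feature, which is the input shape scikit-learn estimators expect.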
My target is a set of classes. Each row can have several values, or none, such as
[a, b, c, 1]
[1, d]
[]
The question is: how can I pre-process the target variable so that my model can make classification predictions? I have tried label encoding, but it creates a new encoding for each row, so the same integer maps to different classes in different rows.
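
To illustrate the kind of encoding I am after, here is a minimal sketch that assumes scikit-learn's `MultiLabelBinarizer` is suitable for this (I am not sure it is the right tool). The variable names are made up, and the labels are cast to strings because my classes mix letters and integers.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Made-up target column: each row is a list of class labels, possibly empty
y_raw = [["a", "b", "c", 1], [1, "d"], []]

# Cast every label to str so that mixed values such as 1 and "a" can be sorted together
y_str = [[str(label) for label in row] for row in y_raw]

# Fit once on ALL rows so every class gets the same column in every row
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(y_str)

print(mlb.classes_)  # the class that each column stands for
print(Y)             # one 0/1 indicator column per class, one row per sample
```

The point is that the encoder is fitted on the whole column at once, so a given class always lands in the same position, and an empty row simply becomes a row of zeros.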
For each row, I am planning to accept every predicted class whose score is above a certain threshold. Is there a model that supports this? Many thanks in advance.
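
A rough sketch of the thresholding I have in mind, assuming a one-vs-rest wrapper around logistic regression from scikit-learn would work for this. The data is random, and the 0.3 cut-off is an arbitrary value chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Random stand-in data: 40 rows, 4 features, 3 classes as 0/1 indicator columns
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))
Y = (X[:, :3] > 0).astype(int)

# One independent binary classifier per class
clf = OneVsRestClassifier(LogisticRegression())
clf.fit(X, Y)

# One score per class and row; rows need not sum to 1 in the multi-label case
proba = clf.predict_proba(X)

# Keep every class whose score reaches the (arbitrary) threshold
threshold = 0.3
Y_pred = (proba >= threshold).astype(int)
print(Y_pred[:5])
```

With this setup a row can end up with several classes, one class, or none, depending on how many scores pass the threshold.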