0

I have a dataset of texts, each text was identified with an ID number. I would like to do a prediction by finding the best match ID number for upcoming new texts. To use multi text classification, I am not sure if this is the right approach since there is only one text for most of ID numbers. In this case, I wouldn't have any test set. Can up-sampling help? Or is there any other approach than classification for such a problem?

The data set looks like this:

id1 'text1', id2 'text2', id3 'text3', id3 'text4', id3 'text5', id4 'text6', . . id200 'text170'

I would appreciate any guidance to find the best approach for this problem.

STA
  • 30,729
  • 8
  • 45
  • 59
Fara
  • 1

0 Answers0