I want to classify data into two classes based on parameters given. My data is publications from two different sources and I want to classify it into "match" or "non-match"; when comparing the dataset1 with dataset2. The datasets are unlabeled text data which contain five attributes (id, title, authors, venue, year) so if i apply unsupervised algorithms, it will not produce my target classes. On the other hand, supervised algorithms need to labelled data which is unavailable and time consumed.
- What is the best and easiest method to do that in python?