Preprocessing data in Multi-label classification Python

Question

My dataset structure:

Text: 'Good service, nice view, location'
Tag: '{SERVICE#GENERAL, positive}, {HOTEL#GENERAL, positive}, {LOCATION#GENERAL, positive}'

The problem is that I don't know how to structure my data frame. Any recommendations would be much appreciated. Thank you.

I am assuming that you have 3 attributes to classify: SERVICE, HOTEL, LOCATION. Is that correct, or are there more options? — Antoan Milkov, Jun 22 '19 at 08:49
They also have Room, Food&Drink, Facilities, and so on. I don't know exactly how many there are, because there is no information on how they structured their database; I just listed some others I found in the database they supplied. — Lắc Lê, Jun 22 '19 at 08:54

score 0 · Answer 1 · answered Oct 08 '19 at 05:07

Separate the adjectives (good, bad, etc.) from the hotel attributes (service, view, location). You can start by creating a custom dictionary, then automatically detect new words and add them as categories. You could use named entity recognition to do so; here are some articles:

https://towardsdatascience.com/named-entity-recognition-with-nltk-and-spacy-8c4a7d88e7da
https://towardsdatascience.com/a-review-of-named-entity-recognition-ner-using-automatic-summarization-of-resumes-5248a75de175
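
The custom-dictionary idea could look roughly like the sketch below. It is only a starting point: the keyword lists, the mapping of "view" to HOTEL, and the function name `split_review` are assumptions made up for illustration, not something taken from the dataset. Words that match neither dictionary are collected separately so they can be reviewed and added as new categories later.

```python
import re

# Hypothetical seed dictionary: keyword -> aspect category.
# Keywords and category names are illustrative assumptions.
ASPECT_KEYWORDS = {
    "service": "SERVICE",
    "staff": "SERVICE",
    "view": "HOTEL",
    "location": "LOCATION",
    "room": "ROOM",
}

# Hypothetical opinion lexicon: adjective -> polarity.
OPINION_WORDS = {
    "good": "positive",
    "nice": "positive",
    "bad": "negative",
    "dirty": "negative",
}


def split_review(text):
    """Split a review into aspect categories, opinion words, and unknown words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    aspects, opinions, unknown = [], [], []
    for token in tokens:
        if token in ASPECT_KEYWORDS:
            aspects.append(ASPECT_KEYWORDS[token])
        elif token in OPINION_WORDS:
            opinions.append((token, OPINION_WORDS[token]))
        else:
            # Candidates for new dictionary entries.
            unknown.append(token)
    return aspects, opinions, unknown


aspects, opinions, unknown = split_review("Good service, nice view, location")
print(aspects)   # aspect categories found in the text
print(opinions)  # (word, polarity) pairs found in the text
print(unknown)   # words not covered by either dictionary
```

A plain lookup like this does not link each adjective to the attribute it describes; that pairing would need extra logic (for example, nearest-word matching) or a trained model.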

Personally, I have used the Stanford NER tagger, which is pretty cool.
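
As for the data frame layout itself, one common option for multi-label problems is a 0/1 indicator column per label. The sketch below assumes every tag has the form `{CATEGORY#ATTRIBUTE, polarity}` as in the question, and treats each category/polarity pair as one label. The helper names `parse_tags` and `to_rows` are made up for this example, and the second sample review is invented to show a differing label.

```python
import re

# Matches one "{CATEGORY#ATTRIBUTE, polarity}" group in the raw tag string.
TAG_PATTERN = re.compile(r"\{\s*([^,{}]+?)\s*,\s*([^,{}]+?)\s*\}")


def parse_tags(tag_string):
    """Return a list of (category, polarity) pairs from a raw tag string."""
    return TAG_PATTERN.findall(tag_string)


def to_rows(samples):
    """Build one dict per review with a 0/1 column per 'CATEGORY:polarity' label."""
    parsed = [(text, parse_tags(tags)) for text, tags in samples]
    # Collect every label seen anywhere in the data, in a stable order.
    labels = sorted({f"{cat}:{pol}" for _, pairs in parsed for cat, pol in pairs})
    rows = []
    for text, pairs in parsed:
        present = {f"{cat}:{pol}" for cat, pol in pairs}
        row = {"text": text}
        for label in labels:
            row[label] = int(label in present)
        rows.append(row)
    return rows


samples = [
    # Example from the question.
    ("Good service, nice view, location",
     "{SERVICE#GENERAL, positive}, {HOTEL#GENERAL, positive}, {LOCATION#GENERAL, positive}"),
    # Invented example, for illustration only.
    ("Bad service, great location",
     "{SERVICE#GENERAL, negative}, {LOCATION#GENERAL, positive}"),
]

rows = to_rows(samples)
for row in rows:
    print(row)
```

Passing `rows` to `pandas.DataFrame(rows)` gives a data frame with one text column and one indicator column per label. scikit-learn's `MultiLabelBinarizer` builds the same kind of indicator matrix if you prefer not to do it by hand.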
