
I have data from advertisements posted on a second-hand site for selling used smartphones. Each ad describes the product being sold. I want to know which attributes sellers describe most often, for example brand, model, colour, memory capacity, ...

By text-mining the advertisements, I would like to bundle similar words together into one category. For example, black, white, red, ... should be linked to each other because they all describe the colour of the phone.

Can this be done with clustering or categorisation, and which text-mining algorithms are suited to it?

  • If you have relatively few bundles, you might create them manually and convert terms into the same word. So as a simple example, color <- c("red", "green", "blue") and then ifelse(wordstring %in% color, "shade", wordstring). Shade would be your bundle. – lawyeR Apr 01 '19 at 10:24
  • You can watch this https://www.youtube.com/watch?v=4vuw0AsHeGw&list=PL8eNk_zTBST8olxIRFoo0YeXxEOkYdoxi –  Apr 01 '19 at 17:54
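The manual bundling suggested in the first comment (written there in R with `%in%` and `ifelse`) could look roughly like this in Python. This is a minimal sketch: the bundle names and word lists in `BUNDLES`, the regex tokeniser, and the helpers `bundle_tokens` and `count_bundles` are hypothetical choices for illustration, not anything from the thread. Counting each bundle once per ad gives a rough answer to the original question of which attributes sellers mention most often.

```python
import re
from collections import Counter

# Hypothetical hand-made bundles (illustrative word lists only).
BUNDLES = {
    "colour": {"black", "white", "red", "blue", "green", "gold", "silver"},
    "brand": {"apple", "samsung", "huawei", "nokia"},
    "memory": {"16gb", "32gb", "64gb", "128gb"},
}

# Invert the bundles into a word -> bundle-name lookup table.
WORD_TO_BUNDLE = {word: name for name, words in BUNDLES.items() for word in words}


def bundle_tokens(text):
    """Lower-case and tokenise an ad, replacing each known word by its bundle name."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [WORD_TO_BUNDLE.get(token, token) for token in tokens]


def count_bundles(ads):
    """Count in how many ads each bundle is mentioned at least once."""
    counts = Counter()
    for ad in ads:
        counts.update(set(bundle_tokens(ad)).intersection(BUNDLES))
    return counts


# Hypothetical example ads.
ads = [
    "Black iPhone 64GB, like new",
    "Samsung galaxy, white, 32gb",
    "Selling my old phone",
]
print(count_bundles(ads))
```

With these example ads, colour and memory are each mentioned in two ads and brand in one. The dictionary lookup plays the same role as `ifelse(wordstring %in% color, "shade", wordstring)` in the comment, just for several bundles at once.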


Your best bet is probably something based on word2vec.

Clustering algorithms will not be able to discover the human-language concept of color reliably. So either you choose a supervised approach, or you need to try methods that first infer such concepts.

Word2vec learns word vectors from the contexts words appear in, so in effect it is trained on the substitutability of words. In a sentence such as "I like the red color" you can substitute red with other colors, so word2vec could in theory help you find such concepts in an unsupervised way, given lots and lots of data. But I'm sure you can also find counterexamples that break these concepts. Good luck; I doubt you'll manage to do this unsupervised.
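Training an actual word2vec model needs a library such as gensim and a lot of text, so the sketch below swaps in a simpler, count-based stand-in for the same substitutability idea: each word is represented by the counts of the words immediately around it, and words that occur in similar contexts get a high cosine similarity. It is not word2vec. The toy sentences are made up, and the function names and the window size of 1 are assumptions for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def context_vectors(sentences, window=1):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = re.findall(r"[a-z0-9]+", sentence.lower())
        for i, word in enumerate(tokens):
            start, stop = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(word, vectors, topn=3):
    """Rank the other words by how similar their contexts are to the contexts of `word`."""
    target = vectors.get(word, Counter())  # .get avoids inserting into the defaultdict
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


# Hypothetical toy "ads"; real data would need far more text.
sentences = [
    "selling red phone in good condition",
    "selling black phone in good condition",
    "selling white phone with charger",
    "red case included",
    "black case included",
    "white case included",
    "phone has 64gb memory",
    "phone has 32gb memory",
]

vectors = context_vectors(sentences)
print(most_similar("red", vectors, topn=2))
print(most_similar("64gb", vectors, topn=1))
```

In this toy corpus "red", "black" and "white" occur in identical contexts, so they come out as each other's nearest neighbours, and "64gb" pairs with "32gb". Real ads are much noisier, and, as noted above, counterexamples that break such groupings are easy to find.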

Has QUIT--Anony-Mousse