
Given a URL as input, I would like to determine which category it belongs to from a fixed list of categories (e.g. programming, health, vegan food, computer science, math).

cats = [ "programming", "health", "raw vegan food", "vegan cooking", "computer science", "math" ]

def getCategory(url, cats):
  ...

I would like to do this without having to download a lot of data to derive the category. I have already searched through much of what is available, but I'm starting to suffer from information overload and am getting lost in the amount of material about NLP and topic modeling.

I have found the gensim library, but I'm not sure whether it can do this kind of classification. If you could point me in the right direction, it would be really helpful.
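To make the goal more concrete, here is a naive baseline, only a sketch and not a real solution. It downloads nothing and simply measures how many words of each category name appear among the tokens of the URL string itself. The name `get_category_naive` and the tokenization rule are assumptions made for illustration.

```python
import re

cats = ["programming", "health", "raw vegan food", "vegan cooking", "computer science", "math"]

def get_category_naive(url, cats):
    """Return the category whose words overlap most with the URL's tokens, or None."""
    # Split the URL into lowercase alphabetic tokens,
    # e.g. ".../raw-vegan-food" yields {"raw", "vegan", "food", ...}.
    tokens = set(re.findall(r"[a-z]+", url.lower()))
    best, best_score = None, 0.0
    for cat in cats:
        words = cat.lower().split()
        # Fraction of the category's words that occur in the URL.
        score = sum(w in tokens for w in words) / len(words)
        if score > best_score:
            best, best_score = cat, score
    return best

print(get_category_naive("https://example.com/blog/raw-vegan-food-recipes", cats))  # raw vegan food
print(get_category_naive("https://example.com/vegan-cooking-tips", cats))           # vegan cooking
print(get_category_naive("http://www.myrandomwebsite.com", cats))                   # None
```

This only works when the URL happens to spell out the category, so it returns `None` for most real sites. What I am looking for is something that gets beyond this without needing a large corpus.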

xralf
  • You will have to use topic modeling to solve this. I recommend you check out the NLTK library – lordingtar Jan 26 '17 at 23:28
  • Maybe I'm missing something, but how would you ever be able to determine what category *www.myrandomwebsite.com* is in? – ChrisW Jan 26 '17 at 23:29
  • You won't be able to do so reliably, but topic modeling is a good place to start if you want somewhat accurate classification. – lordingtar Jan 26 '17 at 23:40
  • @lordingtar Thanks, this is good to know when deciding whether it makes sense to dive into studying it now. Maybe I will close this as too general. I wanted something that can recognize subtle differences, such as between "raw vegan food" and "vegan cooking", but it looks like the information overload will be here for some time yet. – xralf Jan 26 '17 at 23:51
  • Use a neural network! Neural nets are great and do all things! :D – theonlygusti Jan 26 '17 at 23:51

0 Answers