Text classification without machine learning

Question

I would like to match social media posts (short text) to a database of movies/TV shows. The database contains information on movie or TV show names, characters and actors. If enough evidence is found in the input text, then I want the algorithm to classify the text to the movie it belongs to, or do nothing if there is not enough evidence.

I'm familiar with machine learning approaches, but those require training samples and a finite number of categories. My algorithm should be able to use context and be scalable to new content. For example, I don't want the machine to learn to recognize "Harry Potter" movies but then fail to recognize "Fantastic Beasts and Where to Find Them" when that is released.

I understand that the solution to this is partial string matching, but I would like to be pointed in the right direction with some general guidelines for this sort of problem. I'm also interested in recognizing misspelled words and in assigning more weight to certain matches and less to others.

Also, as a side note, should string matching be done through SQLite or outside it? My guess for this case would be outside, but I'd just like to make sure.

Thank you in advance for any help!

You could probably use the IBM AlchemyLanguage API. It can take text and break out concepts. For example, "Love Robert Deniro in Heat" returns Robert De Niro, Heat, Al Pacino, and Michael Mann: four key components to identify the movie, with links to the database it pulled them from. — Chris, Feb 13 '17 at 21:13
Thank you for the suggestion, I never thought I'd be working with Watson. I'll check it out right away. — humma4, Feb 14 '17 at 05:12

score 0 · Answer 1 · answered Jul 20 '17 at 13:42

What you are looking for is a fuzzy rule-based information retrieval system. It requires some hand-crafted rules plus fuzzy matching (often done with a search engine such as Lucene) to match queries against a knowledge base of entities/documents.

See this paper for an example:

Implementation of an efficient Fuzzy Logic based Information Retrieval System https://arxiv.org/pdf/1503.03957.pdf
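
To make the idea concrete, here is a minimal sketch in Python of fuzzy matching plus hand-crafted weights. It is not the method from the paper and it does not use Lucene; it only uses the standard library's difflib. The knowledge base entries, the WEIGHTS table, both thresholds and the function names are all made up for illustration, so treat them as assumptions you would tune on your own data.

```python
import re
from difflib import SequenceMatcher

# Hypothetical knowledge base: movie -> field -> known phrases (lowercase).
KNOWLEDGE_BASE = {
    "Heat": {
        "title": ["heat"],
        "actor": ["robert de niro", "al pacino"],
    },
    "Harry Potter": {
        "title": ["harry potter"],
        "actor": ["emma watson", "daniel radcliffe"],
    },
}

# Hand-crafted rule weights (assumed values): a title hit counts more than an actor hit.
WEIGHTS = {"title": 1.0, "actor": 0.6}
MIN_SIMILARITY = 0.85  # assumed cut-off for accepting a (possibly misspelled) phrase
MIN_SCORE = 1.2        # assumed minimum total evidence before classifying


def best_similarity(phrase, tokens):
    """Best fuzzy similarity between `phrase` and any token window of similar length."""
    n = len(phrase.split())
    best = 0.0
    # Try windows of n-1, n and n+1 tokens so "deniro" can match "de niro".
    for size in range(max(1, n - 1), n + 2):
        for start in range(len(tokens) - size + 1):
            window = " ".join(tokens[start:start + size])
            best = max(best, SequenceMatcher(None, phrase, window).ratio())
    return best


def classify(text):
    """Return the best-matching movie, or None if the evidence is too weak."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    scores = {}
    for movie, fields in KNOWLEDGE_BASE.items():
        score = 0.0
        for field, phrases in fields.items():
            for phrase in phrases:
                similarity = best_similarity(phrase, tokens)
                if similarity >= MIN_SIMILARITY:
                    score += WEIGHTS[field] * similarity
        scores[movie] = score
    best_movie = max(scores, key=scores.get)
    return best_movie if scores[best_movie] >= MIN_SCORE else None
```

With these assumed settings, classify("Love Robert Deniro in Heat") picks "Heat" because a title match and a near-match on an actor add up to enough evidence, while a post that only contains the word "Heat" stays unclassified. New movies are handled by adding rows to the knowledge base, with no retraining. A linear scan like this will not scale to a large database, which is where an index with fuzzy queries (for example Lucene) would come in.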
