I would like to match social media posts (short text) to a database of movies/TV shows. The database contains movie and TV show names, characters and actors. If enough evidence is found in the input text, I want the algorithm to assign the text to the movie or show it belongs to, and do nothing if there is not enough evidence.
I'm familiar with machine learning approaches, but those require training samples and a finite number of categories. My algorithm should be able to use context and be scalable to new content. For example, I don't want the machine to learn to recognize "Harry Potter" movies but then fail to recognize "Fantastic Beasts and Where to Find Them" when that is released.
I understand that the solution to this is partial string matching, but I would like to be pointed in the right direction with some general guidelines for this sort of problem. I'm also interested in recognizing misspelled words and in assigning more weight to certain matches and less to others.
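To make the requirements concrete, here is a minimal sketch of weighted partial matching using only Python's standard-library `difflib`. Everything in it is an assumption made for illustration: the `CATALOG` contents, the field weights, and the two thresholds are hypothetical values, not tuned ones, and in practice the catalog would be read from the database so that new releases are picked up without retraining.

```python
import re
from difflib import SequenceMatcher

# Hypothetical catalog; real entries would be loaded from the database.
CATALOG = {
    "Harry Potter and the Philosopher's Stone": {
        "title": ["harry potter"],
        "character": ["hermione granger", "ron weasley"],
        "actor": ["daniel radcliffe", "emma watson"],
    },
    "Fantastic Beasts and Where to Find Them": {
        "title": ["fantastic beasts"],
        "character": ["newt scamander"],
        "actor": ["eddie redmayne"],
    },
}

# Assumed weights: a title hit counts more than a character or actor hit.
WEIGHTS = {"title": 3.0, "character": 2.0, "actor": 1.0}
MIN_SIMILARITY = 0.85  # assumed cutoff for accepting a misspelled phrase
MIN_SCORE = 3.0        # assumed minimum total evidence to classify at all


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def best_similarity(phrase, tokens):
    """Best similarity between phrase and any same-length word window."""
    n = len(phrase.split())
    best = 0.0
    for i in range(len(tokens) - n + 1):
        window = " ".join(tokens[i:i + n])
        best = max(best, SequenceMatcher(None, phrase, window).ratio())
    return best


def classify(text):
    """Return the best-scoring title, or None if evidence is too weak."""
    tokens = tokenize(text)
    scores = {}
    for movie, fields in CATALOG.items():
        score = 0.0
        for field, phrases in fields.items():
            for phrase in phrases:
                if best_similarity(phrase, tokens) >= MIN_SIMILARITY:
                    score += WEIGHTS[field]
        scores[movie] = score
    best = max(scores, key=scores.get)
    return best if scores[best] >= MIN_SCORE else None
```

With these assumed values, `classify("Just watched Harry Poter, Emma Watson was great")` picks the Harry Potter entry despite the misspelling, while a post that mentions only one actor stays below `MIN_SCORE` and is left unclassified.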
Also, as a side note, should string matching be done through SQLite or outside it? My guess for this case would be outside, but I'd just like to make sure.
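For reference, doing the matching outside SQLite could look roughly like the sketch below: the names are read once with the standard `sqlite3` module and compared in Python with `difflib.get_close_matches`. The `names` table, its columns, and the sample rows are hypothetical, and the `0.8` cutoff is an arbitrary assumption.

```python
import sqlite3
from difflib import get_close_matches

# Hypothetical schema and rows, held in an in-memory database for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (movie TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?)",
    [
        ("Harry Potter", "Harry Potter"),
        ("Harry Potter", "Hermione Granger"),
        ("Fantastic Beasts and Where to Find Them", "Newt Scamander"),
    ],
)


def load_names(connection):
    """Read every known name once and map its lowercase form to a movie."""
    lookup = {}
    for movie, name in connection.execute("SELECT movie, name FROM names"):
        lookup[name.lower()] = movie
    return lookup


def fuzzy_lookup(query, lookup, cutoff=0.8):
    """Return the movie for the closest known name, or None if none is close."""
    hits = get_close_matches(query.lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


lookup = load_names(conn)
```

Here `fuzzy_lookup("newt scamandr", lookup)` resolves to the Fantastic Beasts entry, and an unrelated word returns `None`. SQLite's built-in `LIKE` handles substrings but not misspellings, which is one reason to keep the fuzzy comparison in application code.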
Thank you in advance for any help!