I've recently started looking at data extraction using NLTK. While there are several examples and techniques for detecting "real" names, locations, etc., I haven't found an efficient way to detect "made up" or "imaginary" names. An example string would be:
His name is wuzzywugg and he has a dog named fizzbuzz
I would like to train NLTK to detect that "wuzzywugg" and "fizzbuzz" are names of characters. I've seen some solutions that rely on the word starting with a capital letter, but this feels very "hacky" and prone to errors and false positives.
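To illustrate why the capitalization rule feels fragile, here is a minimal sketch of that heuristic applied to the example sentence. It uses plain Python rather than NLTK, and the function name capitalized_candidates is hypothetical, just for illustration. On this sentence it flags only the sentence-initial "His" (a false positive) and misses both made-up names because they are lowercase.

```python
import re

def capitalized_candidates(sentence):
    """Naive heuristic: treat every capitalized word as a possible name."""
    # Split into alphabetic tokens, ignoring punctuation.
    tokens = re.findall(r"[A-Za-z]+", sentence)
    # Keep only tokens whose first letter is uppercase.
    return [t for t in tokens if t[0].isupper()]

example = "His name is wuzzywugg and he has a dog named fizzbuzz"
print(capitalized_candidates(example))  # ['His']
```

Capitalizing the names (e.g. "Wuzzywugg") would make them appear, but "His" would still be reported, which is the kind of false positive I'm trying to avoid.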
Any help on how to solve this issue would be greatly appreciated. Thanks in advance.