I'm working on a project to build a named entity recognizer (NER) for text. I want to build the NER in 3 different ways and experiment with each.
First, I want to build it using only segmented sentences -> tokenized words. To clarify, the only input to the system is the split/tokenized words. The NER system is rule-based, so it can only use rules to decide which tokens are named entities. The first NER will not have any chunk information or part-of-speech labels, just the tokenized words. Efficiency is not the concern here; the concern is comparing how the 3 different NERs perform. (The one I am asking about is the 1st one.)
I thought about it for a while and could not come up with any rules or ideas for solving this problem. One naive approach would be to treat every word that begins with an uppercase letter and does not follow a period as a named entity.
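To make the naive rule concrete, here is a minimal sketch of it in Python. It assumes the input is a list of sentences, each already a list of tokens. The function name `naive_ner` is made up for illustration, and two details are my own assumptions rather than part of the rule as stated: the first token of each sentence is treated the same as a token following a period, and adjacent capitalized tokens are merged into one entity.

```python
def naive_ner(sentences):
    """Return candidate entities from pre-tokenized sentences.

    sentences: list of token lists, e.g. [["He", "met", "Mary", "."]]
    """
    sentence_enders = {".", "!", "?"}
    entities = []
    for tokens in sentences:
        current = []  # capitalized tokens collected for the entity in progress
        for i, tok in enumerate(tokens):
            # Assumption: the first token counts as "following a period".
            sentence_initial = i == 0 or tokens[i - 1] in sentence_enders
            if tok[:1].isupper() and not sentence_initial:
                current.append(tok)
            elif current:
                # Assumption: a run of capitalized tokens is one entity.
                entities.append(" ".join(current))
                current = []
        if current:
            entities.append(" ".join(current))
    return entities


example = [
    ["Yesterday", "John", "Smith", "flew", "to", "New", "York", "."],
    ["He", "met", "Mary", "."],
]
print(naive_ner(example))
```

Even this small version shows the obvious weaknesses: an entity at the start of a sentence is always missed, and any capitalized non-entity in mid-sentence (for example the pronoun "I") is wrongly picked up.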
Am I missing anything? Any heads up or guidelines would help.