I'm a little bit confused how to determine part-of-speech tagging in English. In this case, I assume that one word in English has one type, for example word "book" is recognized as NOUN, not as VERB. I want to recognize English sentences based on tenses. For example, "I sent the book" is recognized as past tense.
Description:
I have a number of database (*.txt) files: NounList.txt, verbList.txt, adjectiveList.txt, adverbList.txt, conjunctionList.txt, prepositionList.txt, articleList.txt. And if input words are available in the database, I assume that type of those words can be concluded. But, how to begin lookup in the databases? For example, "I sent the book": how to begin a search in the databases for every word, "I" as Noun, "sent" as verb, "the" as article, "book" as noun? Any better approach than searching every word in every database? I doubt that every databases has unique element.
I enclose my perspective here.
private List<string> ParseInput(String allInput)
{
List<string> listSentence = new List<string>();
char[] delimiter = ".?!;".ToCharArray();
var sentences = allInput.Split(delimiter, StringSplitOptions.RemoveEmptyEntries).Select(s => s.Trim());
foreach (var s in sentences)
listSentence.Add(s);
return listSentence;
}
private void tenseReviewMenu_Click(object sender, EventArgs e)
{
string allInput = rtbInput.Text;
List<string> listWord = new List<string>();
List<string> listSentence = new List<string>();
HashSet<string> nounList = new HashSet<string>(getDBList("nounList.txt"));
HashSet<string> verbList = new HashSet<string>(getDBList("verbList.txt"));
HashSet<string> adjectiveList = new HashSet<string>(getDBList("adjectiveList.txt"));
HashSet<string> adverbList = new HashSet<string>(getDBList("adverbList.txt"));
char[] separator = new char[] { ' ', '\t', '\n', ',' etc... };
listSentence = ParseInput(allInput);
foreach (string sentence in listSentence)
{
foreach (string word in sentence.Split(separator))
if (word.Trim() != "")
listWord.Add(word);
}
string testPOS = "";
foreach (string word in listWord)
{
if (nounList.Contains(word.ToLowerInvariant()))
testPOS += "noun ";
else if (verbList.Contains(word.ToLowerInvariant()))
testPOS += "verb ";
else if (adjectiveList.Contains(word.ToLowerInvariant()))
testPOS += "adj ";
else if (adverbList.Contains(word.ToLowerInvariant()))
testPOS += "adv ";
}
tbTest.Text = testPOS;
}
POS tagging is my secondary explanation in my assignment. So I use a simple approach to determine POS tagging that is based on database. But, if there's a simpler approach: easy to use, easy to understand, easy to get pseudocode, easy to design... to determine POS tagging, please let me know.