I have a list of tuples that are generated from a string using NLTK's PoS tagger.
I'm trying to find the "intent" of a specific string in order to append it to a dataframe, so I need a way to generate a syntax/grammar rule.
import nltk
string = "RED WHITE AND BLUE"
string_list = nltk.pos_tag(string.split())
# string_list is now [('RED', 'JJ'), ('WHITE', 'NNP'), ('AND', 'NNP'), ('BLUE', 'NNP')]
The strings vary in size, from 2-3 elements all the way up to full paragraphs (40-50+ elements), so I'm wondering whether there is a general form or rule that I can create in order to parse a sentence.
So if I want to find a pattern in a list, an example in pseudocode would be:
string_pattern = "I want to kill all the bad guys in the Halo Game"
pattern = ('I', 'PRP') + ('want', 'VBP') + ('to', 'TO') + ('kill', 'JJ') + ('all', 'DT') + ('bad', 'JJ') + ('guys', 'NNS') + ('in', 'IN') + ('Halo', 'NN') + ('Game', 'NN')
Ideally I would be able to match part of the pattern in a tagged string, so it finds:
('I', 'PRP') + ('want', 'VBP') + ('to', 'TO') + ('kill', 'JJ')
but it doesn't need to match the rest. Conversely, if the string is a paragraph, it should be able to find multiple occurrences of the pattern in the same string. If anyone knows the best way to do this, or a better alternative, it would be really helpful!
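To make the behaviour I'm after concrete, here is a rough sketch in plain Python. It is only an illustration under some assumptions: `find_tag_pattern` is a name I made up (it is not an NLTK function), `None` is my own convention for "match anything", and the tagged list is hand-written rather than real tagger output.

```python
def find_tag_pattern(tagged, pattern):
    """Return the start index of every place `pattern` occurs in `tagged`.

    `tagged` is a list of (word, tag) tuples, the shape nltk.pos_tag returns.
    `pattern` is a list of (word, tag) tuples; None in either slot is a
    wildcard (a convention invented for this sketch).
    """
    def item_matches(token, wanted):
        word, tag = token
        want_word, want_tag = wanted
        return ((want_word is None or want_word == word)
                and (want_tag is None or want_tag == tag))

    n = len(pattern)
    if n == 0:
        return []
    # Slide a window of len(pattern) over the tagged tokens.
    return [
        start
        for start in range(len(tagged) - n + 1)
        if all(item_matches(t, p)
               for t, p in zip(tagged[start:start + n], pattern))
    ]


# Hand-written example input; the tags are illustrative, not tagger output.
tagged = [('I', 'PRP'), ('want', 'VBP'), ('to', 'TO'), ('win', 'VB'),
          ('and', 'CC'),
          ('I', 'PRP'), ('want', 'VBP'), ('to', 'TO'), ('rest', 'VB')]

# Exact word+tag match, found twice in the same list.
exact_hits = find_tag_pattern(
    tagged, [('I', 'PRP'), ('want', 'VBP'), ('to', 'TO')])

# Tag-only match: any word tagged TO followed by any word tagged VB.
tag_hits = find_tag_pattern(tagged, [(None, 'TO'), (None, 'VB')])

print(exact_hits, tag_hits)
```

This only handles fixed-length patterns, so it cannot express things like "one or more adjectives", which is part of why I'm asking whether there is a more general rule-based approach.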