I am trying to apply a regex chunk grammar to a list of POS tags in Python, in order to identify the tense of a sentence. I wrote the following code to do it.
Data preprocessing:
from nltk import word_tokenize, pos_tag
import nltk
text = "He will have been doing his homework."
tokenized = word_tokenize(text)
tagged = pos_tag(tokenized)
tags = []
for i in range(len(tagged)):
    t = tagged[i]
    tags.append(t[1])
print(tags)
Regex grammar to be applied:
grammar = r"""
Future_Perfect_Continuous: {<MD><VB><VBN><VBG>}
Future_Continuous: {<MD><VB><VBG>}
Future_Perfect: {<MD><VB><VBN>}
Past_Perfect_Continuous: {<VBD><VBN><VBG>}
Present_Perfect_Continuous: {<VBP|VBZ><VBN><VBG>}
Future_Indefinite: {<MD><VB>}
Past_Continuous: {<VBD><VBG>}
Past_Perfect: {<VBD><VBN>}
Present_Continuous: {<VBZ|VBP><VBG>}
Present_Perfect: {<VBZ|VBP><VBN>}
Past_Indefinite: {<VBD>}
Present_Indefinite: {<VBZ>|<VBP>}
"""
Function that applies the grammar to the list of tags:
def check_grammar(grammar, tags):
    cp = nltk.RegexpParser(grammar)
    result = cp.parse(tags)
    print(result)
    result.draw()
check_grammar(grammar, tags)
But it raised the following error:
Traceback (most recent call last):
  File "/home/samar/Desktop/twitter_tense/main.py", line 35, in <module>
    check_grammar(grammar, tags)
  File "/home/samar/Desktop/twitter_tense/main.py", line 31, in check_grammar
    result = cp.parse(tags)
  File "/home/samar/.local/lib/python3.8/site-packages/nltk/chunk/regexp.py", line 1276, in parse
    chunk_struct = parser.parse(chunk_struct, trace=trace)
  File "/home/samar/.local/lib/python3.8/site-packages/nltk/chunk/regexp.py", line 1083, in parse
    chunkstr = ChunkString(chunk_struct)
  File "/home/samar/.local/lib/python3.8/site-packages/nltk/chunk/regexp.py", line 95, in __init__
    tags = [self._tag(tok) for tok in self._pieces]
  File "/home/samar/.local/lib/python3.8/site-packages/nltk/chunk/regexp.py", line 95, in <listcomp>
    tags = [self._tag(tok) for tok in self._pieces]
  File "/home/samar/.local/lib/python3.8/site-packages/nltk/chunk/regexp.py", line 105, in _tag
    raise ValueError("chunk structures must contain tagged " "tokens or trees")
ValueError: chunk structures must contain tagged tokens or trees
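The error text suggests that `parse` wants tagged tokens, i.e. `(word, tag)` pairs, rather than bare tag strings. Below is a minimal sketch to check that reading. It assumes hand-written tags in place of the real `pos_tag` output (so no tagger model is needed) and uses only a three-rule subset of the grammar above; `bare_tags_error` and `labels` are names introduced just for this check.

```python
import nltk

# Hand-written (word, tag) pairs: an assumption standing in for pos_tag output.
tagged = [
    ("He", "PRP"), ("will", "MD"), ("have", "VB"), ("been", "VBN"),
    ("doing", "VBG"), ("his", "PRP$"), ("homework", "NN"), (".", "."),
]
tags = [tag for _, tag in tagged]  # bare tag strings, as in the failing call

# Subset of the grammar, longest pattern first.
grammar = r"""
Future_Perfect_Continuous: {<MD><VB><VBN><VBG>}
Future_Continuous: {<MD><VB><VBG>}
Future_Indefinite: {<MD><VB>}
"""
cp = nltk.RegexpParser(grammar)

# Passing bare strings should reproduce the ValueError from the traceback.
try:
    cp.parse(tags)
    bare_tags_error = None
except ValueError as err:
    bare_tags_error = str(err)
print(bare_tags_error)

# Passing (word, tag) tuples gives the parser a valid chunk structure.
result = cp.parse(tagged)
labels = [t.label() for t in result.subtrees() if t.label() != "S"]
print(labels)
```

If this holds, passing `tagged` instead of `tags` to `check_grammar` should avoid the error. `result.draw()` is left out of the sketch because it opens a GUI window.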