
I have the code below:

import nltk
exampleArray = ['The dog barking.']

def processLanguage():
    for item in exampleArray:
        tokenized = nltk.word_tokenize(item)
        tagged = nltk.pos_tag(tokenized)
        print(tagged)

processLanguage()

The output of the code above is the list of tokenized words with their corresponding parts of speech. Example:

[('The', 'DT'), ('dog', 'NN'), ('barking', 'NN'), ('.', '.')]

DT = determiner
NN = noun

The text is supposed to be

The dog is barking

and is supposed to have the POS sequence

DT -> NN -> VBZ -> VBG

VBZ = verb, present tense, 3rd person singular
VBG = verb, present participle or gerund

How can I make the program locate the position of the missing word within the sentence?

kunif
alyssaeliyah
  • It's not clear to me what you are trying to do. If you pass an incorrect sentence to a POS tagger, it is likely that you get incorrect labels back. – Bram Vanroy Mar 02 '20 at 09:32
  • @BramVanroy -> Maybe the right question is identify wrong sentence grammar? – alyssaeliyah Mar 02 '20 at 09:33
  • I fear that NLTK is not the right tool for the job. To do this well, you may want to look into things like https://www.aclweb.org/anthology/W19-4426/ – Bram Vanroy Mar 02 '20 at 09:42
  • I'd try the LanguageTool java app grammar checker, you can run it locally and connect over http from whatever language. Not exactly what you're asking for but might help solve the actual problem you're having – Kevin Mar 07 '20 at 23:11

1 Answer


This is straightforward grammar checking. You need at least a tagger, a tool that annotates parts of speech (POS), and a parser, ideally something like an Earley parser (https://en.wikipedia.org/wiki/Earley_parser) or another algorithm capable of analysing the tree structure given a phrase structure grammar (PSG) of your target language. Whichever algorithm you choose, keep in mind that natural language is at least mildly context-sensitive in terms of the Chomsky hierarchy, so forget about finite-state automata etc.

If the parser does not validate your sentence as grammatical (in linguistic terms, it is not licensed by your PSG), you can use the tree structure to locate the position that is left unfilled or incorrectly filled by some terminal symbol. In addition, you need morphological analysis and case marking, which lets you check agreement between verbs and their arguments in order to rule out sentences like "the dog are barking".

Maybe also have a look at LFG or HPSG implementations, which handle this more thoroughly, since they are computationally more powerful (context-sensitive tools, in other words linear bounded automata).
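As a rough illustration of the parser-based approach described above, here is a minimal pure-Python sketch. It is not a real English grammar: the four-rule `GRAMMAR`, the tag set, and the function names `earley_recognize` and `locate_missing_tag` are all hypothetical and exist only for this example. It also assumes the tagger has labeled "barking" as VBG; with the tagger output shown in the question (NN), the search would have to consider retagging as well. The idea is to run a small Earley-style recognizer over the POS sequence and, if it is rejected, try inserting each known tag at each position to see which single insertion makes the sequence grammatical.

```python
# Hypothetical toy phrase structure grammar over POS tags (illustration only).
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("DT", "NN")],
    "VP": [("VBZ", "VBG"), ("VBZ",)],
}
TERMINALS = ["DT", "NN", "VBG", "VBZ"]


def earley_recognize(tags, grammar=GRAMMAR, start="S"):
    """Return True if the tag sequence is licensed by the grammar.

    Minimal Earley recognizer; it assumes the grammar has no empty productions.
    An item is (lhs, rhs, dot, origin).
    """
    chart = [set() for _ in range(len(tags) + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))

    for i in range(len(tags) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                symbol = rhs[dot]
                if symbol in grammar:
                    # Predict: expand the non-terminal after the dot.
                    for production in grammar[symbol]:
                        item = (symbol, production, 0, i)
                        if item not in chart[i]:
                            chart[i].add(item)
                            agenda.append(item)
                elif i < len(tags) and tags[i] == symbol:
                    # Scan: the next input tag matches the expected terminal.
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Complete: advance every item that was waiting for this lhs.
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        item = (l2, r2, d2 + 1, o2)
                        if item not in chart[i]:
                            chart[i].add(item)
                            agenda.append(item)

    return any(
        lhs == start and dot == len(rhs) and origin == 0
        for lhs, rhs, dot, origin in chart[len(tags)]
    )


def locate_missing_tag(tags):
    """Return every (position, tag) whose insertion makes the sequence parse."""
    repairs = []
    for position in range(len(tags) + 1):
        for tag in TERMINALS:
            candidate = tags[:position] + [tag] + tags[position:]
            if earley_recognize(candidate):
                repairs.append((position, tag))
    return repairs


print(earley_recognize(["DT", "NN", "VBG"]))    # False
print(locate_missing_tag(["DT", "NN", "VBG"]))  # [(2, 'VBZ')]
```

Position 2 means the missing word belongs between "dog" and "barking", and the tag VBZ says what kind of word is expected there. Brute-force insertion is only practical for short sentences and a single missing word; a realistic checker would need a much larger grammar plus the agreement checks mentioned above.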

CLpragmatics