How to train a naive bayes classifier with pos-tag sequence as a feature?

Question

I have two classes of sentences. Each has reasonably distinct pos-tag sequence. How can I train a Naive-Bayes classifier with POS-Tag sequence as a feature? Does Stanford CoreNLP/NLTK (Java or Python) provide any method for building a classifier with pos-tag as a feature? I know in python NaiveBayesClassifier allows for building a NB classifier but it uses contains-a-word as feature but can it be extended to use pos-tag-sequence as a feature ?

Do you really need to use NaiveBayesClassifier? Have you looked at CRF? By the way, have you read this chapter: http://www.nltk.org/book/ch06.html ? — 404pio, Feb 27 '15 at 16:13
Thanks for the link. I ended up using concatenated pos-tags-sequence and `containsPosSequence` as a feature... — kundan, Mar 04 '15 at 11:00

score 6 · Accepted Answer · answered Feb 28 '15 at 15:51

If you know how to train and predict texts (or sentences in your case) using nltk's naive bayes classifier and words as features, than you can easily extend this approach in order to classify texts by pos-tags. This is because the classifier don't care about whether your feature-strings are words or tags. So you can simply replace the words of your sentences by pos-tags using for example nltk's standard pos tagger:

sent = ['So', 'they', 'have', 'internet', 'on', 'computers' , 'now']
tags = [t for w, t in nltk.pos_tag(sent)]
print tags

['IN', 'PRP', 'VBP', 'JJ', 'IN', 'NNS', 'RB']

As from now you can proceed with the "contains-a-word" approach.

Adding to your answer, as the question says "sequence", we can chain the pos tags of the sentence like `[IN][PRP][VBP][JJ][IN][NNS][RS]` and define a feature like say `conatinsPrpVbpSequence` and set it to `True` for an occurance of `[PRP][VBP]`.... — kundan, Mar 05 '15 at 06:22

How to train a naive bayes classifier with pos-tag sequence as a feature?

1 Answers1