Weka POS tagging + tokenization

Question

I'm new to Weka. I am trying to classify movie reviews by sentiment. I understand StringToWordVector, which tokenizes the text and turns word occurrences into attributes. I would also like to add part-of-speech (POS) tags to the attribute vocabulary, but I am stuck on how to do it.

Has anyone tried this before?

Could you please guide me?

P.S. I am using OpenNLP for POS tagging and the Weka J48 classifier.
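A common way to get POS information into the attribute vocabulary, without writing a custom Weka filter, is to rewrite each review before loading it so that every token carries its tag, e.g. great_JJ. StringToWordVector then treats great_JJ and great_NN as separate attributes. The following is a minimal sketch under that assumption. PosTokenJoiner and join are hypothetical names, and it assumes you already have parallel token and tag arrays (such as those from an OpenNLP tokenizer and POSTaggerME.tag(String[])), which are hard-coded here instead of calling OpenNLP.

```java
import java.util.StringJoiner;

/** Hypothetical helper: folds POS tags into the text that StringToWordVector will tokenize. */
class PosTokenJoiner {

    /**
     * Returns "word_TAG word_TAG ...", with words lowercased and tags left as-is.
     * Each combined token becomes its own attribute once the text is split on whitespace.
     * If appendBareTags is true, the bare tags are appended too, so tag counts
     * (e.g. how many JJ tokens a review contains) also become attributes.
     */
    static String join(String[] tokens, String[] tags, boolean appendBareTags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        StringJoiner out = new StringJoiner(" ");
        for (int i = 0; i < tokens.length; i++) {
            out.add(tokens[i].toLowerCase() + "_" + tags[i]);
        }
        if (appendBareTags) {
            for (String tag : tags) {
                out.add(tag);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical tagger output for one sentence (Penn Treebank tags).
        String[] tokens = {"A", "great", "movie"};
        String[] tags = {"DT", "JJ", "NN"};
        System.out.println(join(tokens, tags, false)); // a_DT great_JJ movie_NN
        System.out.println(join(tokens, tags, true));  // a_DT great_JJ movie_NN DT JJ NN
    }
}
```

You would write the joined strings back into the files that TextDirectoryLoader reads (or into an ARFF string attribute) and run StringToWordVector as before. Note that Weka's default WordTokenizer also splits on punctuation such as . and ', so tokens like n't_RB get broken apart; restricting the tokenizer's delimiters to whitespace avoids that.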

Yup, I did. I used the TextDirectoryLoader class to load my data as Instances, and StringToWordVector for tokenization. Now I cannot figure out how to add a POS tag to each tokenized attribute. I also tried counting word occurrences myself and writing my own ARFF file, but loading it gave me the error IOException: premature end of line ... — Harish Gontu, Jul 01 '16 at 12:11
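The comment does not show the ARFF file, so the exact cause of "premature end of line" can't be confirmed. The ARFF parser typically raises it when a data row ends earlier than expected, for example because of an unterminated quote, a raw line break inside a value, or too few values for the declared attributes. A minimal sketch of writing review text as a properly quoted string attribute follows; ArffBuilder, the relation name, and the pos/neg class labels are hypothetical.

```java
import java.util.List;

/** Hypothetical sketch: builds an ARFF document with one string attribute and a nominal class. */
class ArffBuilder {

    /** Wraps a value in single quotes and backslash-escapes characters that would break the row. */
    static String quote(String value) {
        StringBuilder sb = new StringBuilder("'");
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\'': sb.append("\\'"); break;
                case '\n': sb.append("\\n"); break; // a raw newline would end the data row early
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:   sb.append(c);
            }
        }
        return sb.append('\'').toString();
    }

    /** Each row is {text, label}; every label must be one of the declared class values. */
    static String build(List<String[]> rows) {
        StringBuilder arff = new StringBuilder();
        arff.append("@relation movie_reviews\n\n");
        arff.append("@attribute text string\n");
        arff.append("@attribute class {pos,neg}\n\n");
        arff.append("@data\n");
        for (String[] row : rows) {
            arff.append(quote(row[0])).append(',').append(row[1]).append('\n');
        }
        return arff.toString();
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[] {"it's a great_JJ movie_NN", "pos"},
            new String[] {"dull plot\nand worse acting", "neg"});
        System.out.print(build(rows));
    }
}
```

Keeping the text in a single string attribute lets StringToWordVector do the counting, so you don't have to build the word-count attributes by hand. Remember to set the class attribute (the last one here) before training J48.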

score 0 · Answer 1 · answered Jul 15 '16 at 00:51

A trial-and-error approach:

Write the POS-tagged data into a text file and train word2vec on it. Then check the distance between a word and each POS tag: is the nearest tag the word's POS?

One problem, though, is that a word's distances to adjacent tags might be the same!
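This idea assumes that, after word2vec is trained on the tagged text, words and tag tokens share one vector space, so a word's tag can be guessed as the tag whose vector is nearest. The sketch below shows only that lookup step, using cosine similarity, and returns null when the top two tags score within a small epsilon, which is the tie described above. Training word2vec is not shown. The 2-dimensional vectors are toy values made up for illustration, and NearestTag, nearest and epsilon are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the "nearest POS tag" lookup; word2vec training is not shown. */
class NearestTag {

    /** Cosine similarity between two vectors of equal length. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /** Returns the most similar tag, or null if the best two are within epsilon (a tie). */
    static String nearest(double[] word, Map<String, double[]> tagVectors, double epsilon) {
        String best = null;
        double bestSim = Double.NEGATIVE_INFINITY;
        double secondSim = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> e : tagVectors.entrySet()) {
            double sim = cosine(word, e.getValue());
            if (sim > bestSim) {
                secondSim = bestSim;
                bestSim = sim;
                best = e.getKey();
            } else if (sim > secondSim) {
                secondSim = sim;
            }
        }
        return (bestSim - secondSim < epsilon) ? null : best;
    }

    public static void main(String[] args) {
        // Toy vectors, made up purely for illustration.
        Map<String, double[]> tags = new LinkedHashMap<>();
        tags.put("JJ", new double[] {1.0, 0.0});
        tags.put("NN", new double[] {0.0, 1.0});
        System.out.println(nearest(new double[] {0.9, 0.1}, tags, 1e-6)); // JJ
        System.out.println(nearest(new double[] {1.0, 1.0}, tags, 1e-6)); // null (tie)
    }
}
```

For tag vectors to exist at all, the tags have to appear as standalone tokens in the word2vec training text (e.g. great JJ movie NN rather than great_JJ).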

Alternatively, you can apply a RegEx afterwards; it's definitely worth a try.
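The answer does not say what the regex should match. One plausible use is to parse the tagged text back into word/tag pairs and keep only tokens with selected tags, for instance adjectives, which often carry sentiment. The following is a sketch under that assumption. It assumes a word_TAG token format (the format OpenNLP's command-line POS tagger prints) and Penn Treebank tag names; TaggedTextFilter and adjectives are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: parse "word_TAG" tokens and keep only the adjectives. */
class TaggedTextFilter {

    // Group 1 = the token up to its last underscore (the word); group 2 = the tag after it.
    private static final Pattern WORD_TAG = Pattern.compile("(\\S+)_([^\\s_]+)");

    static List<String> adjectives(String taggedLine) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD_TAG.matcher(taggedLine);
        while (m.find()) {
            // Penn Treebank adjective tags: JJ, JJR (comparative), JJS (superlative).
            if (m.group(2).matches("JJ[RS]?")) {
                out.add(m.group(1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "The_DT acting_NN was_VBD better_JJR than_IN the_DT dull_JJ plot_NN";
        System.out.println(adjectives(line)); // [better, dull]
    }
}
```

The filtered words could then be used as the text passed to StringToWordVector, giving a smaller, adjective-only vocabulary to compare against the full one.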

But try the first approach and do share the results! :)
