I'm new to Weka and am trying to classify movie reviews by sentiment. I understand the StringToWordVector filter, which tokenizes the text and turns word occurrences into attributes. I also want to add part-of-speech (POS) tags to the attribute vocabulary, but I am stuck on how to do it.

Has anyone tried this before?

Please, can you guide me?

P.S. I am using OpenNLP for POS tagging and the Weka J48 classifier.

Adnan
Harish Gontu
  • Have you uploaded a text file and then tokenized it in Weka? – Muhammad Yaseen Khan Jul 01 '16 at 11:38
  • Yup, I did. I used the TextDirectoryLoader class to load my data as instances and StringToWordVector for tokenization. Now I cannot understand how to add POS tags for each tokenized attribute. I also tried counting word occurrences on my own and created an ARFF file myself, but it gave me the error IOException premature end of line ... – Harish Gontu Jul 01 '16 at 12:11

1 Answer

Trial and error approach:

Write the POS-tagged data to a text file and then train word2vec on it. Then check the distance between a word and each POS tag; is the nearest tag its POS?

One problem with this: the distances to adjacent tags might turn out to be the same!

Or else you could use a regex after that; definitely worth a try.

But do the first one and do share the results! :)
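
For the "write the POS-tagged data into a text file" step, here is a rough sketch of one possible way to do it. Note that this is not the word2vec idea above but a simpler trick: glue each word to its tag (`good_JJ`) so that a bag-of-words filter such as StringToWordVector would treat every word/tag pair as its own attribute. The class name `PosTokenJoiner`, the method `join`, and the hard-coded Penn Treebank style tags are all made up for illustration; in practice the tags would come from your OpenNLP tagger. It also assumes the tokenizer you use does not split on the underscore, and tags that are themselves punctuation may need extra handling.

```java
// Sketch: fuse each token with its POS tag so that a bag-of-words filter
// sees "good_JJ" as a single vocabulary entry.
// PosTokenJoiner and join are hypothetical names, not part of Weka or OpenNLP.
class PosTokenJoiner {

    // tokens[i] and tags[i] are assumed to be aligned, as they would be when
    // the tags are produced by a POS tagger run over the same token array.
    static String join(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                sb.append(' '); // whitespace keeps the word_TAG units separate
            }
            sb.append(tokens[i]).append('_').append(tags[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hard-coded example; stands in for real tagger output.
        String[] tokens = {"A", "good", "movie"};
        String[] tags = {"DT", "JJ", "NN"};
        System.out.println(join(tokens, tags));
    }
}
```

Each joined line can then be written back to the review's text file before loading the directory into Weka, so the rest of the pipeline stays unchanged.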

iamgr007