I have the following sample data frame. Its text column has already been tokenized:
No   category  problem_definition_stopwords
175  2521      ['coffee', 'maker', 'brewing', 'properly', '2', '420', '420', '420']
211  1438      ['galley', 'work', 'table', 'stuck']
912  2698      ['cloth', 'stuck']
572  2521      ['stuck', 'coffee']
I want to do part-of-speech (POS) tagging on this data frame. Below is the beginning of my code, which raises an error:
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer

# df is the tokenized data frame shown above
train_text = state_union.raw(df['problem_definition_stopwords'])
Error:
TypeError: join() argument must be str or bytes, not 'list'
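The error appears to come from what state_union.raw() expects as its argument: file IDs from the State of the Union corpus (strings like those returned by state_union.fileids()), not text to process. Iterating over the Series yields token lists, and the corpus reader ends up passing each list to os.path.join while building a file path. The sketch below reproduces only that last step with the standard library; the "state_union" path is a placeholder, and the exact wording of the message varies between Python versions.

```python
import os

# One value from the problem_definition_stopwords column: a list, not a file ID.
row_tokens = ['stuck', 'coffee']

error_message = ""
try:
    # The corpus reader joins its root directory with each "file ID" it is given.
    # "state_union" here is only a placeholder root path.
    os.path.join("state_union", row_tokens)
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)
```

In other words, state_union is a sample corpus (used in tutorials to train PunktSentenceTokenizer), so it is not the right tool for tagging your own data frame.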
My desired result is shown below, where 'XXX' is a token and the second element of each tuple is its part-of-speech tag (e.g. NNP):
[('XXX', 'NNP'), ('XXX', 'VBD'), ('XXX', 'POS')]