I'm trying to apply word tokenization to a Pandas DataFrame column as the step before POS tagging. The source/raw column is 'sent' (already sentence-tokenized) and the destination column is 'word'. Here's the code, including the max column width setting:
import pandas as pd
import nltk

pd.set_option('display.max_colwidth', None)
LC_HD_df['word'] = LC_HD_df['sent'].apply(lambda x: nltk.tokenize.word_tokenize(str(x)))
This appears to work, except that each cell in 'word' contains only the first 101 tokens from the corresponding 'sent' cell. Why is it truncating at 101 tokens, and how do I fix this?
The 101 tokens end with "..." — does that suggest the text was fully tokenized but the rest simply isn't displayed for some reason? (That doesn't make sense to me.)
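To see whether the tokens are missing or merely not displayed, I tried checking the stored list length directly. This is a minimal sketch with a hypothetical one-row DataFrame standing in for LC_HD_df, and with str.split standing in for nltk's word_tokenize just to keep it dependency-free; the stored-versus-printed distinction it tests is the same:

```python
import pandas as pd

# Hypothetical stand-in for LC_HD_df: one row with 150 whitespace-separated tokens
df = pd.DataFrame({"sent": [" ".join(f"w{i}" for i in range(150))]})

# str.split stands in for nltk.tokenize.word_tokenize here
df["word"] = df["sent"].apply(lambda x: str(x).split())

# len() reports what is actually stored in the cell,
# regardless of how pandas elides the printed repr with "..."
n = len(df["word"].iloc[0])
print(n)  # → 150
```

In my real data I would run `len(LC_HD_df['word'].iloc[0])` the same way, which should distinguish a display-only truncation from actual data loss.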
Attached is a picture of the first row: one row, two columns — one with the source sentence, the other with the 101 word tokens.
I searched for related questions to no avail; many were related in a general way, but none addressed this truncation problem. This is probably an easy fix that I just don't know — and that, once I learn it, I'll never forget.
Thanks in advance for your assistance.