I have a basic dataset with one object column named 'comment' and one float column named 'toxicity'. My dataset's shape is (1999516, 2).
I'm trying to add a new column named 'tokenized' using nltk's word_tokenize function to build a bag of words, like this:
dataset = pd.read_csv('toxic_comment_classification_dataset.csv')
dataset['tokenized'] = dataset['comment'].apply(nltk.word_tokenize)
(following a tutorial notebook's "In [22]" cell)
I don't get an error message right away; after waiting about 5 minutes I get this error:
TypeError: expected string or bytes-like object
How can I add the tokenized comments to my dataframe as a new column?