
I want to tokenize data from a CSV file. I'm using the code below, but I'm unable to tokenize the entire column; only the first row of the column gets tokenized. The column is called 'tweet'.

import pandas as pd
import nltk
from nltk import word_tokenize

data = pd.read_csv('/Users/yoshithKotla/Desktop/dingdang/nov19Tweets.csv')

Texts = list(data['tweet'].values)

tokenData = [nltk.word_tokenize(tweet) for tweet in Texts]

print(tokenData)
  • Could you add a sample dataset? Or best of all, could you add the link to the dataset, if there is one? – flaxel Apr 12 '21 at 12:40

1 Answer


Try this code and see what you get:

import csv
from nltk import word_tokenize

with open('/Users/yoshithKotla/Desktop/dingdang/nov19Tweets.csv', 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    tokenized_tweets = []  # collect each tweet's token list so it can be written out below
    for row in reader:
        tweet = row["tweet"]
        print("Tweet: %s" % tweet)
        tokens = word_tokenize(tweet)
        print(tokens)
        tokenized_tweets.append(tokens)
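
csv.DictReader reads the header row of the file and lets you access each column by name, which is why row["tweet"] picks out the right column no matter where it appears in the file.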

To save the output as a CSV file, you can use csv.writer on the collected token lists:

with open("path_to_output", "w", newline="") as outfile:
    writer = csv.writer(outfile)
    for tokens in tokenized_tweets:
        writer.writerow(tokens)  # one tweet's tokens per output row
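
If you would rather stay in pandas like the code in the question, you can get the same result by applying word_tokenize to the column and writing the frame back out with to_csv. This is a minimal sketch assuming the same file path and column name as in the question; the output file name tokenized_tweets.csv is just a placeholder:

import pandas as pd
from nltk import word_tokenize

# Read the CSV and tokenize every value in the 'tweet' column.
data = pd.read_csv('/Users/yoshithKotla/Desktop/dingdang/nov19Tweets.csv')
data['tokens'] = data['tweet'].astype(str).apply(word_tokenize)

# Write the tweets together with their token lists to a new file (placeholder name).
data.to_csv('tokenized_tweets.csv', index=False)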