I am trying to go through every cell in the 'Text' column of a CSV file and create a new column named 'Type' that holds the type of each text, as predicted by Multinomial Naive Bayes.

This is the code:

from sklearn.naive_bayes import MultinomialNB
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
dataset = pd.read_csv("Test.csv", encoding='latin-1')

clf = MultinomialNB()
cv = CountVectorizer()


for row in dataset:
    text= row['Text']
    data = cv.transform([text]).toarray()
    output = clf.predict(data)
    dataset['Type']=dataset[output]

This is my error:

text= row['Text']
TypeError: string indices must be integers
desertnaut
Heleneh
  • We don't know what these variables are or what the error is. Try making a small running program that demonstrates the problem (including a sample data set) and post the traceback message. – tdelaney Sep 21 '21 at 05:03
  • Hello, @tdelaney, I edited my question, hope I'm more explicit now. – Heleneh Sep 21 '21 at 05:16
  • Check this out: https://stackoverflow.com/questions/6077675/why-am-i-seeing-typeerror-string-indices-must-be-integers – Vaidik Sep 21 '21 at 05:19
  • Please [don’t post images of code, error messages, or other textual data.](https://meta.stackoverflow.com/questions/303812/discourage-screenshots-of-code-and-or-errors) – tripleee Sep 21 '21 at 05:46

1 Answer

The way the rows of the data frame are iterated is incorrect. Here,

for row in dataset:

iterates over the column labels of the data frame, not its rows, so on each pass row is a string such as 'Text'. The line text= row['Text'] then tries to index that string with 'Text', and string indices can only be integers, hence the error.

For example:
text = "abc"
print(text[0])      # Output is 'a'
print(text['abc'])  # TypeError: string indices must be integers
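
The same behavior can be checked on a small data frame. The sketch below is illustrative only: the two-row frame and its 'Id' column are made up as a stand-in for Test.csv, which the question does not show.

```python
import pandas as pd

# Hypothetical stand-in for Test.csv (the real file is not shown in the question)
dataset = pd.DataFrame({"Text": ["hello world", "buy now"], "Id": [1, 2]})

# Iterating over the DataFrame itself walks over its column labels
labels = [item for item in dataset]
print(labels)

# iterrows() yields (index, Series) pairs, one pair per row
texts = [row["Text"] for _, row in dataset.iterrows()]
print(texts)
```

Here labels holds plain strings, which is why indexing one of them with 'Text' fails, while each row from iterrows() is a Series that can be indexed by column name.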

So the correct way to iterate through rows and extract the required column's value would be:

for index, row in dataset.iterrows():
    text = row["Text"]

For more information about the iterrows function, see: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iterrows.html
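
Note that in the question's code neither cv nor clf is ever fitted, so cv.transform and clf.predict would still fail with a NotFittedError once the loop is fixed. Below is a minimal sketch of one way to fill the 'Type' column, assuming some labelled training data is available. The training texts, the 'spam'/'ham' labels, and the small stand-in frame are all hypothetical, since the question shows none of them.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical labelled training data (not part of the original question)
train_texts = [
    "win a free prize now",
    "free cash offer",
    "meeting at noon tomorrow",
    "see you at lunch",
]
train_labels = ["spam", "spam", "ham", "ham"]

# Fit the vectorizer and the classifier before using them for prediction
cv = CountVectorizer()
clf = MultinomialNB()
clf.fit(cv.fit_transform(train_texts), train_labels)

# Stand-in for pd.read_csv("Test.csv", encoding="latin-1")
dataset = pd.DataFrame({"Text": ["free prize offer", "lunch meeting tomorrow"]})

# Transform and predict the whole column in one call, so no row loop is needed
dataset["Type"] = clf.predict(cv.transform(dataset["Text"]))
print(dataset)
```

Predicting on the whole column at once avoids both the iteration problem and the repeated assignment to dataset['Type'] inside the loop.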

Nilaya