For a side project, I am trying to build a Naive Bayes model that detects whether a piece of news is fake based on its headline. Here is my code so far:
import numpy as np
import pandas as pd
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import CountVectorizer
data = pd.read_csv("/Users/amanpuranik/Desktop/fake-news-detection/data.csv")
data = data[['Headline', "Label"]]
print(data)
x = data[["Headline"]]
y = data[["Label"]]
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1, random_state=1)
tfidf_vectorizer = TfidfVectorizer(stop_words='english', max_df=0.7)
model = MultinomialNB()
model.fit(x_train, y_train)
When I run this, I get an error saying the headline could not be converted to a float. Since each headline is a string of words, I'm not sure how text can be converted to numeric values. What should my next step be?
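One possible direction, as a minimal sketch rather than a definitive fix: the code above creates a `TfidfVectorizer` but never applies it, so the model receives raw strings. The sketch below fits the vectorizer on the training headlines, transforms both splits into numeric TF-IDF features, and only then fits `MultinomialNB`. The small in-memory DataFrame, its 0/1 labels, the `test_size=0.25` split, and the names `x_train_tfidf`, `x_test_tfidf`, and `predictions` are assumptions made for illustration; they stand in for the real `data.csv`, whose contents are not shown.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for data.csv (invented for illustration only).
data = pd.DataFrame({
    "Headline": [
        "Scientists discover water on distant planet",
        "Aliens endorse local mayor in shock election twist",
        "Central bank raises interest rates again",
        "Miracle pill melts fat overnight, doctors stunned",
        "City council approves new transit budget",
        "Celebrity clone spotted shopping on the moon",
        "Researchers publish study on sleep and memory",
        "Secret potion makes man immortal, experts furious",
    ],
    "Label": [0, 1, 0, 1, 0, 1, 0, 1],  # assumed encoding: 0 = real, 1 = fake
})

# Single brackets give a 1-D Series of strings, which is what the vectorizer expects.
x = data["Headline"]
y = data["Label"]

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=1
)

# Learn the vocabulary and IDF weights from the training headlines only,
# then reuse the same fitted vectorizer on the test headlines.
tfidf_vectorizer = TfidfVectorizer(stop_words="english", max_df=0.7)
x_train_tfidf = tfidf_vectorizer.fit_transform(x_train)
x_test_tfidf = tfidf_vectorizer.transform(x_test)

# The model is now trained on a numeric sparse matrix instead of raw text.
model = MultinomialNB()
model.fit(x_train_tfidf, y_train)
predictions = model.predict(x_test_tfidf)
```

Calling `transform` (not `fit_transform`) on the test split keeps the test headlines from influencing the learned vocabulary. With a real dataset, something like `model.score(x_test_tfidf, y_test)` would give a first accuracy estimate.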