I am trying to load a CSV file into python and clean the text. but I keep getting an error. I saved the CSV file in a variable called data_file and the function below cleans the text and supposed to return the clean data_file.
import pandas as pd
import numpy as np
import re
import matplotlib.pyplot as plt
df = pd.read_csv("/Users/yoshithKotla/Desktop/janTweet.csv")
data_file = df
print(data_file)
def cleanTxt(text):
text = re.sub(r'@[A-Za-z0-9]+ ', '', text) # removes @ mentions
text = re.sub(r'#[A-Za-z0-9]+', '', text)
text = re.sub(r'RT[\s]+', '', text)
text = re.sub(r'https?:\/\/\S+', '', text)
return text
df['data_file'] = df['data_file'].apply(cleanTxt)
df
I get a key error here.