I have a CSV file containing IMDB movie ratings data. The file has 27 features and 1 target variable. I have attached SampleData, and the dataset can also be downloaded from KaggleData. I have learnt that Python's sklearn package requires all the data to be numeric. So how do I use this data to do a regression analysis? Right now I am using the code below, but it fails with an error saying "Some director name" can't be converted to float.
import pandas as pd
from sklearn.linear_model import LinearRegression
# raw string (r'...') so the backslashes in the Windows path are not read as escape sequences
df = pd.read_csv(r'D:\Machine Learning\Final\movie_metadata.csv')
feature_cols = [
"director_facebook_likes",
"cast_total_facebook_likes",
"movie_facebook_likes",
"facenumber_in_poster",
"gross",
"num_critic_for_reviews",
"num_voted_users",
"num_user_for_reviews",
"duration",
"title_year",
"content_rating",  # text values such as "PG-13", not numbers
"budget",
"director_name"]   # text values (director names), not numbers
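X = df[feature_cols]
y = df.imdb_score
lm = LinearRegression()
# fit() fails here: director_name and content_rating contain text,
# which LinearRegression cannot convert to float
lm.fit(X, y)
print(lm.intercept_)
print(lm.coef_)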
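One common way to turn text columns such as director_name and content_rating into numbers is one-hot encoding: each distinct value becomes its own 0/1 column. The sketch below is one possible approach, not the only one. It uses scikit-learn's ColumnTransformer, OneHotEncoder and SimpleImputer inside a Pipeline, and imputes in case some columns contain missing values. The DataFrame is a small made-up stand-in for movie_metadata.csv: the column names come from the question, but every value is hypothetical. With the real file you would build X from pd.read_csv(...) instead.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy rows (NOT real IMDB data); np.nan marks missing values.
df = pd.DataFrame({
    "budget": [10.0, 25.0, np.nan, 40.0, 15.0, 30.0, 20.0, 35.0],
    "duration": [95, 120, 110, 140, 100, 130, 105, 125],
    "content_rating": ["PG-13", "R", "PG-13", np.nan, "R", "PG-13", "R", "PG-13"],
    "director_name": ["Director A", "Director B", "Director C", "Director A",
                      "Director B", "Director C", "Director A", "Director B"],
    "imdb_score": [6.1, 7.2, 6.5, 7.8, 5.9, 7.0, 6.4, 7.5],
})

numeric_cols = ["budget", "duration"]
categorical_cols = ["content_rating", "director_name"]

preprocess = ColumnTransformer([
    # Numeric columns: fill missing values with the column median.
    ("num", SimpleImputer(strategy="median"), numeric_cols),
    # Text columns: fill missing values with the most common value, then
    # one-hot encode; unseen categories at predict time become all zeros.
    ("cat", make_pipeline(SimpleImputer(strategy="most_frequent"),
                          OneHotEncoder(handle_unknown="ignore")),
     categorical_cols),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("regress", LinearRegression()),
])

X = df[numeric_cols + categorical_cols]
y = df["imdb_score"]
model.fit(X, y)

print(model.named_steps["regress"].intercept_)
print(model.named_steps["regress"].coef_)

# Predict for a new (hypothetical) movie, including a director never seen in training.
new_movie = pd.DataFrame({
    "budget": [22.0],
    "duration": [115],
    "content_rating": ["R"],
    "director_name": ["Director Z"],
})
pred = model.predict(new_movie)
print(pred)
```

Because the encoding lives inside the pipeline, the same transformation is applied at fit and predict time. On the full dataset, one-hot encoding director_name creates one column per distinct director, which may be a large number of sparse columns; whether that is useful for a linear model is worth checking.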