Here is my sample data:
import pandas as pd
avg_consumption = pd.DataFrame({
'Car.Year.Model':[2009, 2010, 2011, 2012],
'City.mpg':[17.9, 17, 16.9, 18.3],
'Highway.mpg':[24.3, 23.6, 23.6, 25.7]
})
I want to use linear regression to predict the average fuel consumption for each driving type (city and highway) per car model year.
My desired output is the same DataFrame, extended with predicted average fuel consumption for car model years up to 2025, based on my existing data. I am not sure how to go about this.
What I have tried:
I attempted to follow the answer to this question as the question seemed similar:
from sklearn.linear_model import LinearRegression
years = pd.DataFrame()
years['Car.Year.Model'] = range(2009, 2026)  # range() excludes the stop value, so 2026 is needed to include 2025
# I include 2009-2012 to test the prediction values are still the same as the original
X = avg_consumption.filter(['Car.Year.Model'])
y = avg_consumption.drop('Car.Year.Model', axis=1)
model = LinearRegression()
model.fit(X, y)
X_predict = years
y_predict = model.predict(X_predict)
My result is the following:
If I assume that the first row holds the predicted values for 2009, it looks incorrect, because those values differ from the ones for model year 2009 in my original DataFrame.
I want to make sure it is predicting the average fuel consumption correctly for each year up to 2025. I would also like my results presented in a DataFrame similar to my sample data.
Could someone point me in the right direction?
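For reference, here is a minimal sketch of the kind of output described above, not a definitive solution. It assumes a straight-line fit per column is acceptable, and the names `predicted` and `result` are made up for illustration. A least-squares line generally does not pass exactly through the original points, so the fitted values for 2009-2012 will differ from the observed ones; the last step, which writes the observed values back over those rows, is an assumption about the desired behaviour and can be dropped.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

avg_consumption = pd.DataFrame({
    'Car.Year.Model': [2009, 2010, 2011, 2012],
    'City.mpg': [17.9, 17, 16.9, 18.3],
    'Highway.mpg': [24.3, 23.6, 23.6, 25.7],
})

# One feature (the model year) and two targets (city and highway mpg).
X = avg_consumption[['Car.Year.Model']]
y = avg_consumption[['City.mpg', 'Highway.mpg']]

# LinearRegression accepts a 2-D target and fits one line per column.
model = LinearRegression().fit(X, y)

# range() excludes its stop value, so 2026 is needed to include 2025.
years = pd.DataFrame({'Car.Year.Model': range(2009, 2026)})

# predict() returns a plain NumPy array; wrap it to get the column names back.
predicted = pd.DataFrame(model.predict(years), columns=y.columns).round(2)
result = pd.concat([years, predicted], axis=1)

# Assumption: keep the observed values for years that already have data,
# instead of the fitted values on the regression line.
result = result.set_index('Car.Year.Model')
result.update(avg_consumption.set_index('Car.Year.Model'))
result = result.reset_index()

print(result)
```

With only four data points, the fitted trend is very sensitive to each value, so predictions far out (such as 2025) should be read with caution.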