
Here is my sample data:

import pandas as pd

avg_consumption = pd.DataFrame({
'Car.Year.Model':[2009, 2010, 2011, 2012],
'City.mpg':[17.9, 17, 16.9, 18.3],
'Highway.mpg':[24.3, 23.6, 23.6, 25.7]
})

I want to use linear regression to predict the average fuel consumption for each fuel range type (city and highway) per car model year.

My desired output is the same DataFrame, extended with predicted average fuel consumption for car model years up to 2025, based on my existing data. I am not entirely sure how to go about this.

What I have tried:

I attempted to follow the answer to this question as the question seemed similar:

from sklearn.linear_model import LinearRegression

years = pd.DataFrame()
years['Car.Year.Model'] = range(2009, 2026)  # range() excludes its end value, so 2026 is needed to reach 2025
# I include 2009-2012 to test the prediction values are still the same as the original

X = avg_consumption.filter(['Car.Year.Model'])
y = avg_consumption.drop('Car.Year.Model', axis=1)

model = LinearRegression()
model.fit(X, y)

X_predict = years
y_predict = model.predict(X_predict)

My result is the following:

[Image: screenshot of the y_predict output]

If I assume that my first row has the predicted values for 2009, it is incorrect because the values in my original DataFrame for model year 2009 are different.

I want to make sure it is predicting the average fuel consumption correctly for each year up to 2025. I would also like my results presented in a DataFrame similar to my sample data.

Could someone point me in the right direction?

k3b

"it is incorrect because the values in my original DataFrame for model year 2009 are different.": That is because your (input) values are the actual data, whereas this DataFrame holds predictions from a best-fit model. The output is not your data: it's basically a line through some scatter points. – 9769953 Apr 29 '21 at 05:17
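
To illustrate the point made in the comment, here is a minimal sketch that compares the observed values with the values of a least-squares line at the same years. It assumes numpy.polyfit as a stand-in for the model in the question (both fit an ordinary least-squares line); the names `offset` and `comparison` and the `.observed` / `.fitted` / `.residual` column suffixes are made up for this example.

```python
import numpy as np
import pandas as pd

avg_consumption = pd.DataFrame({
    "Car.Year.Model": [2009, 2010, 2011, 2012],
    "City.mpg": [17.9, 17, 16.9, 18.3],
    "Highway.mpg": [24.3, 23.6, 23.6, 25.7],
})

years = avg_consumption["Car.Year.Model"]
# Centre the years before fitting; this does not change the fitted line,
# it only keeps the numbers small (illustrative choice)
offset = years.mean()

comparison = pd.DataFrame({"Car.Year.Model": years})
for col in ["City.mpg", "Highway.mpg"]:
    # Degree-1 least-squares fit: returns [slope, intercept]
    slope, intercept = np.polyfit(years - offset, avg_consumption[col], 1)
    fitted = slope * (years - offset) + intercept
    comparison[col + ".observed"] = avg_consumption[col]
    comparison[col + ".fitted"] = fitted
    # Residual = observed minus fitted; non-zero wherever the point is off the line
    comparison[col + ".residual"] = avg_consumption[col] - fitted

print(comparison.round(3))
```

For 2009, for instance, the fitted City.mpg is about 17.36 against an observed 17.9. The residuals are non-zero for most rows, which is expected: a straight line through four scattered points will generally not pass through any of them exactly.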

1 Answer


You can use numpy.polyfit and numpy.poly1d for the linear extrapolation, then append the projected years like so:

import pandas as pd
import numpy as np

avg_consumption = pd.DataFrame({
'Car.Year.Model':[2009, 2010, 2011, 2012],
'City.mpg':[17.9, 17, 16.9, 18.3],
'Highway.mpg':[24.3, 23.6, 23.6, 25.7]
})

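# Fit a degree-1 polynomial (a straight line) to each mpg column
years = avg_consumption["Car.Year.Model"]
f_city = np.poly1d(np.polyfit(years, avg_consumption["City.mpg"], 1))
f_highway = np.poly1d(np.polyfit(years, avg_consumption["Highway.mpg"], 1))

# Evaluate both lines for 2013-2025 and append the projected rows
new_data = pd.DataFrame(
    [[i, f_city(i), f_highway(i)] for i in range(2013, 2026)],
    columns=avg_consumption.columns,
)
# The original index labels are kept, so the appended rows restart at 0
avg_consumption = pd.concat([avg_consumption, new_data], axis=0)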

Yields:

    Car.Year.Model  City.mpg  Highway.mpg
0             2009     17.90        24.30
1             2010     17.00        23.60
2             2011     16.90        23.60
3             2012     18.30        25.70
0             2013     17.80        25.35
1             2014     17.91        25.77
2             2015     18.02        26.19
3             2016     18.13        26.61
4             2017     18.24        27.03
5             2018     18.35        27.45
6             2019     18.46        27.87
7             2020     18.57        28.29
8             2021     18.68        28.71
9             2022     18.79        29.13
10            2023     18.90        29.55
11            2024     19.01        29.97
12            2025     19.12        30.39
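
To plot the observed and projected values:

# Observed years (2009-2012) as solid lines
ax = avg_consumption.set_index("Car.Year.Model").iloc[:4].plot()
# Projection as dash-dot lines, starting at 2012 so the two segments connect
avg_consumption.set_index("Car.Year.Model").iloc[3:].plot(ls="-.", ax=ax)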

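[Image: line plot of City.mpg and Highway.mpg by Car.Year.Model, observed values as solid lines and projected values as dash-dot lines]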
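
The same extrapolation can also be done with the LinearRegression model from the question. Here is a minimal sketch, assuming scikit-learn is installed and that the observed rows should be kept as they are, with predictions appended only for 2013 to 2025. The names `future`, `predicted`, `forecast` and `result` are made up for this example.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

avg_consumption = pd.DataFrame({
    "Car.Year.Model": [2009, 2010, 2011, 2012],
    "City.mpg": [17.9, 17, 16.9, 18.3],
    "Highway.mpg": [24.3, 23.6, 23.6, 25.7],
})

# One feature (the model year) and two targets fitted at once
X = avg_consumption[["Car.Year.Model"]]
y = avg_consumption[["City.mpg", "Highway.mpg"]]
model = LinearRegression().fit(X, y)

# Years to extrapolate to; range() excludes its end value
future = pd.DataFrame({"Car.Year.Model": range(2013, 2026)})

# predict() returns a plain array, so wrap it with the target column names
predicted = pd.DataFrame(model.predict(future), columns=y.columns)
forecast = pd.concat([future, predicted], axis=1)

# Observed rows first, predicted rows after, with a fresh 0..n-1 index
result = pd.concat([avg_consumption, forecast], ignore_index=True)
print(result.round(2))
```

Both targets are fitted independently on the same feature, so the predictions should match the polyfit values above up to floating-point rounding. Passing ignore_index=True avoids the repeated index labels in the combined DataFrame.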

cosmic_inquiry