I'm trying to fit a polynomial regression and draw it over a scatter plot of the data, and I have two concerns:

  1. The red line, which is the polynomial regression, looks wrong to me when compared with the plotted data values.

  2. How can I calculate the R-squared for each regression?

Part of the X and Y data used (taken from the Excel file) is shown below.

Y is taken from each column in turn; each column holds the total values for one region.

x=[1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980...]

y=[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.164, 0.16499999999999998, 0.16999999999999998, 0.175, 0.17200000000000001, 0.185, 0.189, 0.195, 0.201...]

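import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

#read the data (read_excel has no `headers` keyword; skiprows=2 already puts the header row first)
Renew = pd.read_excel('bp-stats-review-2019-all-data.xlsx', sheet_name='Renewables - TWh', skiprows=2, usecols=range(55)).dropna(axis=0, how='all').iloc[:-10]
#fill missing values with the number 0, not the string '0', so the year columns stay numeric
Renew.fillna(0, inplace=True)

#Taking only the Totals (na=False treats non-text entries as "not a total")
Countries_Renew_Total = Renew[Renew['Terawatt-hours'].str.startswith('Total', na=False)].sort_values(['Terawatt-hours'])
Countries_Renew_Total.set_index('Terawatt-hours', inplace=True)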

#build the regression plots by region
df = Countries_Renew_Total.drop(['Total World']).transpose()

#the years (x) are the same for every region
x = df.index.values.astype(float).reshape(-1, 1)

for n, j in enumerate(df.columns):
    print('The region is: ' + j)
    print(n)
    y = df.iloc[:, n].values.astype(float).reshape(-1, 1)

    for i in range(1, 3):
        #Fit the Poly regression (degree 1 is the plain linear regression)
        poly = PolynomialFeatures(degree=i)
        x_poly = poly.fit_transform(x)
        lin2 = LinearRegression()
        lin2.fit(x_poly, y)

        #Plot Poly regression
        plt.scatter(x, y, color='blue')
        plt.plot(x, lin2.predict(x_poly), color='red')
        plt.title('Polynomial Regression degree ' + str(i))
        plt.xlabel('Year')
        plt.ylabel('Renewable Generation (TWh)')
        plt.show()

        #Predict 2019 and 2020, reusing the already fitted transformer
        print(lin2.predict(poly.transform([[2019]])))
        print(lin2.predict(poly.transform([[2020]])))

[Two images of the resulting plots were attached here.]

desertnaut
Tayzer Damasceno

1 Answer

The first graph you posted actually looks about how I would expect. Most of the points lie along a nearly horizontal line, with a few of the rightmost points curving upwards. The line of best fit comes out nearly flat because it minimizes the overall squared error (the vertical distance between your predictions and the actual values) across all points, and the many flat points outweigh the few rising ones. Does this make sense?

Note that to fit a linear regression to exponential data, you first need to take the log of the y values, which turns them into a linear data set. Does that make sense?
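As a rough illustration of the log idea, here is a minimal sketch using NumPy. The series is made up for the example (it is not the BP data), and the names `t`, `mask`, and `predict` are only placeholders. Since log(0) is undefined, the zero values at the start of a series like yours would have to be excluded first, which is what the mask is for.

```python
import numpy as np

# Made-up exponential series for illustration (not the BP data).
t = np.arange(0, 20, dtype=float)   # years since the first observation
y = 2.0 * np.exp(0.1 * t)

# log(0) is undefined, so keep only strictly positive values.
mask = y > 0

# Fit a straight line to log(y); polyfit returns the slope first, then the intercept.
slope, intercept = np.polyfit(t[mask], np.log(y[mask]), 1)

def predict(t_new):
    """Map the straight-line fit in log space back to the original scale."""
    return np.exp(intercept + slope * np.asarray(t_new, dtype=float))

print(slope, np.exp(intercept))   # growth rate and starting level
print(predict([20, 21]))          # forecasts for the next two steps
```

The fitted slope is the growth rate, and exponentiating the intercept recovers the starting level, so predictions come back on the original TWh scale.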

Your second example is a little more confusing, as I'm not familiar with the PolynomialFeatures class, but I agree the curve does not look very accurate.
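On the R-squared question, a minimal sketch, assuming scikit-learn is what you want to stay with: PolynomialFeatures only expands x into the columns 1, x, ..., x^degree, and the fit itself is still ordinary least squares, so `r2_score` from `sklearn.metrics` can be applied to the predictions for each degree. The data below is made up for illustration, and `r2_by_degree` is just a name chosen for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.preprocessing import PolynomialFeatures

# Made-up data for illustration: a curved trend plus a small deterministic wiggle.
years = np.arange(1965, 2019)
x = years.reshape(-1, 1).astype(float)
t = years - 1965
y = 0.01 * t**2 + 0.5 * np.sin(t)

r2_by_degree = {}
for degree in (1, 2):
    poly = PolynomialFeatures(degree=degree)
    x_poly = poly.fit_transform(x)            # columns: 1, x, ..., x**degree
    model = LinearRegression().fit(x_poly, y)
    y_pred = model.predict(x_poly)
    # r2_score compares the actual values with the fitted ones;
    # model.score(x_poly, y) returns the same number.
    r2_by_degree[degree] = r2_score(y, y_pred)
    print(degree, round(r2_by_degree[degree], 4))
```

In your loop, the equivalent would be to call `r2_score(y, lin2.predict(x_poly))` right after fitting `lin2`. Keep in mind that this is the R-squared on the training data, so it will never go down as the degree increases.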

Sean Payne
  • Totally agree with you, Sean! I tried increasing the degree, which worked for some of the data, but I'll try the log transform to see if the result is better. Thank you so much. – Tayzer Damasceno May 09 '20 at 13:58