
I really need your help! I've written this code:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics

def train_test_rmse(X, y):
    # Hold out 20% of the rows as a test set
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)
    linreg = LinearRegression()
    linreg.fit(X_train, y_train)
    y_pred = linreg.predict(X_test)
    # accuracy_score only handles class labels; R^2 is a regression score
    print(metrics.r2_score(y_test, y_pred))
    return np.sqrt(metrics.mean_squared_error(y_test, y_pred))

rmse = train_test_rmse(df_new[feature_cols], df_new['TOTAL CONSTRUCTION COST - EXCLUDING TAX'])

The code above runs without errors. But when I try to draw a scatter plot in the cell beneath it:

import matplotlib.pyplot as plt
plt.scatter(y_test, y_pred)
plt.xlabel('Actual Y')
plt.ylabel('Predicted Y')
plt.show()

I get the error "name 'y_test' is not defined". Please let me know how to fix it. Thanks.

Anand Vidvat
eopiyo

1 Answer


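In your code, y_test (and likewise y_pred) is assigned inside the train_test_rmse function, which makes it a local variable: it exists only while the function runs and cannot be seen from the next cell. You need to create y_test outside the function, declare it global inside the function, and then actually call the function before plotting.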
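A minimal, self-contained illustration of the scoping rule in plain Python. The names split_locally and predict_globally are hypothetical and only stand in for the question's function; the point is that an assignment inside a function stays local unless it is declared global, and that a global only receives its value once the function has actually been called.

```python
def split_locally():
    # An assignment inside a function creates a local name
    y_test = [10.0, 20.0, 30.0]
    return len(y_test)


split_locally()
try:
    print(y_test)  # the local disappeared when the function returned
except NameError as err:
    scope_error = str(err)
print(scope_error)  # name 'y_test' is not defined

y_pred = None  # module-level name


def predict_globally():
    # 'global' makes the assignment rebind the module-level name
    global y_pred
    y_pred = [11.0, 19.0, 31.0]


before_call = y_pred  # still None: the function has not run yet
predict_globally()
print(before_call, y_pred)  # None [11.0, 19.0, 31.0]
```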

Your code should work with a few changes:

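y_test = None
y_pred = None

def train_test_rmse(X, y):
    # Without this, the assignments below would create local variables
    # that vanish when the function returns
    global y_test, y_pred
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)
    linreg = LinearRegression()
    linreg.fit(X_train, y_train)
    y_pred = linreg.predict(X_test)
    return np.sqrt(metrics.mean_squared_error(y_test, y_pred))

# Call the function before plotting so y_test and y_pred are filled in
rmse = train_test_rmse(df_new[feature_cols], df_new['TOTAL CONSTRUCTION COST - EXCLUDING TAX'])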
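If relying on global feels fragile, a common alternative is to return the arrays from the function and unpack them in the calling cell, so nothing depends on hidden global state. This is a swapped-in technique rather than the global approach proposed in this answer. The sketch assumes scikit-learn is installed; make_regression generates synthetic data as a stand-in for the confidential df_new, so the numbers it produces mean nothing on their own.

```python
import numpy as np
from sklearn import metrics
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def train_test_rmse(X, y):
    """Fit a linear model; return the RMSE plus the arrays needed for plotting."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=123
    )
    linreg = LinearRegression()
    linreg.fit(X_train, y_train)
    y_pred = linreg.predict(X_test)
    rmse = np.sqrt(metrics.mean_squared_error(y_test, y_pred))
    # Returning the arrays hands them to the caller without any global state
    return rmse, y_test, y_pred


# Synthetic stand-in (hypothetical) for df_new[feature_cols] and the cost column
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
rmse, y_test, y_pred = train_test_rmse(X, y)
print(f"RMSE: {rmse:.2f}, test rows: {len(y_test)}")
```

In a notebook, the plotting cell from the question can then run unchanged after this cell, since y_test and y_pred now exist at notebook level.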
Anand Vidvat
  • Hi Anand, thank you very much for your response. Unfortunately, now that I've added this code, nothing shows up on the scatter plot; it is blank. How can I fix this? – eopiyo Nov 27 '20 at 23:10
  • Could you share the script itself? I suspect this is caused by the order in which the cells were executed. Did you try clearing the whole notebook and running all cells? – Anand Vidvat Nov 28 '20 at 05:33
  • Hi Anand, sadly I don't think I can share the entire script, as it contains confidential data. But I did clear the entire notebook and ran all cells, and it came up with the error "name 'y_pred' is not defined" instead. So I created an entirely new notebook with only 4 cells (1. importing the libraries; 2. creating the dataframe; 3 and 4. the code in my original post with your corrections), and it still showed an empty scatter plot. I really don't know what could be wrong. – eopiyo Nov 28 '20 at 12:47
  • Can you check the values of y_test and y_pred before drawing the scatter plot? I suspect one of them is causing this issue. – Anand Vidvat Nov 28 '20 at 13:35
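One way to check the values of y_test and y_pred before plotting is a small diagnostic like the rough sketch here. check_plot_inputs is a hypothetical helper, not part of matplotlib or scikit-learn. It flags the usual suspects for a blank or failing scatter plot: arrays still set to None (the function was never called), empty arrays, a length mismatch, or values that are all NaN or infinite, which do not show up as points.

```python
import numpy as np


def check_plot_inputs(y_test, y_pred):
    """Return a list of problems that could leave a scatter plot blank or failing."""
    if y_test is None or y_pred is None:
        return ["y_test or y_pred is still None: was train_test_rmse called?"]
    actual = np.asarray(y_test, dtype=float).ravel()
    predicted = np.asarray(y_pred, dtype=float).ravel()
    problems = []
    if actual.size == 0 or predicted.size == 0:
        problems.append("an array is empty")
    if actual.size != predicted.size:
        problems.append(f"length mismatch: {actual.size} vs {predicted.size}")
    # A point needs finite values in both coordinates to be drawn
    both = min(actual.size, predicted.size)
    finite = np.isfinite(actual[:both]) & np.isfinite(predicted[:both])
    if both and not finite.any():
        problems.append("no pair of values is finite, so nothing would be drawn")
    return problems


print(check_plot_inputs(None, None))
print(check_plot_inputs([1.0, 2.0, 3.0], [1.1, 2.0, 2.9]))
print(check_plot_inputs([np.nan, np.nan], [1.0, 2.0]))
```

Calling print(check_plot_inputs(y_test, y_pred)) just before plt.scatter should print an empty list when both arrays are usable.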