When I run my code I get the error:

NameError: name 'df_test' is not defined

I don't get this error on my other computer, only on my new one. I think it has to do with global and local variables, but that is strange: variables created in the second cell are used in the third without any problem, and the error only occurs in the fourth cell.

I have tried declaring the variables as global in the first cell, which does not work. Declaring them as global in the third cell does work. But I don't want to keep doing this, because I know from my other computer that this is not normal.

### cell 1
    import pandas as pd
    import numpy as np
    from sklearn.model_selection import train_test_split,cross_val_score,ShuffleSplit  
    import os
    import scipy

### cell 2
    df=pd.read_csv("pandas2.txt",sep=';').drop('listened',axis=1).drop('Usercount',axis=1)
    temp_u=df['User'].unique()
    temp_s=df['Song'].unique()
    avg=df['rating'].mean()
### cell 3
    lamda=0.05
    gamma=0.04
    m=128
    splits=20
    df_train,df_test=train_test_split(df,test_size=0.1, random_state=1)
    beta_u=pd.DataFrame(temp_u,columns=['User'])
    beta_s=pd.DataFrame(temp_s,columns=['Song'])
    beta_u['beta_u']=0
    beta_s['beta_s']=0
    for chunk in np.array_split(df_train, splits):
        x=chunk.merge(beta_u, on='User',how='left').merge(beta_s,on='Song',how='left')

        x['pred']=avg+x['beta_u']+x['beta_s']+(x[pnames]*x[qnames]).sum(axis=1)
        x['gradu']=gamma*(x['rating']-x['pred']-lamda*x['beta_u'])
        beta_u=beta_u.merge(x[['User','gradu']].groupby('User').mean(),on='User',how="left").groupby('User').mean().fillna(0)
        beta_u['beta_u']+=beta_u['gradu']
        beta_u=beta_u.drop(['gradu'],axis=1)
        x['grads']=gamma*(x['rating']-x['pred']-lamda*x['beta_s'])
        beta_s=beta_s.merge(x[['Song','grads']].groupby('Song').mean(),on='Song',how="left").fillna(0)
        beta_s['beta_s']+=beta_s['grads']
        beta_s=beta_s.drop(['grads'],axis=1)


        x[pgrad]=(x[qnames].multiply(x['rating']-x['pred'], axis="index")+np.array(x[qnames]**2)*np.array(x[pnames]))#.divide((x[qnames]*x[qnames]).sum(axis=1),axis=0)
        beta_u=beta_u.merge(x[['User']+pgrad].groupby('User').mean(),on='User',how="left").fillna(0)
        beta_u[pnames]=beta_u[pgrad]#np.array(beta_u[pnames])+np.array(beta_u[pgrad])
        beta_u[pnames]=np.where(beta_u[pnames]>0,beta_u[pnames],10**(-6))
        beta_u=beta_u.drop(pgrad,axis=1)


        x[qgrad]=(x[pnames].multiply(x['rating']-x['pred'], axis="index")+np.array(x[pnames]**2)*np.array(x[qnames]))#.divide((x[pnames]*x[pnames]).sum(axis=1),axis=0)
        beta_s=beta_s.merge(x[['Song']+qgrad].groupby('Song').mean(),on='Song',how="left").fillna(0)
        beta_s[qnames]=beta_s[qgrad]#np.array(beta_s[qnames])+np.array(beta_s[qgrad])
        beta_s[qnames]=np.where(beta_s[qnames]>0,beta_s[qnames],10**(-6))
        beta_s=beta_s.drop(qgrad,axis=1)
    x=df_test.merge(beta_u, on='User',how='left').merge(beta_s,on='Song',how='left').fillna(0)
    x['pred']=x['beta_u']+x['beta_s']+avg+(np.array(x[pnames])*np.array(x[qnames])).sum(axis=1)
    x['pred2']=np.where(x['pred']>0.5,1,0)
    RMSE=np.mean((x['rating']-x['pred'])**2)
    RMSE2=np.mean((x['rating']-x['pred2'])**2)
    print(RMSE)
    print(RMSE2)
### cell 4
    t=len(df_test)
    sim_Song=pd.DataFrame(scipy.sparse.load_npz('simUser.npz').todense())
    sim_Song.index=pd.read_csv('Itemnames.csv',sep=';')['Song']
    sim_Song.columns=pd.read_csv('Itemnames.csv',sep=';')['Song']
    beta_s=beta_s.set_index('Song')

NameError: name 'df_test' is not defined

And when `global df_train, df_test, df, x, beta_s, beta_u` is put at the top of cell 3, it works fine.


1 Answer


The problem somehow is the `%%time` cell magic. If I delete it, everything suddenly works perfectly fine.
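
If the timing is still wanted, one option is to measure it by hand instead of relying on the cell magic. This is only a minimal sketch: the `rows` list is a made-up stand-in for the real `train_test_split` call in cell 3, and `start` and `elapsed` are names chosen for illustration.

```python
import time

# Time the block manually instead of using the %%time cell magic, so the
# assignments run as ordinary top-level statements in the notebook namespace.
start = time.perf_counter()

# Hypothetical stand-in for the body of cell 3 (the real code would call
# train_test_split on df here).
rows = list(range(100))
df_train, df_test = rows[:90], rows[90:]

elapsed = time.perf_counter() - start
print(f"cell took {elapsed:.4f} s")
```

Since these are plain top-level assignments, `df_test` ends up as a normal global and should be visible in cell 4.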

Read more about this error [here](https://stackoverflow.com/questions/32565829/simple-way-to-measure-cell-execution-time-in-ipython-notebook) – vb_rises Jun 17 '19 at 13:25