I wrote some code to practice machine learning, but I ran into an error that I don't understand, because I typed the column names exactly as they appear in the Quandl table.

Here is my code:

import pandas as pd
import math 
import quandl 
import numpy as np 
from sklearn import preprocessing, svm, model_selection #preprocessing is used to clean or scale the data prior to machine learning
from sklearn.model_selection import train_test_split, cross_validate
from sklearn.linear_model import LinearRegression 

df=quandl.get("EOD/NKE", authtoken="jcfsm6-47Pe1hgxDqjDU")

df=df[['ADJ_OPEN','ADJ_HIGH','ADJ_LOW','ADJ_CLOSE','ADJ_VOLUME']]
df['HL_PCT']=(df['ADJ_HIGH'] -df['ADJ_LOW'])/ df['ADJ_CLOSE']*100.0
df['PCT_Change']=(df['ADJ_CLOSE']-df['ADJ_OPEN'])/df['ADJ_OPEN']*100.0
df=df[['ADJ_CLOSE','HL_PCT','PCT_Change','ADJ_VOLUME']]

print(df.head())

forecast_col='ADJ_CLOSE'
df.fillna(value=-99999, inplace=True)
forecast_out=int(math.ceil(0.01*len(df)))
df['label']=df[forecast_col].shift(-forecast_out)
df.dropna(inplace=True) #NaN is short for Not a Number

#By convention in machine learning, X names the features and y names the label.

X=np.array(df.drop(columns=['label']))
y=np.array(df['label'])

X=preprocessing.scale(X)

#Hold out 20% of the data for testing (test_size=0.2) and train on the remaining 80%.
#train_test_split returns the splits in the order X_train, X_test, y_train, y_test.
X_train, X_test, y_train, y_test=train_test_split(X,y,test_size=0.2)

# Define the model (SVR is a support vector regressor, not a classifier)
clf=svm.SVR(gamma='auto')

# Train the model 
clf.fit(X_train, y_train)

# Test the model
confidence=clf.score(X_test, y_test)

print(confidence)

When I ran it with the command `python3 my.py`, I got this error:

KeyError: "None of [Index(['ADJ_OPEN', 'ADJ_HIGH', 'ADJ_LOW', 'ADJ_CLOSE', 'ADJ_VOLUME'], dtype='object')] are in the [columns]"
jww
Anh Do
  • Two things: Please edit your title to be more descriptive, and please edit your question to include the full traceback of the error – G. Anderson Sep 17 '19 at 18:10
  • you can print `df.columns` just after reading data and see what columns are there. – vb_rises Sep 17 '19 at 18:10
  • Also see [KeyError: “None of \[\['', ''\]\] are in the \[columns\]” pandas python](https://stackoverflow.com/q/51976930/608639) – jww Sep 17 '19 at 18:53
  • Does this answer your question? [KeyError: "None of \[\['', ''\]\] are in the \[columns\]" pandas python](https://stackoverflow.com/questions/51976930/keyerror-none-of-are-in-the-columns-pandas-python) – cottontail Feb 02 '23 at 20:59
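
The suggestion above to inspect `df.columns` can be sketched as a small helper that compares the requested names against the actual ones while ignoring case. This is only an illustration: `match_columns` is a hypothetical helper, and the column names in the example are made up, since the real ones have to be read from `df.columns` after `quandl.get` returns.

```python
def match_columns(available, wanted):
    """Match wanted column names to the actual ones, ignoring case.

    Returns (matches, missing): matches maps each wanted name to the
    actual column name, missing lists wanted names with no match.
    """
    # Index the actual names by their lowercased, stripped form.
    lookup = {str(name).strip().lower(): name for name in available}
    matches, missing = {}, []
    for name in wanted:
        actual = lookup.get(name.strip().lower())
        if actual is None:
            missing.append(name)
        else:
            matches[name] = actual
    return matches, missing


# Hypothetical column names, for illustration only.
# With a real DataFrame, pass df.columns as the first argument.
available = ["Open", "Close", "Adj_Open", "Adj_Close", "Adj_Volume"]
wanted = ["ADJ_OPEN", "ADJ_HIGH", "ADJ_CLOSE"]

matches, missing = match_columns(available, wanted)
print("matched:", matches)
print("missing:", missing)
```

If `missing` comes back empty for the question's DataFrame, the columns exist but are spelled with different capitalization, and `df[list(matches.values())]` would select them by their actual names. If it is not empty, those columns are not in the returned table at all.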

0 Answers