First, I should say that I am new to machine learning, so I hope you can help me. I want to predict stock prices using machine learning, and I got it working by following this tutorial. To make the predictions more accurate, I thought it might help to use more data.
I now have more CSV files, but they contain data for other stocks from other companies. My question is: can I use data from different stocks to make the model more accurate, and if so, how?
Here is one Stack Overflow question that describes my problem, but the answer doesn't help me. It says to normalize the data after combining it. But I don't understand what is meant by combining the datasets: if I just append the CSV files one below the other, the model won't know which rows belong to which ticker. I hope you understand my problem and can help me.
Here is the code that I think is relevant:
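import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

df = pd.read_csv(r'aapl_data.csv')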
df = df[['close']]
# Number of days ('x') into the future to predict
future_days = 25
# Create a column (target) shifted 'x' days up
df['Prediction'] = df['close'].shift(-future_days)
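# Create the feature data set, convert it to a numpy array and remove the last 'x' rows/days
# (the positional axis argument to drop() was removed in pandas 2.0, so use columns=)
X = np.array(df.drop(columns=['Prediction']))[:-future_days]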
# Create the target data set (y), convert it to a numpy array and keep all target values except the last 'x' rows/days
y = np.array(df['Prediction'])[:-future_days]
# Split the data to 75% training and 25% testing
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
x_train = np.nan_to_num(x_train)
y_train = np.nan_to_num(y_train)
tree = DecisionTreeRegressor().fit(x_train, y_train)
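To make the question more concrete, here is a minimal sketch of what I think "combining" could mean. Everything in it is an assumption on my part: the helper `combine_tickers`, the `close_scaled` column, and the two synthetic tickers `AAA` and `BBB` are hypothetical names and made-up numbers, not something taken from the tutorial or the linked answer. The idea is to scale each stock separately, create the shifted target within each stock, tag every row with its ticker, and only then stack the rows. Note that scaling with the mean and standard deviation of the whole series is a simplification, since it uses information from the future.

```python
import numpy as np
import pandas as pd


def combine_tickers(frames, future_days):
    """Stack per-ticker frames into one training table.

    `frames` maps a ticker symbol to a DataFrame with a 'close' column.
    This helper is hypothetical and only illustrates the idea.
    """
    parts = []
    for ticker, frame in frames.items():
        part = frame[['close']].copy()
        # Scale each ticker by its own mean and std so price levels are comparable
        part['close_scaled'] = (part['close'] - part['close'].mean()) / part['close'].std()
        # Shift within this ticker only, so a target never comes from another stock
        part['Prediction'] = part['close_scaled'].shift(-future_days)
        part['ticker'] = ticker
        # The last `future_days` rows have no target, so drop them
        parts.append(part.iloc[:-future_days])
    combined = pd.concat(parts, ignore_index=True)
    # One-hot encode the ticker so the model can tell the stocks apart
    return pd.get_dummies(combined, columns=['ticker'])


# Synthetic stand-ins for the real CSV files (made-up numbers)
frames = {
    'AAA': pd.DataFrame({'close': np.arange(10, 20, dtype=float)}),
    'BBB': pd.DataFrame({'close': np.arange(200, 210, dtype=float)}),
}
combined = combine_tickers(frames, future_days=2)
X = combined.drop(columns=['close', 'Prediction']).to_numpy(dtype=float)
y = combined['Prediction'].to_numpy()
```

With real data, `frames` would presumably be built with one `pd.read_csv` call per file, and `X` and `y` would then go into `train_test_split` as in the code above. Is this roughly what the answer means, or am I misunderstanding it?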
PS: It would also help if you edited the title where needed or told me what to search for. I don't think I am the first person with this idea, but I don't know what this "problem" is called.
Thank you for your help!