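import numpy as np
import pandas as pd

# Toy data: one numerical column (y) and one categorical column (days)
rands = np.random.random(7)
# 2018-01-01 was a Monday, so the labels start at Monday to match the dates
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
dates = pd.date_range('2018-01-01', '2018-01-07')
df = pd.DataFrame({'date': dates, 'days': days, 'y': rands})

# One-hot encode the day names; dtype=float keeps X purely numeric
df_days_onehot = pd.get_dummies(df.days, dtype=float)[days]
df[days] = df_days_onehot

# Target is the next day's y (the last row has no next day, so it is NaN)
df['target'] = df.y.shift(-1)
df.drop('days', axis=1, inplace=True)
df.set_index('date', inplace=True)

X = df.iloc[:, :-1].values  # shape (7, 8): y plus the 7 one-hot columns
y = df.iloc[:, -1].values   # shape (7,)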
The code above is a small example of my data. My question is: how should I combine the numerical and the categorical variables as inputs to an LSTM?
What should the input vector look like?
- Should it be a flat vector like [0.123, 0, 1, 0, 0, ...] (like a row of X in the code), with dim = (1, 8)?
- Should it be nested like [0.123, [0, 1, 0, 0, ...]], with dim = (1, 2)?
- Or is there a specific way (or several ways) to pass such inputs to ANNs, RNNs, etc.? If so, what is it, and why should we use it (pros/cons)?
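To make the first option concrete, here is a minimal NumPy sketch of what I mean by a flat vector per time step, stacked into sliding windows of shape (samples, look_back, features). The toy data, the `look_back` value and the `make_windows` helper are made up for illustration only; I am not claiming this is the right way to do it.

```python
import numpy as np

# Assumed toy data: 7 time steps, 1 numeric feature + 7 one-hot day columns.
rng = np.random.default_rng(0)
numeric = rng.random((7, 1))             # stands in for the 'y' column
onehot = np.eye(7)                       # row i is the one-hot code of day i
features = np.hstack([numeric, onehot])  # shape (7, 8): one flat vector per step

look_back = 3  # hypothetical window length


def make_windows(features, target, look_back):
    """Stack sliding windows into an array of shape (samples, look_back, n_features)."""
    xs, ys = [], []
    for start in range(len(features) - look_back):
        xs.append(features[start:start + look_back])
        ys.append(target[start + look_back])  # the value right after the window
    return np.array(xs), np.array(ys)


X_seq, y_seq = make_windows(features, numeric[:, 0], look_back)
print(X_seq.shape, y_seq.shape)  # (4, 3, 8) (4,)
```

With this layout, the last dimension of the LSTM input shape would be 8 rather than 1.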
I have read about embeddings, but the explanations I found were not enough for me, since I want to understand the logic behind all of this.
The model I have in mind is something like this:
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

# batch_size, look_back, trainX and trainY are defined elsewhere
model = Sequential()
model.add(LSTM(64, batch_input_shape=(batch_size, look_back, 1), stateful=True, return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(32, stateful=True))  # input shape is inferred from the previous layer
model.add(Dropout(0.3))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=100, batch_size=batch_size, verbose=2, shuffle=False)
Any guidance, link, explanation or help will be appreciated. Have a nice day.