import pandas as pd
import numpy as np

rands = np.random.random(7)
days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
dates = pd.date_range('2018-01-01', '2018-01-07')

df = pd.DataFrame({'date': dates, 'days': days, 'y': rands})

df_days_onehot = pd.get_dummies(df.days)[days]
df[days] = df_days_onehot
df['target'] = df.y.shift(-1)

df.drop('days', axis=1, inplace=True)
df.set_index('date', inplace=True)

X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values

I shared a code example above. My question is: how should I combine the numerical and categorical variables as inputs for an LSTM?

What should the input vector look like?

  1. Should it be like [0.123, 0, 1, 0, 0 ...] (like X in the code), dim = (1, 8)?
  2. Should it be like [0.123, [0, 1, 0, 0...]], dim = (1, 2)?
  3. Or is there a specific way/ways to pass inputs to ANNs or RNNs etc.? If so, what is it, and why should we use it/them (pros/cons)?

I read things about embeddings, but the explanations don't seem sufficient to me, since I want to learn the logic behind all of this.

Something like this...

model = Sequential()
model.add(LSTM(64, batch_input_shape=(batch_size, look_back, 1), stateful=True, return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(32, batch_input_shape=(batch_size, look_back, 1), stateful=True))
model.add(Dropout(0.3))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=100, batch_size=batch_size, verbose=2, shuffle=False)

Any guidance, link, explanation or help will be appreciated... Have a nice day.

TheDarkKnight

2 Answers


There are a variety of preprocessing techniques that can be considered when dealing with inputs of varying ranges in general (normalization etc.). One-hot representation is certainly a good way to represent categories.

Embeddings are used when there are too many category elements, which would make one-hot encoding very large. They provide a (potentially trainable) vector representation that encodes a given input. You can read more about them in the link below. Embeddings are very common in NLP.

https://towardsdatascience.com/deep-learning-4-embedding-layers-f9a02d55ac12
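To illustrate the logic behind embeddings mentioned above, here is a minimal plain-NumPy sketch (the sizes are made up for demonstration, not from a Keras `Embedding` layer): an embedding is just a lookup table, one trainable row per category, and looking up a category index is equivalent to multiplying its one-hot vector by that table.

```python
import numpy as np

n_categories = 7   # e.g. the days of the week
embedding_dim = 3  # much smaller than a one-hot vector when the vocabulary is large

# Lookup table: one row per category (randomly initialized here;
# a real Embedding layer would learn these rows during training)
embedding_matrix = np.random.random((n_categories, embedding_dim))

# Looking up a category is just row indexing...
day_index = 2  # e.g. 'Tuesday'
vector = embedding_matrix[day_index]

# ...which is exactly the one-hot vector times the matrix
one_hot = np.zeros(n_categories)
one_hot[day_index] = 1.0
assert np.allclose(vector, one_hot @ embedding_matrix)
```

This is why embeddings scale better than one-hot encodings: the representation size is `embedding_dim`, independent of how many categories exist.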

That aside, you could take advantage of the fact that Keras supports multiple input layers (via the functional API).

For your specific case, here is a made-up example that might help you get started. I added a few dense hidden layers just to demonstrate the point. It should be self-explanatory.

from keras.layers import Input, Dense, concatenate
from keras.models import Model

X1 = rands
X2 = df_days_onehot
Y = np.random.random(7)

float_input = Input(shape=(1, ))
one_hot_input = Input(shape=(7,) )

first_dense = Dense(3)(float_input)
second_dense = Dense(50)(one_hot_input)

merge_one = concatenate([first_dense, second_dense])
dense_inner = Dense(10)(merge_one)
dense_output = Dense(1)(dense_inner)


model = Model(inputs=[float_input, one_hot_input], outputs=dense_output)


model.compile(loss='mean_squared_error',
              optimizer='adam',
              metrics=['accuracy'])

model.summary()

model.fit([X1,X2], Y, epochs=2)
user007
  • First of all, thanks for answering. It was really helpful. But I will be using a Sequential model, and I need to add an LSTM layer to it. Can you convert your example code to a Sequential model with its layers? I will edit my question above with something like what I expect to do. – TheDarkKnight Jul 16 '18 at 14:46
  • A Sequential model won't support multiple inputs or shared layers. You could still use LSTM layers in a functional model along the same lines as the example. If you cannot avoid the Sequential model, you need to combine all inputs into one input layer – user007 Jul 16 '18 at 15:07
  • What do you mean by combining all inputs? Does creating an input vector like [0.123, 0, 1, 0, 0, 0, ...] do the job? I mean, is that what you meant by combining them? – TheDarkKnight Jul 16 '18 at 15:12
  • Yes. Depending on the problem, it is certainly possible that it might do the job. – user007 Jul 16 '18 at 15:28
  • In the end, I was just wondering why we can't use them like [real, categorical] the way we use [real, real, real, ...] as input vectors. What is the downside? If you have an answer for this, I would appreciate it. And thank you for your time. I will try some possible cases. – TheDarkKnight Jul 16 '18 at 15:35
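(For reference, "combining all inputs into one input layer" as discussed in the comments above can be sketched in plain NumPy with the arrays from the question; the trailing reshape to (samples, timesteps, features) is what a Sequential LSTM expects. The variable names here are illustrative.)

```python
import numpy as np
import pandas as pd

rands = np.random.random(7)
days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday',
        'Thursday', 'Friday', 'Saturday']

# One-hot encode, keeping the column order of `days`
onehot = pd.get_dummies(pd.Categorical(days, categories=days)).values.astype(float)

# One row per sample: [real value, one-hot categorical] -> shape (7, 8)
X = np.column_stack([rands, onehot])
assert X.shape == (7, 8)

# Reshape to (samples, timesteps, features) for an LSTM with look_back = 1
X_lstm = X.reshape((7, 1, 8))
assert X_lstm.shape == (7, 1, 8)
```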

Another way (probably more elegant) is to condition on the categorical variables (whose values do not change over time).

Let's take an example with weather data from two different cities: Paris and San Francisco. You want to predict the next temperature based on historical data, but at the same time, you expect the weather to vary by city. You can either:

  • Combine the auxiliary features with the time series data (what you suggested here).

  • Concatenate the auxiliary features with the output of the RNN layer. It's some kind of post-RNN adjustment since the RNN layer won't see this auxiliary info.

  • Or just initialize the RNN states with a learned representation of the condition (e.g. Paris or San Francisco).
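The second option above (post-RNN concatenation) can be sketched in plain NumPy; the shapes are illustrative, and `rnn_output` stands in for whatever your LSTM layer returns:

```python
import numpy as np

batch_size, rnn_units, n_cities = 4, 32, 2

# Stand-in for the last hidden state of an RNN/LSTM layer,
# shape (batch, units)
rnn_output = np.random.random((batch_size, rnn_units))

# Auxiliary condition: one-hot city id (Paris / San Francisco),
# constant over time for each sample
city_ids = np.array([0, 1, 0, 1])
city_onehot = np.eye(n_cities)[city_ids]

# Concatenate along the feature axis; a Dense output head
# would then consume the combined vector
combined = np.concatenate([rnn_output, city_onehot], axis=1)
assert combined.shape == (batch_size, rnn_units + n_cities)
```

The RNN itself never sees the city here; only the layers after the concatenation can use it, which is what makes this a "post-RNN adjustment".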

I wrote a library to condition on auxiliary inputs. It abstracts all the complexity and has been designed to be as user-friendly as possible:

https://github.com/philipperemy/cond_rnn/

Hope it helps!

Philippe Remy