
Trying to build a single-output regression model, but there seems to be a problem in the last layer:

from keras.layers import Input, Dense, Dropout, TimeDistributed, CuDNNLSTM
from keras.models import Model
import keras

inputs = Input(shape=(48, 1))
lstm = CuDNNLSTM(256, return_sequences=True)(inputs)
lstm = Dropout(dropouts[0])(lstm)

#aux_input
auxiliary_inputs = Input(shape=(48, 7))
auxiliary_outputs = TimeDistributed(Dense(4))(auxiliary_inputs)
auxiliary_outputs = TimeDistributed(Dense(7))(auxiliary_outputs)

#concatenate
output = keras.layers.concatenate([lstm, auxiliary_outputs])

output = TimeDistributed(Dense(64, activation='linear'))(output)
output = TimeDistributed(Dense(64, activation='linear'))(output)
output = TimeDistributed(Dense(1, activation='linear'))(output)

model = Model(inputs=[inputs, auxiliary_inputs], outputs=[output])

I am new to keras... I am getting the following error

ValueError: Error when checking target: expected time_distributed_5 to have 3 dimensions, but got array with shape (14724, 1)
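The mismatch can be seen from the shapes alone; here is a minimal numpy sketch (shapes taken from the error message above):

```python
import numpy as np

# shape the model actually outputs: TimeDistributed(Dense(1)) over 48 timesteps
y_pred = np.zeros((14724, 48, 1))   # rank 3
# shape of the targets being passed in
y_true = np.zeros((14724, 1))       # rank 2

# Keras checks that the target rank matches the output rank; here 3 != 2,
# which is exactly what the ValueError reports
assert y_pred.ndim == 3
assert y_true.ndim == 2
```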

mojo1643

2 Answers


Okay guys, I think I found a fix. According to https://keras.io/layers/wrappers/, TimeDistributed applies the Dense layer to each timestep (in my case, 48 timesteps). So the output of the final layer below has shape (batch_size, timesteps, dimensions):

output = TimeDistributed(Dense(1, activation='linear'))(output)

i.e. (?, 48, 1), hence the dimension mismatch. However, if I want a single regression output, I have to flatten the final TimeDistributed layer,

so I added the following lines to fix it:

from keras.layers import Flatten  # Flatten needs importing as well

output = Flatten()(output)
output = Dense(1, activation='linear')(output)

so now the flattened TimeDistributed output feeds 48 values into the final dense layer, which produces a single output (the layer reports 49 parameters because a bias is included: 48 weights + 1 bias).
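To make the arithmetic concrete, here is a numpy sketch of what Flatten followed by Dense(1) does to the shapes (random weights used as stand-ins for the trained layer):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 48, 1))        # TimeDistributed(Dense(1)) output
flat = x.reshape(x.shape[0], -1)        # Flatten -> (32, 48)

W = rng.normal(size=(48, 1))            # 48 weights...
b = np.zeros(1)                         # ...plus 1 bias = 49 parameters
y = flat @ W + b                        # Dense(1) -> (32, 1)

assert flat.shape == (32, 48)
assert y.shape == (32, 1)
assert W.size + b.size == 49            # the 49 parameters Keras reports
```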

Okay, the code works fine and I am getting proper results (the model learns). My only doubt is whether it is mathematically okay to flatten the TimeDistributed layer into a simple dense layer to get my result like stated above?

mojo1643

Can you provide more context for your problem? Test data, or at least more code. Why did you choose this architecture in the first place? Would a simpler architecture (just the LSTM) do the trick? What are you regressing? Stacking multiple TimeDistributed Dense layers with linear activation functions probably isn't adding much to the model.
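The point about stacked linear layers can be checked directly: two Dense layers with linear activations compose into a single linear map, so they add parameters but no expressive power. A numpy sketch with random weights as stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 64))

# two stacked linear Dense layers: Dense(64) then Dense(1)
W1, b1 = rng.normal(size=(64, 64)), rng.normal(size=64)
W2, b2 = rng.normal(size=(64, 1)), rng.normal(size=1)
stacked = (x @ W1 + b1) @ W2 + b2

# the equivalent single linear layer
W = W1 @ W2
b = b1 @ W2 + b2
single = x @ W + b

# identical outputs: the stack collapses to one linear map
assert np.allclose(stacked, single)
```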

June Skeeter
  • Yes, you are probably right. I could just use an LSTM, but it didn't give me satisfactory results. The input to the LSTM is a 48-timestep sequence and I want to predict the next timestep (the 49th). Each timestep comes with additional data of shape (48, 7) (7 features), which is fed via the aux input. I am trying to concat the output of the LSTM with the auxiliary outputs. – mojo1643 Oct 17 '17 at 09:01