0

Im trying to to predict a variable length input/output many to many sequence using Keras, the dataframe below is a representation of the data . 5 columns and one target column.

    df3={'email': [[0,0,0,1],[0,1,2],[0,3,1,5],[0,0,0,1],[0,1,2],[0,3,1,5]],
         'fax':[[0,1,0,1],[3,2],[0,2,1,5,4,6],[0,1,0,1],[3,2],[0,2,1,5,4,6]],
         'physical_mail':[[0,0,0,2],[0,2],[0,9,1,3,4,0],[0,0,3,0],[1,2],[0,2,0,2,4,6]],
         'cold_call':[[0,0,0,0,0,0],[0,2,0,0],[0,1,1,3,2,0,2,2,],[0,0,3,0,0,0,0],[1,2,5,0,0,1,2],[0,2,0,2,4,3,9,0,6]],
         'in_person':[[0,0,0,0,0,0],[0,0,0,0],[0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,1],[1,0,0,0,0,0,0],[0,2,0,2,0,0,9,0,0,0,0,1]],
          'tar':[[0,1],[0,0,0,0],[0,0,0,0,1],[0,1],[0,0,0,0],[0,0,0,0,1]]
         }
    df4=pd.Dataframe(df3)

To reshape the data there are six sample , 5 columns which are fed one column at a time and y is 6 samples , 1 column one at a time

    x_train=df4[['email','fax','physical_mail','cold_call','in_person']].values.reshape(6,5,1)
    y_train=df4.tar.values.reshape(6,1,1)


 
 model = Sequential()  
 ## 5 columns which are passed one at a time so the input shape (5,1)
 model.add(LSTM(64 , input_shape=(5,1))) 
 # kinda not sure about the RepeatVector argument 
 model.add(RepeatVector(10))
 model.add(LSTM(64,return_sequences=True))
 model.add(TimeDistributed(Dense(1)))
 model.add(Activation('linear'))   
 model.compile(loss='mean_squared_error', optimizer='rmsprop')

Im seeing an error " Setting an array element with sequence . Is it because the input is a mixture of lists ? If so how to flatten this ?

sr33kant
  • 35
  • 3

1 Answers1

0

Try this -

np.array([np.concatenate(pad_sequences(list(v), maxlen=12)) for k,v in df4[['email','fax','physical_mail','cold_call','in_person']].items()])
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 1, 5, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 3, 1, 5],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        3, 2, 0, 0, 0, 0, 0, 0, 0, 2, 1, 5, 4, 6, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 2, 0, 0, 0, 0, 0, 0,
        0, 2, 1, 5, 4, 6],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 2, 0, 0, 0, 0, 0, 0, 0, 9, 1, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0,
        0, 2, 0, 2, 4, 6],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2,
        0, 0, 0, 0, 0, 0, 0, 1, 1, 3, 2, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 3,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 5, 0, 0, 1, 2, 0, 0, 0, 0, 2, 0,
        2, 4, 3, 9, 0, 6],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 2, 0, 0,
        9, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 1]]

This should give you a 1D array for each row, where each of the columns is padded to 12 length and concatenated. Assuming that this is what you need. If you need 2D array for each row then ignore the concatenate part.

np.array([pad_sequences(list(v), maxlen=12) for k,v in df4.items()])
array([[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 1, 5],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 1, 5]],

       [[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 2],
        [0, 0, 0, 0, 0, 0, 0, 2, 1, 5, 4, 6],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 2],
        [0, 0, 0, 0, 0, 0, 0, 2, 1, 5, 4, 6]],

       [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2],
        [0, 0, 0, 0, 0, 0, 0, 9, 1, 3, 4, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2],
        [0, 0, 0, 0, 0, 0, 0, 2, 0, 2, 4, 6]],

       [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0],
        [0, 0, 0, 0, 0, 1, 1, 3, 2, 0, 2, 2],
        [0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 2, 5, 0, 0, 1, 2],
        [0, 0, 0, 0, 2, 0, 2, 4, 3, 9, 0, 6]],

       [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
        [0, 2, 0, 2, 0, 0, 9, 0, 0, 0, 0, 1]],

       [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]]], dtype=int32)
Akshay Sehgal
  • 18,741
  • 3
  • 21
  • 51
  • THank you again Hmm , if I use a 2D array , then it would change my input to shape(5,6,12) , How does that change my input to lstm ? 5 columns , 1 column at a time but now I have 12 (length of each sequence) ? input_shape=(5,12) ? – sr33kant Jul 15 '20 at 03:54
  • depends on what you are trying to encode right? are u trying to encode the complete sequence of all parameters concatenated together? or you are trying to encode each of the 5 input sequences separately. – Akshay Sehgal Jul 15 '20 at 04:34