
I have a dataset with two columns, each containing a set of documents. I have to match each document in Col A with the documents provided in Col B. This is a supervised classification problem, so my training data contains a label column indicating whether the documents match or not.

To solve the problem, I have created a set of features, say f1-f25 (by comparing the two documents), and then trained a binary classifier on these features. This approach works reasonably well, but now I would like to evaluate deep learning models on this problem (specifically, LSTM models).
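
To make the current setup concrete, the feature-based baseline looks roughly like this (a minimal sketch: the file name, column names, and the choice of RandomForestClassifier are illustrative placeholders, not the exact pipeline):

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# placeholder file: one row per document pair, with comparison features f1..f25
# and a binary 'match' label
df = pd.read_csv('doc_pairs.csv')
feature_cols = ['f' + str(i) for i in range(1, 26)]

X_train, X_test, y_train, y_test = train_test_split(
    df[feature_cols], df['match'], test_size=0.2, random_state=42)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))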

I am using the keras library in Python. After going through the keras documentation and other tutorials available online, I have managed to do the following:

import keras
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model

# Each document contains a series of 200 words
# The necessary text pre-processing steps have been completed to
# transform each doc into a fixed-length sequence
main_input1 = Input(shape=(200,), dtype='int32', name='main_input1')
main_input2 = Input(shape=(200,), dtype='int32', name='main_input2')

# Next I add a word embedding layer (embed_matrix is created separately
# for each word in my vocabulary by reading from a pre-trained embedding model)
x = Embedding(output_dim=300, input_dim=20000,
              input_length=200, weights=[embed_matrix])(main_input1)
y = Embedding(output_dim=300, input_dim=20000,
              input_length=200, weights=[embed_matrix])(main_input2)

# Next, separately pass each embedded sequence through an LSTM layer
# to transform the sequence of vectors into a single vector
lstm_out_x1 = LSTM(32)(x)
lstm_out_x2 = LSTM(32)(y)

# concatenate the 2 layers and stack a dense layer on top
x = keras.layers.concatenate([lstm_out_x1, lstm_out_x2])
x = Dense(64, activation='relu')(x)
# generate intermediate output
auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(x)

# add auxiliary input - the auxiliary input contains 25 features for each document pair
auxiliary_input = Input(shape=(25,), name='aux_input')

# merge aux output with aux input and stack dense layer on top
main_input = keras.layers.concatenate([auxiliary_output, auxiliary_input])
x = Dense(64, activation='relu')(main_input)
x = Dense(64, activation='relu')(x)

# finally add the main output layer
main_output = Dense(1, activation='sigmoid', name='main_output')(x)

model = Model(inputs=[main_input1, main_input2, auxiliary_input], outputs= main_output)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit([x1, x2, aux_input], y,
          epochs=3, batch_size=32)
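
For reference, the pre-processing mentioned in the comments above was along these lines (a minimal sketch assuming keras' Tokenizer and pad_sequences; docs_a and docs_b stand in for my raw document strings):

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

# docs_a and docs_b are placeholder lists of raw document strings
tokenizer = Tokenizer(num_words=20000)
tokenizer.fit_on_texts(docs_a + docs_b)

# turn each document into a fixed-length sequence of 200 word indices
x1 = pad_sequences(tokenizer.texts_to_sequences(docs_a), maxlen=200)
x2 = pad_sequences(tokenizer.texts_to_sequences(docs_b), maxlen=200)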

However, when I score this on the training data, I get the same probability score for all cases. The issue seems to be with the way the auxiliary input is fed in (the model generates meaningful output when I remove the auxiliary input). I also tried inserting the auxiliary input at different places in the network, but somehow I couldn't get this to work.

Any pointers?

Dataminer
  • Not sure if that is intended, but auxiliary_output is only (1,). Is that really what you expect - merging 25 auxiliary inputs with only one result? Also, is the model before auxiliary_output intended to be "not trainable" while you train only the final part? – Daniel Möller May 12 '17 at 18:58
  • Well yes. This is a binary classification model, so the final output is (1,). Should the auxiliary output be different? I am simply feeding in the additional set of 25 features as the auxiliary input, hence the (25,) shape – Dataminer May 13 '17 at 05:12
  • Have you tried more epochs? – Marcin Możejko Oct 29 '17 at 19:42

2 Answers


Well, this has been open for several months and people are voting it up.
I did something very similar recently using a dataset that can be used to forecast credit card defaults; it contains categorical data about customers (gender, education level, marital status, etc.) as well as payment history as a time series, so I had to merge time series with non-series data. My solution was very similar to yours, combining an LSTM with dense layers, and I have tried to adapt that approach to your problem. What worked for me was dense layer(s) on the auxiliary input.

Furthermore, in your case a shared layer would make sense, so that the same weights are used to "read" both documents. My proposal for testing on your data:

import keras
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model

# Each document contains a series of 200 words
# The necessary text pre-processing steps have been completed to
# transform each doc into a fixed-length sequence
main_input1 = Input(shape=(200,), dtype='int32', name='main_input1')
main_input2 = Input(shape=(200,), dtype='int32', name='main_input2')

# Next I add a word embedding layer (embed_matrix is created separately
# for each word in my vocabulary by reading from a pre-trained embedding model)
x1 = Embedding(output_dim=300, input_dim=20000,
               input_length=200, weights=[embed_matrix])(main_input1)
x2 = Embedding(output_dim=300, input_dim=20000,
               input_length=200, weights=[embed_matrix])(main_input2)

# Next pass each embedded sequence through an LSTM layer to transform
# the sequence of vectors into a single vector
# Comment Manngo: here I changed to a shared LSTM layer
# Also renamed the embeddings from x and y to x1 and x2, as the old names were confusing
lstm_reader = LSTM(32)
lstm_out_x1 = lstm_reader(x1)
lstm_out_x2 = lstm_reader(x2)

# concatenate the 2 layers and stack a dense layer on top
x = keras.layers.concatenate([lstm_out_x1, lstm_out_x2])
x = Dense(64, activation='relu')(x)
x = Dense(32, activation='relu')(x)
# generate intermediate output
# Comment Manngo: This is created as a dead-end
# It will not be used as an input of any layers below
auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(x)

# add auxiliary input - the auxiliary input contains 25 features for each document pair
# Comment Manngo: dense branch on the comparison features
# (kept in a separate variable so the Input itself is still available for Model(...))
auxiliary_input = Input(shape=(25,), name='aux_input')
aux_branch = Dense(64, activation='relu')(auxiliary_input)
aux_branch = Dense(32, activation='relu')(aux_branch)

# merge the document branch with the auxiliary branch and stack dense layers on top
# Comment Manngo: this merges the dense output of the document branch (x)
# with the processed auxiliary features, not the (1,) aux output
main_input = keras.layers.concatenate([x, aux_branch])
main = Dense(64, activation='relu')(main_input)
main = Dense(64, activation='relu')(main)

# finally add the main output layer
main_output = Dense(1, activation='sigmoid', name='main_output')(main)

# Define the model with 3 inputs and 2 outputs, then compile
# Comment Manngo: also define weighting of the outputs, main as 1, auxiliary as 0.5
model = Model(inputs=[main_input1, main_input2, auxiliary_input],
              outputs=[main_output, auxiliary_output])
model.compile(optimizer='adam',
              loss={'main_output': 'binary_crossentropy', 'aux_output': 'binary_crossentropy'},
              loss_weights={'main_output': 1., 'aux_output': 0.5},
              metrics=['accuracy'])

# Train on main_output, with auxiliary_output as a support
# Comment Manngo: unknown information marked with placeholders ____
# We have 3 inputs: x1 and x2 (the 2 word-index sequences) and aux_in (the 25 features)
# We have 2 outputs: main and auxiliary; both have the same (binary) target y

model.fit({'main_input1': __x1__, 'main_input2': __x2__, 'aux_input': __aux_in__},
          {'main_output': __y__, 'aux_output': __y__},
          epochs=1000,
          batch_size=__,
          validation_split=0.1,
          callbacks=[____])

I don't know how much this helps since I don't have your data, so I can't try it. Nevertheless, this is my best shot.
I didn't run the above code for obvious reasons.
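
If it trains, scoring works the same way: predict() on a two-output model returns one array per output, in the order given to Model(...). A sketch with the same placeholders:

# returns [main_output predictions, aux_output predictions]
main_pred, aux_pred = model.predict(
    {'main_input1': __x1__, 'main_input2': __x2__, 'aux_input': __aux_in__})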

Manngo
  • I am working on longitudinal medical data and I am trying to understand what you have done. The two concatenated LSTM layers pick up two different sets of inputs. Am I right? – Naveen Gabriel Dec 28 '19 at 12:53
  • Yes, x1 and x2 in my wording. – Manngo Jan 02 '20 at 21:12
  • @Manngo hi, I also have to merge time series with non-series data to predict a meteorological variable at different locations (differentiated by the non-series data). Would it be possible to share what you did in this regard? The length of the time series differs across locations in my case. – Basilique Aug 03 '20 at 14:16
  • @Basilique You mean multiple predictions from 1 model, one for each location? For different lengths of time series you can perhaps look into PLSTM, which supports variable sampling but over the same time window. – Manngo Aug 11 '20 at 16:04
  • @Manngo I have two time-dependent features T, P and two non-time-dependent variables S and D. The target is also a time-dependent variable Q. I would like to have a global model trained on the information from all my 500 stations instead of training 500 individual local models. I would like the global model to have two branches: an upstream branch to which I feed the non-time-series variables, and a downstream branch to which I feed the time series of the 500 locations. I used `generator` for my local models. I don't know how to combine generators with `embedding` layers. – Basilique Aug 12 '20 at 07:14
  • @Basilique, I'm happy to discuss further, but could you please open a new topic for this - not here but on Cross Validated, so that we can discuss it in the proper place? Once you have opened the new topic, please also add a comment quoting me there so that I get the email notification. – Manngo Aug 13 '20 at 21:13
  • @Manngo very grateful, I'll do this. – Basilique Aug 14 '20 at 13:44
  • @Manngo I opened the topic on Cross Validated as per your advice. I realized that my comment below the opened topic was deleted. https://stats.stackexchange.com/questions/483230/lstm-model-in-keras-r-with-time-dependent-and-not-time-dependent-branches-of-i – Basilique Aug 16 '20 at 17:17

I found an answer via https://datascience.stackexchange.com/questions/17099/adding-features-to-time-series-model-lstm. Philippe Remy wrote a library (cond_rnn) to condition RNNs on auxiliary inputs. I used his library and it's very helpful.

# 10 stations
# 365 days
# 3 continuous variables: A, B and C, where C is the target
# 2 conditions of dim=5 and dim=1; the first condition is one-hot, the second is continuous
import numpy as np
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

from cond_rnn import ConditionalRNN

stations = 10  # 10 stations.
time_steps = 365  # 365 days.
continuous_variables_per_station = 3  # A,B,C where C is the target.
condition_variables_per_station = 2  # 2 variables of dim 5 and 1.
condition_dim_1 = 5
condition_dim_2 = 1

np.random.seed(123)
continuous_data = np.random.uniform(size=(stations, time_steps, continuous_variables_per_station))
condition_data_1 = np.zeros(shape=(stations, condition_dim_1))
condition_data_1[:, 0] = 1  # dummy.
condition_data_2 = np.random.uniform(size=(stations, condition_dim_2))

window = 50  # we split the series using a 50-day look-back window

x, y, c1, c2 = [], [], [], []
for i in range(window, continuous_data.shape[1]):
    x.append(continuous_data[:, i - window:i])
    y.append(continuous_data[:, i])
    c1.append(condition_data_1)  # just replicate.
    c2.append(condition_data_2)  # just replicate.

# now we have (batch_dim, station_dim, time_steps, input_dim).
x = np.array(x)
y = np.array(y)
c1 = np.array(c1)
c2 = np.array(c2)

print(x.shape, y.shape, c1.shape, c2.shape)

# let's collapse the station_dim in the batch_dim.
x = np.reshape(x, [-1, window, x.shape[-1]])
y = np.reshape(y, [-1, y.shape[-1]])
c1 = np.reshape(c1, [-1, c1.shape[-1]])
c2 = np.reshape(c2, [-1, c2.shape[-1]])

print(x.shape, y.shape, c1.shape, c2.shape)

model = Sequential(layers=[
    ConditionalRNN(10, cell='GRU'),  # num_cells = 10
    Dense(units=1, activation='linear')  # regression problem.
])

model.compile(optimizer='adam', loss='mse')
model.fit(x=[x, c1, c2], y=y, epochs=2, validation_split=0.2)
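
Prediction follows the same calling convention as fit above, passing the conditions alongside the windows (a sketch, assuming the arrays built earlier in this snippet):

# one forecast row per (window, station) pair, conditioned on c1 and c2
y_pred = model.predict(x=[x, c1, c2])
print(y_pred.shape)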
Arj184cm