
Suppose I have a string, say "abc", and a target that is that string in reverse, say "cba".

Can a neural network, in particular an encoder-decoder model, learn this mapping? If so, what is the best model to accomplish this?

I ask because this is a structural translation rather than a simple character mapping, as in normal machine translation.

Andrew Xia
  • What is the end goal you are trying to achieve? – ashutosh singh Jun 27 '19 at 17:04
  • This is more for my understanding than a real use case; I want a model that can take an input string and output that string in reverse. – Andrew Xia Jun 27 '19 at 18:52
  • You've already cited Seq2Seq -- what happened when you tried it? What problems arose? If you haven't tried it yet, then SO is not the right site for the question -- a little too abstract. Please check [Which site?](https://meta.stackexchange.com/questions/129598/which-computer-science-programming-stack-exchange-do-i-post-in) for general issues. – Prune Jun 27 '19 at 19:04
  • I tried it and got roughly 75% accuracy, but I wanted to understand the intuition behind this: if a seq2seq model is able to learn the structure, it should perform almost perfectly, and if it cannot learn the structure, it should not generalize to unseen data at all. – Andrew Xia Jun 27 '19 at 19:41

2 Answers


I doubt that an NN will learn the abstract structural transformation. Since the input string is of unbounded length, a finite NN won't have the information necessary. NLP models generally work by identifying small blocks and simple context-sensitive shifts; I don't think they would identify the end-to-end swaps needed.

However, I expect that an image processor, adapted to a single dimension, would learn this quite quickly. Some can learn how to rotate a sub-image.
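
For concreteness, here is a minimal Keras sketch of what such a one-dimensional convolutional front end feeding a recurrent encoder-decoder might look like (the vocabulary, filter count, kernel width, and layer sizes are arbitrary assumptions, not a tested recipe):

from keras.models import Model
from keras.layers import Input, Embedding, Conv1D, LSTM, Dense

emb_dim = 20      # arbitrary embedding size
latent_dim = 100  # arbitrary recurrent state size
vocab_size = 12   # arbitrary token vocabulary

enc_inp = Input(shape=(None,), name='enc_inp')
dec_inp = Input(shape=(None,), name='dec_inp')
emb = Embedding(input_dim=vocab_size, output_dim=emb_dim)

# 1-D convolution over the embedded characters, then an LSTM summarises
# the convolved sequence into the decoder's initial state
conv = Conv1D(filters=64, kernel_size=3, padding='same', activation='relu')(emb(enc_inp))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(conv)

dec_out, _, _ = LSTM(latent_dim, return_sequences=True,
                     return_state=True)(emb(dec_inp), initial_state=[state_h, state_c])
out = Dense(vocab_size, activation='softmax')(dec_out)

conv_seq2seq = Model([enc_inp, dec_inp], out)
conv_seq2seq.compile(optimizer='rmsprop', loss='categorical_crossentropy')

Using padding='same' keeps the sequence length, so the recurrent layer still sees one vector per input character; whether the convolution actually helps with reversal is an empirical question.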

Prune
  • Just to check, would you suggest a 1-D convolution feeding into an RNN? – Andrew Xia Jun 27 '19 at 19:43
  • Yes, that's what I had in mind, or perhaps a CNN to look for the abstract rotation. The model will need something that can act as a stack, however. – Prune Jun 27 '19 at 19:55

If your network is an old-fashioned encoder-decoder model (without attention), then, as @Prune said, it has a memory bottleneck (the encoder dimensionality). Thus, such a network cannot learn to reverse strings of arbitrary size. However, you can train such an RNN to reverse strings of limited size. For example, the following toy seq2seq LSTM is able to reverse sequences of digits with length up to 10. Here is how you train it:

from keras.models import Model
from keras.layers import Input, LSTM, Dense, Embedding
import numpy as np

emb_dim = 20
latent_dim = 100  # Latent dimensionality of the encoding space.
vocab_size = 12 # digits 0-9, 10 is for start token, 11 for end token

encoder_inputs = Input(shape=(None, ), name='enc_inp')
common_emb = Embedding(input_dim=vocab_size, output_dim=emb_dim)
encoder_emb = common_emb(encoder_inputs)
encoder = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_emb)
encoder_states = [state_h, state_c]

decoder_inputs = Input(shape=(None,), name='dec_inp')
decoder_emb = common_emb(decoder_inputs)
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_emb, initial_state=encoder_states)
decoder_dense = Dense(vocab_size, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

def generate_batch(length=4, batch_size=64):
    # random digit strings x and their reversals y
    x = np.random.randint(low=0, high=10, size=(batch_size, length))
    y = x[:, ::-1]
    start = np.ones((batch_size, 1), dtype=int) * 10  # start-of-sequence token
    end = np.ones((batch_size, 1), dtype=int) * 11    # end-of-sequence token
    enc_x = np.concatenate([start, x], axis=1)  # encoder input
    dec_x = np.concatenate([start, y], axis=1)  # decoder input (teacher forcing)
    dec_y = np.concatenate([y, end], axis=1)    # decoder target, shifted one step
    # one-hot encode the targets for the categorical cross-entropy loss
    dec_y_onehot = np.zeros(shape=(batch_size, length+1, vocab_size), dtype=int)
    for row in range(batch_size):
        for col in range(length+1):
            dec_y_onehot[row, col, dec_y[row, col]] = 1
    return [enc_x, dec_x], dec_y_onehot

def generate_batches(batch_size=64, max_length=10):
    while True:
        # random length from 1 to max_length inclusive (randint's upper bound is exclusive)
        length = np.random.randint(low=1, high=max_length + 1)
        yield generate_batch(length=length, batch_size=batch_size)

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['categorical_accuracy'])
model.fit_generator(generate_batches(), steps_per_epoch=1000, epochs=20)

Now you can apply it to reverse a sequence (my decoder is very inefficient, but it does illustrate the principle):

input_seq = np.array([[10, 2, 1, 2, 8, 5, 0, 6]])  # start token followed by the digits to reverse
result = np.array([[10]])                          # decoded output begins with the start token
next_digit = -1
for i in range(100):
    # greedily pick the most probable next digit given what has been decoded so far
    next_digit = model.predict([input_seq, result])[0][-1].argmax()
    if next_digit == 11:  # end token: stop decoding
        break
    result = np.concatenate([result, [[next_digit]]], axis=1)
print(result[0][1:])  # drop the start token

Hooray, it prints [6 0 5 8 2 1 2]! Generally, you can think of such a model as a weird autoencoder (with a reversal side effect) and choose an architecture and training procedure suitable for autoencoders; there is quite a vast literature about text autoencoders.

Moreover, if you make an encoder-decoder model with attention, it will have no memory bottleneck, so in principle it is possible to reverse a sequence of any length with a neural network. However, attention requires quadratic computational time, so in practice even neural networks with attention will be very inefficient for long sequences.
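
For illustration, here is a minimal sketch of the same toy model with dot-product attention bolted on; it assumes tf.keras (TensorFlow 2.x) and its built-in keras.layers.Attention layer, and reuses the arbitrary hyperparameters from above:

from tensorflow.keras.models import Model
from tensorflow.keras.layers import (Input, LSTM, Dense, Embedding,
                                     Attention, Concatenate)

emb_dim, latent_dim, vocab_size = 20, 100, 12

enc_inp = Input(shape=(None,), name='enc_inp')
dec_inp = Input(shape=(None,), name='dec_inp')
emb = Embedding(input_dim=vocab_size, output_dim=emb_dim)

# return_sequences=True: keep the encoder state at every time step, not just the last
enc_seq, state_h, state_c = LSTM(latent_dim, return_sequences=True,
                                 return_state=True)(emb(enc_inp))
dec_seq, _, _ = LSTM(latent_dim, return_sequences=True,
                     return_state=True)(emb(dec_inp), initial_state=[state_h, state_c])

# each decoder step attends over all encoder steps (dot-product attention)
context = Attention()([dec_seq, enc_seq])
outputs = Dense(vocab_size, activation='softmax')(Concatenate()([dec_seq, context]))

attn_model = Model([enc_inp, dec_inp], outputs)
attn_model.compile(optimizer='rmsprop', loss='categorical_crossentropy',
                   metrics=['categorical_accuracy'])
# trainable with the same generate_batches() generator as the model above

Attention gives each decoder step direct access to every encoder position, which is exactly the kind of long-range access the reversal task needs.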

David Dale