Have a database with thousands (or millions) of valid sentences.
Create a dictionary where each word is represented by a number (reserve 0 for "nothing", 1 for "start of sentence" and 2 for "end of sentence"):

word_dic = {"_nothing_": 0, "_start_": 1, "_end_": 2, "word1": 3, "word2": 4, ...}
reverse_dic = {v: k for k, v in word_dic.items()}
Remember to add "_start_"
and "_end_"
at the beginning and end of all sentences in the database, and "_nothing_"
after the end to complete the desired length capable of containing all sentences. (Ideally, work with sentences with 10 or less words, so your model wont't try to create bigger sentences).
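A minimal sketch of that padding step, assuming a fixed length of 12 (10 words plus the two markers) and that raw_sentences is a hypothetical list of already tokenized sentences (lists of word strings):

import numpy as np

length = 12  # 10 words + "_start_" + "_end_"
padded = []
for sentence in raw_sentences:  # each sentence is a list of word strings
    s = ["_start_"] + sentence + ["_end_"]
    s += ["_nothing_"] * (length - len(s))  # pad up to the fixed length
    padded.append(s)
database = np.array(padded)  # shape (sentences, length)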
Transform all your sentences into sequences of indices:

import numpy as np

# supposing you have an array of shape (sentences, length) as strings:
indices = []
for word in database.reshape((-1,)):
    indices.append(word_dic[word])
indices = np.array(indices).reshape((sentences, length))
Transform this into categorical (one-hot) words with the keras function to_categorical():

from keras.utils import to_categorical

cat_sentences = to_categorical(indices)  # shape (sentences, length, dictionary_size)

Hint: keras has lots of useful text preprocessing functions in keras.preprocessing.text; one example follows.
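For instance, keras's Tokenizer can build the word index for you (whether it fits your pipeline is an assumption on my part; note it reserves index 0 by itself and lowercases by default, and you would still add the special markers yourself). Here texts is a hypothetical list of sentence strings:

from keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)                 # texts: list of sentence strings
sequences = tokenizer.texts_to_sequences(texts)
word_index = tokenizer.word_index             # dict mapping words to indices, starting at 1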
Separate training input and output data:

# input is the sentences except for the last word;
# output is the same sentences shifted one word ahead
x_train = cat_sentences[:, :-1, :]
y_train = cat_sentences[:, 1:, :]
Let's create an LSTM-based model that will predict the next words from the previous words:

model = Sequential()
model.add(LSTM(dontKnow, return_sequences=True, input_shape=(None, dictionary_size)))  # dontKnow: pick a hidden size
# ...add more layers here if you want...
model.add(LSTM(dictionary_size, return_sequences=True, activation='sigmoid'))
# or a Dense(dictionary_size, activation='sigmoid')
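A filled-in sketch of that model, with an assumed hidden size of 128 (pick whatever suits your data):

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape=(None, dictionary_size)))
model.add(LSTM(128, return_sequences=True))
model.add(Dense(dictionary_size, activation='sigmoid'))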
Compile and fit this model with x_train and y_train:

model.compile(....)
model.fit(x_train, y_train, ....)
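One reasonable way to fill those in (the loss, optimizer, epochs and batch size here are assumptions, not requirements):

model.compile(loss='categorical_crossentropy', optimizer='adam')
model.fit(x_train, y_train, epochs=50, batch_size=32)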
Create an identical model using stateful=True in all LSTM layers:

newModel = ......
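A sketch of that stateful copy, matching the 128-unit layout assumed above. Stateful layers need a fixed batch size, hence batch_input_shape with a batch of 1 and one word per step:

newModel = Sequential()
newModel.add(LSTM(128, return_sequences=True, stateful=True,
                  batch_input_shape=(1, 1, dictionary_size)))
newModel.add(LSTM(128, return_sequences=True, stateful=True))
newModel.add(Dense(dictionary_size, activation='sigmoid'))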
Transfer the weights from the trained model:
newModel.set_weights(model.get_weights())
Create your bags in a categorical (one-hot) way, shape (10, dictionary_size).
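For example, one-hot encoding a bag (bag_words here is a hypothetical list of 10 words from your dictionary):

bag = np.zeros((10, dictionary_size), dtype=np.float32)
for i, word in enumerate(bag_words):
    bag[i, word_dic[word]] = 1.0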
Use the model to predict one word from the _start_ word:

# reset the states of the stateful model before you start a 10-word prediction:
newModel.reset_states()
firstWord = newModel.predict(startWord)  # startWord is shaped as (1, 1, dictionary_size)
The firstWord will be a vector of size dictionary_size giving (roughly) the probability of each existing word. Compare it to the words in the bag. You can choose the highest probability, or use some random selection if the probabilities of other words in the bag are also good.

# example taking the most probable word:
firstWord = np.array(firstWord == firstWord.max(), dtype=np.float32)
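A sketch of restricting the choice to the bag instead; the masking below is my reading of "compare to the words in the bag", and it uses the raw prediction (flatten it before the one-hot step above):

probs = firstWord.reshape((-1,))           # flatten the raw prediction (before the one-hot step)
mask = bag.max(axis=0)                     # 1.0 for every word that appears in the bag
masked = probs * mask                      # zero out words outside the bag
best = masked.argmax()                     # most probable bag word
# or pick randomly, weighted by the masked probabilities:
chosen = np.random.choice(dictionary_size, p=masked / masked.sum())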
Do the same again, but now input firstWord to the model:

secondWord = newModel.predict(firstWord)  # respect the shapes: input is (1, 1, dictionary_size)
Repeat the process until you get a sentence. Notice that you may find _end_ before all 10 words in the bag are used. You may decide to finish the process with a shorter sentence in that case, especially if the other words' probabilities are low. A full loop sketch follows.
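Putting the whole prediction loop together (a sketch under the same assumptions as above, using plain argmax for selection):

newModel.reset_states()
current = np.zeros((1, 1, dictionary_size))
current[0, 0, word_dic["_start_"]] = 1.0
sentence = []
for _ in range(10):
    pred = newModel.predict(current)         # shape (1, 1, dictionary_size)
    idx = int(pred.reshape((-1,)).argmax())  # or your bag-masked selection
    if idx == word_dic["_end_"]:
        break
    sentence.append(reverse_dic[idx])
    current = np.zeros((1, 1, dictionary_size))
    current[0, 0, idx] = 1.0
print(" ".join(sentence))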