
[Image: bags and how to choose from them]

Imagine I have 10 bags, ordered one after the other, i.e. Bag 1, Bag 2, ..., Bag n.

Each bag has a distinct set of words.

To understand what a bag is, consider a vocabulary of 10,000 words. The first bag contains the words Hello, India, Manager.

i.e. Bag 1 will have 1's at the indices of the words present in the bag. For example, Bag 1 will be a vector of size 10000×1; if Hello's index is 1, India's index is 2 and Manager's is 4, it will be [0, 1, 1, 0, 1, 0, 0, 0, 0, ...].

• I don't have a model yet.
• I'm thinking of using story books, but it's still kind of abstract to me.

A word has to be chosen from each bag and assigned a number: word 1 (word from bag 1), word 2 (word from bag 2), and so on, and they must form a MEANINGFUL sentence in their numerical order!

Swamy

2 Answers


First, we need a way for the computer to recognise a word, otherwise it cannot pick the correct one. That means at this stage we need to decide what we are teaching the computer to begin with (i.e. what a verb is, what a noun is, grammar), but I will assume we will dump a dictionary into it and give it no information except the words themselves.

So that the computer can compute what sentences are, we need to convert the words to numbers (one way would be to work alphabetically starting at 1, using the numbers as keys for a dictionary (a digital one this time!) and the words as the values). Now we can apply the same linear algebra techniques to this problem as to any other.
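
A minimal sketch of that numeric dictionary (the three-word list is just an example):

words = sorted(["manager", "hello", "india"])               # alphabetical order
dictionary = {i + 1: word for i, word in enumerate(words)}  # numeric keys starting at 1
# {1: 'hello', 2: 'india', 3: 'manager'}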

So we need to make generations of weight matrices to multiply into the keys of the dictionary, then discard all outputs that fall beyond the range of dictionary keys; the rest can be used to look up values in the dictionary and build a sentence. Optionally, you can also subtract a threshold value from all the outputs of the matrix multiplication.
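
Very roughly, that decoding step could look like the sketch below (my own assumptions: decode_sentence and the random candidate matrix are illustrative only, and `dictionary` is the numeric dictionary from the sketch above):

import numpy as np

def decode_sentence(weights, keys, dictionary, threshold=0.0):
    outputs = weights @ keys - threshold               # weight matrix applied to the keys
    outputs = np.rint(outputs).astype(int)             # snap the outputs to integer keys
    valid = [k for k in outputs if k in dictionary]    # drop anything outside the key range
    return " ".join(dictionary[k] for k in valid)

keys = np.array(list(dictionary.keys()), dtype=float)
weights = np.random.randn(6, len(keys))                # one candidate weight matrix
print(decode_sentence(weights, keys, dictionary))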

Now for the hard part: learning. Once you have a few (say 100) matrices, we need to "breed" the best ones (this is where human intervention is needed): pick the 50 most meaningful sentences (might be hard at first) and use their matrices as the basis of your next 100 (the easiest way would be to take a randomly weighted mean of the 50 matrices, 100 times).
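
The breeding step might then look something like this sketch (breed is my own helper name, not part of the answer; it reuses the numpy import above):

def breed(best_matrices, population_size=100):
    best = np.stack(best_matrices)                      # e.g. shape (50, rows, cols)
    children = []
    for _ in range(population_size):
        w = np.random.rand(len(best))                   # random mixing weights
        w /= w.sum()
        children.append(np.tensordot(w, best, axes=1))  # weighted mean of the selected matrices
    return children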

And the boring bit: keep running the generations over and over until you get to a point where your sentences are meaningful most of the time (of course there is no guarantee that they will always be meaningful, but that's the nature of ANNs).

If you find it doesn't work, you can use more layers (more matrices), and/or I recently heard of a different technique that dynamically changes the network, but I can't really help with that.

Damien W

Have a database with thousands/millions of valid sentences.

Create a dictionary where each word represents a number (reserve 0 for "nothing", 1 for "start of sentence" and 2 for "end of sentence").

word_dic = { "_nothing_": 0, "_start_": 1, "_end_": 2, "word1": 3, "word2": 4, ...}
reverse_dic = {v:k for k,v in word_dic.items()}

Remember to add "_start_" and "_end_" at the beginning and end of all sentences in the database, and "_nothing_" after the end to pad them up to a fixed length capable of containing all sentences. (Ideally, work with sentences of 10 or fewer words, so your model won't try to create bigger sentences.)
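
A small sketch of that padding step (pad_sentence and the length of 12 are my own choices, not from the answer):

def pad_sentence(words, max_len=12):                 # _start_ + up to 10 words + _end_
    padded = ["_start_"] + words + ["_end_"]
    padded += ["_nothing_"] * (max_len - len(padded))
    return padded

pad_sentence(["word1", "word2"])
# ['_start_', 'word1', 'word2', '_end_', '_nothing_', ...]  padded to length 12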

Transform all your sentences into sequences of indices:

 import numpy as np

 #supposing you have an array of shape (sentences, length) of strings:
 indices = []
 for word in database.reshape((-1,)):
     indices.append(word_dic[word])
 indices = np.array(indices).reshape((sentences, length))

Transform this into categorical words with the keras function to_categorical()

 from keras.utils import to_categorical
 cat_sentences = to_categorical(indices) #shape (sentences, length, dictionary_size)

Hint: keras has lots of useful text preprocessing functions here.
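
For instance, the Tokenizer class can build the word-to-index mapping for you (a sketch; note it does not reserve indices 0-2 for the special markers by itself, and `sentences` is an assumed list of strings):

from keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer()
tokenizer.fit_on_texts(sentences)                    # learn the vocabulary
sequences = tokenizer.texts_to_sequences(sentences)  # lists of word indices
word_index = tokenizer.word_index                    # word -> index, starting at 1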

Separate training input and output data:

#input is the sentences except for the last word
x_train = cat_sentences[:,:-1,:]
#output is the sentences shifted one step ahead (everything except the first word)
y_train = cat_sentences[:,1:,:]

Let's create an LSTM based model that will predict the next words from the previous words:

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(dontKnow,return_sequences=True,input_shape=(None,dictionary_size)))
model.add(.....)
model.add(LSTM(dictionary_size,return_sequences=True,activation='sigmoid'))
   #or a Dense(dictionary_size,activation='sigmoid')

Compile and fit this model with x_train and y_train:

model.compile(....)
model.fit(x_train,y_train,....)
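
The answer leaves the compile/fit settings open; one plausible choice (the optimizer, loss and epoch count here are my assumptions, not the author's):

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=50, batch_size=64)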

Create an identical model using stateful=True in all LSTM layers:

newModel = ...... 
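
A sketch of that stateful copy (the layer sizes simply mirror the model above; batch_input_shape=(1, 1, dictionary_size) is an assumption for predicting one word at a time):

newModel = Sequential()
newModel.add(LSTM(dontKnow, return_sequences=True, stateful=True,
                  batch_input_shape=(1, 1, dictionary_size)))
newModel.add(LSTM(dictionary_size, return_sequences=True,
                  activation='sigmoid', stateful=True))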

Transfer the weights from the trained model:

newModel.set_weights(model.get_weights())

Create your bags in a categorical way, shape (10, dictionary_size).
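
One way to build such a bag (a sketch; bag_words is a hypothetical list of the 10 words in the bag):

bag_indices = [word_dic[w] for w in bag_words]
bag = to_categorical(bag_indices, num_classes=dictionary_size)  # shape (10, dictionary_size)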

Use the model to predict one word from the _start_ word.

#reset the states of the stateful model before you start a 10 word prediction:
newModel.reset_states()
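
The startWord used below can be built from the dictionary, for example (a sketch):

startWord = to_categorical([word_dic["_start_"]], num_classes=dictionary_size)
startWord = startWord.reshape((1, 1, dictionary_size))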

firstWord = newModel.predict(startWord) #startWord is shaped as (1,1,dictionary_size)

The firstWord will be a vector of size dictionary_size telling you (sort of) the probabilities of each existing word. Compare it to the words in the bag. You can choose the highest probability, or use some random selection if the probabilities of other words in the bag are also good.

#example taking the most probable word:
firstWord = np.array(firstWord == firstWord.max(), dtype=np.float32)
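
If you want the comparison restricted to the bag explicitly, here is a sketch of an alternative to the example above (`bag` is the (10, dictionary_size) matrix built earlier; apply this to the raw prediction instead of taking the global maximum):

bag_mask = bag.max(axis=0)                       # (dictionary_size,) with 1's at the bag's words
scores = firstWord.reshape((-1,)) * bag_mask     # firstWord here is the raw prediction
idx = int(scores.argmax())                       # or sample: np.random.choice(len(scores), p=scores/scores.sum())
firstWord = to_categorical([idx], num_classes=dictionary_size).reshape((1, 1, -1))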

Do the same again, but now input firstWord in the model:

secondWord = newModel.predict(firstWord) #respect the shapes

Repeat the process until you get a sentence. Notice that you may find _end_ before you have used all 10 bags. You may decide to finish the process with a shorter sentence then, especially if the other word probabilities are low.
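
A rough sketch of the whole prediction loop (not code from the answer; `bags` is an assumed list of the 10 bag matrices, in the order the words must appear):

newModel.reset_states()
current = startWord
sentence = []
for bag in bags:
    pred = newModel.predict(current).reshape((-1,))
    if pred.argmax() == word_dic["_end_"]:           # the model prefers to stop early
        break
    scores = pred * bag.max(axis=0)                  # keep only this bag's words
    idx = int(scores.argmax())
    sentence.append(reverse_dic[idx])
    current = to_categorical([idx], num_classes=dictionary_size).reshape((1, 1, -1))
print(" ".join(sentence))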

Daniel Möller
  • Good answer. What is the importance of setting stateful to True in the second model? – Vikas NS Jul 11 '18 at 16:59
  • Each batch will be considered a new step of the "same" sequence. – Daniel Möller Jul 11 '18 at 17:14
  • Got it, thanks. One more doubt: what if there is no constraint on the order of picking? If there are 5 bags, then there are 120 different orders in which I can pick. Is there any way other than trying all the permutations and picking the order with the maximum probability? – Vikas NS Jul 11 '18 at 17:37
  • If you're using recurrent layers, you should keep the order. – Daniel Möller Jul 11 '18 at 17:39
  • No, I meant while testing (in real life). While training I'll maintain order. – Vikas NS Jul 11 '18 at 17:42
  • "Create your bags in a categorical way, shape (10, dictionary_size)": could you please give more details on this line? – Vikas NS Jul 11 '18 at 17:48
  • Can I use GloVe vectors to represent the words? Then the output will be just 300d instead of (dictionary size)d. I'll just take cosine similarity and choose the one with the highest value? – Jul 11 '18 at 17:57
  • It's good as an input, but it's very hard to take a GloVe vector and identify which word it is (because it's a continuous, not a discrete, vector). – Daniel Möller Jul 11 '18 at 18:01
  • Yes, but I'll maintain a matrix of the GloVe representations of all the words and will multiply the output vector with that matrix, and the word with the maximum cosine similarity will be chosen. What do you think? – Jul 11 '18 at 18:04
  • It seems feasible.... I don't know if it would be better or not. – Daniel Möller Jul 11 '18 at 18:06
  • Is there a competition going on? Everyone came here today :D – Daniel Möller Jul 11 '18 at 18:06
  • @vikasns, that's 10 one-hot vectors, each representing one word of the bag. – Daniel Möller Jul 11 '18 at 18:08
  • It's our class project, we are a team. – Jul 11 '18 at 18:11
  • I have started to work on your approach; I'm using free story books from Gutenberg for sentences. I'm finding it hard to get sentences between 5 and 10 words. Most of them are longer sentences. Only 10% fall in the required category. Can I break these sentences and use them? Any suggestions? – Vikas NS Jul 21 '18 at 18:44