
!!NOT ABLE TO EXPLAIN WITHOUT PICTURES!! I have a neural network which I feed with text news, encoded with Bag of Words, and labels {0, 1}. I classify the text with convolutions. Everything seems to be working, but since I have quite little data (20,000 news items), the accuracy on train is not converging, and the accuracy on test is weird in itself (X axis is batches, green is test, blue is train):

[plot: train/test accuracy per batch]
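For context, a bag-of-words encoding of the kind described above can be sketched like this (the `token_to_id` mapping mirrors the name used in the code further down; the helper itself is illustrative, not the actual preprocessing):

```python
from collections import Counter

def bag_of_words(tokens, token_to_id):
    """Encode one news item as term counts over a fixed vocabulary."""
    vec = [0] * (len(token_to_id) + 1)   # slot 0 is reserved for unknown tokens
    for tok, cnt in Counter(tokens).items():
        vec[token_to_id.get(tok, 0)] += cnt
    return vec

token_to_id = {"market": 1, "rises": 2, "falls": 3}   # toy vocabulary
print(bag_of_words(["market", "rises", "market"], token_to_id))  # [0, 2, 1, 0]
```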

Then I visualized the weights of every layer in my NN, and it surprised and confused me even more (X axis is batches):

These are the weights of the Conv layer, which is the 1st in the NN:

[plot: Conv layer weights per batch]

And

These are the weights of the Dense layer, which is the second-to-last in the NN:

[plot: Dense layer weights per batch]

1. I really can't explain why the weights converge by the 20th batch and don't change later, while, according to the accuracy, they were supposed to!

2. And why is the test accuracy's behavior so strange (green line)? I hope this is just the code...

topic_input = lasagne.layers.InputLayer(shape=(None, v_train.shape[1]), input_var=v_t)
embedding = lasagne.layers.EmbeddingLayer(topic_input, input_size=len(token_to_id)+1, output_size=32)
WHAT = lasagne.layers.DimshuffleLayer(embedding, [0,2,1])
conv_1 = lasagne.layers.Conv1DLayer(WHAT, num_filters=15, filter_size=4)
conv_2 = lasagne.layers.Conv1DLayer(conv_1, num_filters=5, filter_size=3)
dense_1 = lasagne.layers.DenseLayer(conv_2, 30)
dense_2 = lasagne.layers.DenseLayer(dense_1, 5)
dense_3 = lasagne.layers.DenseLayer(dense_2, 1, nonlinearity=lasagne.nonlinearities.sigmoid)

weights = lasagne.layers.get_all_params(dense_3, trainable=True)
prediction = lasagne.layers.get_output(dense_3)
loss = lasagne.objectives.binary_crossentropy(prediction, target).mean()
updates = lasagne.updates.adam(loss, weights, learning_rate=0.01)
accuracy = lasagne.objectives.binary_accuracy(prediction, target).mean()

train_func = theano.function([v_t, target], [loss, prediction, accuracy, weights[7]], updates=updates)
acc_func = theano.function([v_t, target], [accuracy, prediction])
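One hedged hypothesis for weights freezing while the loss stays high: Lasagne's `DenseLayer` and `Conv1DLayer` default to the rectify (ReLU) nonlinearity, and a ReLU unit whose pre-activation is negative for every input receives exactly zero gradient, so it can never update again. A minimal NumPy sketch of such a "dead" unit (names like `w_dead` are illustrative, not from the code above):

```python
import numpy as np

def relu_grad(z):
    # ReLU derivative: 1 where the pre-activation is positive, else 0
    return (z > 0).astype(float)

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 10))            # a batch of 64 inputs, 10 features

# A unit whose bias keeps every pre-activation negative ("dead" unit)
w_dead = rng.normal(size=10)
b_dead = -50.0                           # large negative bias -> never fires
z = x @ w_dead + b_dead

upstream = rng.normal(size=64)           # gradient arriving from the loss
grad_w = (upstream * relu_grad(z)) @ x   # gradient w.r.t. this unit's weights

print((z > 0).sum())                     # no example activates the unit
print(np.linalg.norm(grad_w))            # exactly 0: the unit cannot recover
```

With a learning rate as high as 0.01 under Adam, early updates can push many units into this regime at once, which would match weights that stop moving after a few batches.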
  • Apparently, it quickly gets stuck in a minimum. With a small amount of data, it is very likely that the network simply cannot learn anything meaningful. Try 1) initializing it better and 2) playing with meta-parameters (learning rate, # layers, # units). – Dmytro Prylipko Dec 19 '16 at 11:55
  • @DmytroPrylipko To check the "local minimum" hypothesis, I varied the learning rate from 1 to 0.01, and the convergence was still the same! So there is only a very small chance that this is a local-minimum trap. Are there any other reasons that kill the weights' updates so quickly while the loss is still big? –  Dec 19 '16 at 12:25
  • Well, sometimes the task itself does not make sense because you have too little data, or the data is not pre-processed correctly, or there is an error in the network config. There are a million different reasons why it does not train well. I would suggest reproducing a setup that is known to work well first, and then moving from it in the direction you need: try other data, then modify the model. That way, you can find out what the source of the problem is. Initialization also plays an important role. Maybe try starting from weights pre-trained elsewhere. – Dmytro Prylipko Dec 19 '16 at 14:52
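One way to act on the suggestions above (a generic sketch in plain NumPy, not tied to the Lasagne graph): log the norm of each parameter update per batch. If that curve collapses to zero around batch 20 while the loss is still high, the freezing is real and not a plotting artifact. On a toy least-squares problem the curve should instead decay smoothly:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy convex problem: least squares, so update norms shrink as we converge
X = rng.normal(size=(200, 5))
true_w = rng.normal(size=5)
y = X @ true_w

w = np.zeros(5)
update_norms = []
for step in range(50):
    grad = X.T @ (X @ w - y) / len(y)      # gradient of 0.5 * MSE
    w_new = w - 0.1 * grad                 # plain SGD step, lr = 0.1
    update_norms.append(np.linalg.norm(w_new - w))  # per-step movement
    w = w_new

# Healthy training: the movement decays gradually, not abruptly to zero
print(update_norms[0], update_norms[-1])
```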

0 Answers