
I have created an encoder-decoder model with pre-trained 100-D GloVe embeddings to build an abstractive text summarizer. The dataset has 4,300 articles and their summaries. The vocabulary size is 48,549 for the articles and 19,130 for the summaries. The total memory size of the input and output variables is 7.5 GB.

Following is the basic encoder-decoder model:

import numpy as np
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

latent_dim = 1024

# Encoder
encoder_inputs = Input(shape=(max_x_len,))
# weights must be the pre-trained GloVe matrix of shape (len(x_voc), 100),
# not the vocabulary itself
emb1 = Embedding(len(x_voc), 100, weights=[x_emb_matrix], trainable=False)(encoder_inputs)

encoder = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(emb1)

# Decoder (teacher forcing: shifted summary tokens as input)
decoder_inputs = Input(shape=(None,))
emb2 = Embedding(len(y_voc), 100, weights=[y_emb_matrix], trainable=False)(decoder_inputs)

decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs0, _, _ = decoder_lstm(emb2, initial_state=[state_h, state_c])

decoder_dense = Dense(len(y_voc), activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs0)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
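Note that the `weights` argument of `Embedding` expects the pre-trained vector matrix of shape `(vocab_size, 100)`, not the vocabulary itself. A minimal sketch of building such a matrix from a GloVe file (the function name, the token-to-index dict, and the file path are assumptions, not part of the original code):

```python
import numpy as np

def build_embedding_matrix(vocab, glove_path, dim=100):
    """Build a (len(vocab), dim) matrix of GloVe vectors.

    vocab: dict mapping token -> integer index (0 .. len(vocab)-1).
    Tokens missing from the GloVe file keep a zero vector.
    """
    matrix = np.zeros((len(vocab), dim), dtype="float32")
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, vector = parts[0], parts[1:]
            if word in vocab:
                matrix[vocab[word]] = np.asarray(vector, dtype="float32")
    return matrix

# Sketch of intended use (x_voc assumed to be a token -> index dict):
# x_emb_matrix = build_embedding_matrix(x_voc, "glove.6B.100d.txt")
```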

When I train on the whole dataset, Spyder consumes 99% of the memory and the system freezes.

My system configuration is as follows:

OS - Windows 10 (64-bit)
RAM - 8 GB
Processor - Intel(R) Core(TM) i5-3470
Storage (HDD) - 300 GB

Further, I want to:

  • Add more data and layers to the model
  • Add an attention layer
  • Implement BERT

Kindly suggest a solution or a suitable system configuration.

desertnaut
hR 312
  • Try using Google Colab or Kaggle Kernels. – Kshitij Saxena Sep 11 '19 at 10:24
  • Colab also goes out of memory. – hR 312 Sep 11 '19 at 10:25
  • 2
    Decrease batch size, write a data loader it load data on feed and releases from memory when no needs. – mcemilg Sep 11 '19 at 10:27
  • then training speed is also sacrificed. I want to increase the training speed as well. Also, batch size is just 32. – hR 312 Sep 11 '19 at 10:28
  • You will but not as huge as you think. You need to sacrifice from somewhere. – mcemilg Sep 11 '19 at 10:30
  • I can upgrade the system. Kindly suggest what configuration will be enough to do the same, and maybe to train an unsupervised image classifier with 1lakh images. – hR 312 Sep 11 '19 at 10:31
  • Why I got a downvote? – hR 312 Sep 11 '19 at 10:33
  • I guess this is off-topic at SO.com anyway. But to get a decent answer anywhere you really need to give serious specification, not just "I want to increase the training speed as well" etc. – dedObed Sep 11 '19 at 10:33
  • I am sorry, I don't know what specifications other then those mentioned in the question is required? Kindly let me know I'll update my question. – hR 312 Sep 11 '19 at 10:38
  • 2
    Its not possible for us to recommend you particular system configurations, I can just say 8 GB RAM is not enough, try with a system with 32 GB of RAM. – Dr. Snoopy Sep 11 '19 at 11:43
  • Thanks that's the kind of answer I was expecting. What other hardware changes can effect the performance? – hR 312 Sep 11 '19 at 11:45
  • sometimes the hard disk can influence the performance, you should have a SSD, the moment the memory is full, your computer use the hard disk for calculation. – PV8 Sep 11 '19 at 11:57
  • Will upgrading the processor help? – hR 312 Sep 11 '19 at 12:45
  • Using a generator for loading data will hardly affect your training speed, and your data is seriously consuming your memory. Using a GPU for an LSTM is not as great an advantage as with convolutional or dense networks; it might be a little faster. – Daniel Möller Sep 11 '19 at 13:27
  • If scientists now prefer to use GPUs instead of CPUs, it is above all a matter of cost: a CPU is more expensive than a GPU. If you are interested in GPUs, I found this [page](https://timdettmers.com/2019/04/03/which-gpu-for-deep-learning/) some months ago. – AvyWam Sep 11 '19 at 14:36
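The data-loader idea suggested in the comments can be sketched with a plain Python generator that keeps only one batch in memory at a time (a sketch using numpy only; `x_data`, `y_in`, and `y_out` are hypothetical integer-encoded arrays, not names from the question):

```python
import numpy as np

def batch_generator(x_data, y_in, y_out, batch_size=32):
    """Yield ([encoder_batch, decoder_batch], target_batch) forever,
    slicing one batch at a time instead of holding everything in RAM.

    The arrays may be memory-mapped, e.g. np.load(path, mmap_mode="r"),
    so full data never has to be resident in memory.
    """
    n = len(x_data)
    while True:  # Keras expects generators to loop indefinitely
        for start in range(0, n, batch_size):
            end = min(start + batch_size, n)
            yield [x_data[start:end], y_in[start:end]], y_out[start:end]

# Sketch of intended use with the model from the question:
# steps = int(np.ceil(len(x_data) / 32))
# model.fit(batch_generator(x_data, y_in, y_out, 32),
#           steps_per_epoch=steps, epochs=10)
```

Combined with `loss='sparse_categorical_crossentropy'`, the targets could stay as integer indices instead of one-hot vectors of width `len(y_voc)`, which is usually where most of the memory goes in this kind of setup.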

2 Answers


This code repo contains multiple implementations for text summarization; it optimizes the learning parameters to run easily and efficiently on Google Colab, so I think it could prove helpful.

It also discusses in detail how these models are built in a blog series.

Hope this is helpful.

amr zaki

There is a difference between executing a deep-learning program and a simple ML program. In deep learning we actually work on tensors, so to process a deep model we need a processing unit that is efficient at tensor operations. For executing a neural-network model we therefore need a GPU or TPU to process the data passed from one layer of neurons to the next. A CPU may work, but a CPU is not dedicated to neural-network workloads; it is assigned to the whole system, and general computation executes faster on it. Hope this will help you.