
I have created an encoder-decoder model with pre-trained 100-D GloVe embeddings to build an abstractive text summarizer. The dataset has 4,300 articles and their summaries. The vocabulary size is 48,549 for the articles and 19,130 for the summaries. The total memory size of the input and output variables is 7.5 GB.

Following is the basic encoder-decoder model:

import numpy as np
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

latent_dim = 1024

# Encoder
encoder_inputs = Input(shape=(max_x_len,))
# weights must be the pre-trained GloVe matrix of shape (len(x_voc), 100),
# not the vocabulary itself
emb1 = Embedding(len(x_voc), 100, weights=[x_emb_matrix], trainable=False)(encoder_inputs)

encoder = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(emb1)

# Decoder (teacher forcing: shifted summary tokens as input)
decoder_inputs = Input(shape=(None,))
emb2 = Embedding(len(y_voc), 100, weights=[y_emb_matrix], trainable=False)(decoder_inputs)

decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs0, _, _ = decoder_lstm(emb2, initial_state=[state_h, state_c])

decoder_dense = Dense(len(y_voc), activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs0)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
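Note that the `weights` argument of `Embedding` expects the pre-trained vector matrix of shape `(vocab_size, 100)`, not the vocabulary itself. A minimal sketch of building such a matrix from a GloVe file (the function name, the token-to-index dict, and the file path are assumptions, not part of the original code):

```python
import numpy as np

def build_embedding_matrix(vocab, glove_path, dim=100):
    """Build a (len(vocab), dim) matrix of GloVe vectors.

    vocab: dict mapping token -> integer index (0 .. len(vocab)-1).
    Tokens missing from the GloVe file keep a zero vector.
    """
    matrix = np.zeros((len(vocab), dim), dtype="float32")
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, vector = parts[0], parts[1:]
            if word in vocab:
                matrix[vocab[word]] = np.asarray(vector, dtype="float32")
    return matrix

# Sketch of intended use (x_voc assumed to be a token -> index dict):
# x_emb_matrix = build_embedding_matrix(x_voc, "glove.6B.100d.txt")
```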

When I train on the whole dataset, Spyder consumes 99% of the memory and the system freezes.

My system configuration is as follows:

OS - Windows 10 (64-bit)
RAM - 8 GB
Processor - Intel(R) Core(TM) i5-3470
Storage (HDD) - 300 GB

Further, I want to:

  • Add more data and layers to the model
  • Add an attention layer
  • Implement BERT

Kindly suggest a solution or a suitable system configuration.

desertnaut
hR 312
  • Try using Google Colab or Kaggle Kernels. – Kshitij Saxena Sep 11 '19 at 10:24
  • Colab also goes out of memory. – hR 312 Sep 11 '19 at 10:25
  • 2
    Decrease batch size, write a data loader it load data on feed and releases from memory when no needs. – mcemilg Sep 11 '19 at 10:27
  • then training speed is also sacrificed. I want to increase the training speed as well. Also, batch size is just 32. – hR 312 Sep 11 '19 at 10:28
  • You will but not as huge as you think. You need to sacrifice from somewhere. – mcemilg Sep 11 '19 at 10:30
  • I can upgrade the system. Kindly suggest what configuration will be enough to do the same, and maybe to train an unsupervised image classifier with 1lakh images. – hR 312 Sep 11 '19 at 10:31
  • Why I got a downvote? – hR 312 Sep 11 '19 at 10:33
  • I guess this is off-topic at SO.com anyway. But to get a decent answer anywhere you really need to give serious specification, not just "I want to increase the training speed as well" etc. – dedObed Sep 11 '19 at 10:33
  • I am sorry, I don't know what specifications other then those mentioned in the question is required? Kindly let me know I'll update my question. – hR 312 Sep 11 '19 at 10:38
  • 2
    Its not possible for us to recommend you particular system configurations, I can just say 8 GB RAM is not enough, try with a system with 32 GB of RAM. – Dr. Snoopy Sep 11 '19 at 11:43
  • Thanks that's the kind of answer I was expecting. What other hardware changes can effect the performance? – hR 312 Sep 11 '19 at 11:45
  • sometimes the hard disk can influence the performance, you should have a SSD, the moment the memory is full, your computer use the hard disk for calculation. – PV8 Sep 11 '19 at 11:57
  • Will upgrading the processor help? – hR 312 Sep 11 '19 at 12:45
  • Using a generator for loading data will hardly affect your training speed, and your data is seriously consuming your memory. Using a GPU for an LSTM is not as great an advantage as with convolutional or dense networks; it might be a little faster. – Daniel Möller Sep 11 '19 at 13:27
  • If scientists now prefer to use GPUs instead of CPUs, it is above all a matter of cost: a CPU is more expensive than a GPU. If you are interested in GPUs, I found this [page](https://timdettmers.com/2019/04/03/which-gpu-for-deep-learning/) some months ago. – AvyWam Sep 11 '19 at 14:36
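The data-loader idea suggested in the comments can be sketched with a plain Python generator that keeps only one batch in memory at a time (a sketch using numpy only; `x_data`, `y_in`, and `y_out` are hypothetical integer-encoded arrays, not names from the question):

```python
import numpy as np

def batch_generator(x_data, y_in, y_out, batch_size=32):
    """Yield ([encoder_batch, decoder_batch], target_batch) forever,
    slicing one batch at a time instead of holding everything in RAM.

    The arrays may be memory-mapped, e.g. np.load(path, mmap_mode="r"),
    so full data never has to be resident in memory.
    """
    n = len(x_data)
    while True:  # Keras expects generators to loop indefinitely
        for start in range(0, n, batch_size):
            end = min(start + batch_size, n)
            yield [x_data[start:end], y_in[start:end]], y_out[start:end]

# Sketch of intended use with the model from the question:
# steps = int(np.ceil(len(x_data) / 32))
# model.fit(batch_generator(x_data, y_in, y_out, 32),
#           steps_per_epoch=steps, epochs=10)
```

Combined with `loss='sparse_categorical_crossentropy'`, the targets could stay as integer indices instead of one-hot vectors of width `len(y_voc)`, which is usually where most of the memory goes in this kind of setup.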

2 Answers


This code repo contains multiple implementations for text summarization; it optimizes the learning parameters to run easily and efficiently on Google Colab, so I think it could prove helpful.

It also discusses in detail how these models are built in a blog series.

Hope this is helpful.

amr zaki

There is a difference between executing a deep-learning program and a simple ML program. In deep learning we actually work on tensors, so to process a deep model we need a processing unit that is efficient at tensor operations. For executing a neural-network model we therefore need a GPU or TPU to process the data passed from one layer of neurons to the next. A CPU may work, but a CPU is not dedicated to neural-network workloads; it is assigned to the whole system, and general computation executes faster on it. Hope this will help you.