
I am trying to follow this tutorial to learn a bit about deep learning with Keras, but I keep getting a MemoryError. Can you please point out what is causing it and how to fix it?

Here is the code:

import numpy as np
from keras import models, regularizers, layers
from keras.datasets import imdb

(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

def vectorize_sequences(sequences, dimension=10000):
    # One-hot encode: one row per review, with a 1 at each word index
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results


x_train = vectorize_sequences(train_data)

Here is the traceback (the line numbers don't match the code above):

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/home/uttam/pycharm-2018.2.4/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/home/uttam/pycharm-2018.2.4/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/uttam/PycharmProjects/IMDB/imdb.py", line 33, in <module>
    x_train = vectorize_sequences(train_data)
  File "/home/uttam/PycharmProjects/IMDB/imdb.py", line 27, in vectorize_sequences
    results = np.zeros((len(sequences), dimension))
MemoryError
desertnaut
MessitÖzil
  • "Probably"? Pls include the full error trace - and if you are correct, arguably the largest part of your code is irrelevant to the issue and should be removed. – desertnaut Nov 07 '18 at 14:27
  • I have edited the question to add the error trace – MessitÖzil Nov 07 '18 at 16:40
  • So, all the code below `x_train = vectorize_sequences(train_data)` is irrelevant to the problem (it is never executed) - I am removing it, and keep it in mind for the future... – desertnaut Nov 07 '18 at 16:42
  • A related question: https://stackoverflow.com/questions/68422410/standard-implementation-of-vectorize-sequences – zabop Jul 17 '21 at 16:55

1 Answer


Yes, you are correct. The problem does arise from vectorize_sequences: np.zeros((len(sequences), dimension)) tries to allocate a 25,000 × 10,000 array of float64 in one go, which is roughly 2 GB and can easily exceed the memory available to your process.

You should do that vectorization in batches (slicing the data, as the tutorial already does for partial_x_train) or use a generator (here is a good explanation and example).
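A minimal sketch of the batching idea (the function name and batch size are illustrative, not from the tutorial; I'm also assuming float32 is acceptable, which halves memory versus NumPy's default float64):

```python
import numpy as np

def vectorize_batches(sequences, dimension=10000, batch_size=512):
    """Yield one-hot batches instead of allocating the full matrix at once."""
    for start in range(0, len(sequences), batch_size):
        batch = sequences[start:start + batch_size]
        # Only batch_size x dimension is alive at any moment,
        # e.g. 512 x 10000 float32 is ~20 MB instead of ~2 GB.
        results = np.zeros((len(batch), dimension), dtype=np.float32)
        for i, sequence in enumerate(batch):
            results[i, sequence] = 1.
        yield results
```

You would then feed each yielded batch (paired with the matching slice of labels) to model.train_on_batch, or wrap the pairs in a generator and pass it to model.fit_generator. Note that a generator cannot be sliced like an array, so indexing such as x_train[:10000] has to happen on the raw sequences before vectorizing.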

I hope this helps :)

Novak
  • How do you know that the problem is in vectorize_sequences without the full traceback from the OP? – Dr. Snoopy Nov 07 '18 at 14:55
  • I've seen that tutorial – Novak Nov 07 '18 at 14:57
  • That doesn't really answer what I asked; you could say it's most likely that part, but being 100% sure of it without the additional information can be misleading. – Dr. Snoopy Nov 07 '18 at 14:59
  • @Novak so I just make it a generator function instead of a normal function by replacing return with yield? It gives me TypeError: 'generator' object is not subscriptable, for the line x_val = x_train[:10000] – MessitÖzil Nov 07 '18 at 16:19
  • Yes. Generators are evaluated lazily, so you cannot slice them. They take only as much data as they need. – Novak Nov 07 '18 at 17:14
  • @Novak but it gave me TypeError: 'generator' object is not subscriptable, for the line x_val = x_train[:10000]. Can you please show me how can I change the function? Thanks – MessitÖzil Nov 08 '18 at 09:36