
I am trying to follow this tutorial to learn a bit about deep learning with Keras, but I keep getting a MemoryError. Can you please point out what is causing it and how to fix it?

Here is the code:

import numpy as np
from keras import models, regularizers, layers
from keras.datasets import imdb

(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

def vectorize_sequences(sequences, dimension=10000):
    # One-hot encode: one row per review, with a 1 at each word index
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results


x_train = vectorize_sequences(train_data)

Here is the traceback (the line numbers don't match the code above):

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/home/uttam/pycharm-2018.2.4/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/home/uttam/pycharm-2018.2.4/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/uttam/PycharmProjects/IMDB/imdb.py", line 33, in <module>
    x_train = vectorize_sequences(train_data)
  File "/home/uttam/PycharmProjects/IMDB/imdb.py", line 27, in vectorize_sequences
    results = np.zeros((len(sequences), dimension))
MemoryError
desertnaut
MessitÖzil
  • "Probably"? Pls include the full error trace - and if you are correct, arguably the largest part of your code is irrelevant to the issue and should be removed. – desertnaut Nov 07 '18 at 14:27
  • I have edited the question to add the error trace – MessitÖzil Nov 07 '18 at 16:40
  • So, all the code below `x_train = vectorize_sequences(train_data)` is irrelevant to the problem (it is never executed) - I am removing it, and keep it in mind for the future... – desertnaut Nov 07 '18 at 16:42
  • A related question: https://stackoverflow.com/questions/68422410/standard-implementation-of-vectorize-sequences – zabop Jul 17 '21 at 16:55

1 Answer


Yes, you are correct. The problem does arise from vectorize_sequences: np.zeros((len(sequences), dimension)) tries to allocate a 25,000 × 10,000 array of float64 in one go, which is roughly 2 GB and can easily exceed the memory available to your process.

You should do that vectorization in batches (slicing the data, as the tutorial already does for partial_x_train) or use a generator (here is a good explanation and example).
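A minimal sketch of the batching idea (the function name and batch size are illustrative, not from the tutorial; I'm also assuming float32 is acceptable, which halves memory versus NumPy's default float64):

```python
import numpy as np

def vectorize_batches(sequences, dimension=10000, batch_size=512):
    """Yield one-hot batches instead of allocating the full matrix at once."""
    for start in range(0, len(sequences), batch_size):
        batch = sequences[start:start + batch_size]
        # Only batch_size x dimension is alive at any moment,
        # e.g. 512 x 10000 float32 is ~20 MB instead of ~2 GB.
        results = np.zeros((len(batch), dimension), dtype=np.float32)
        for i, sequence in enumerate(batch):
            results[i, sequence] = 1.
        yield results
```

You would then feed each yielded batch (paired with the matching slice of labels) to model.train_on_batch, or wrap the pairs in a generator and pass it to model.fit_generator. Note that a generator cannot be sliced like an array, so indexing such as x_train[:10000] has to happen on the raw sequences before vectorizing.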

I hope this helps :)

Novak
  • How do you know that the problem is in vectorize_sequences without the full traceback from the OP? – Dr. Snoopy Nov 07 '18 at 14:55
  • I've seen that tutorial – Novak Nov 07 '18 at 14:57
  • That doesn't really answer what I asked; you could say it's most likely that part, but being 100% sure of it without the additional information can be misleading. – Dr. Snoopy Nov 07 '18 at 14:59
  • @Novak so I just make it a generator function instead of a normal function by replacing return with yield? It gives me TypeError: 'generator' object is not subscriptable, for the line x_val = x_train[:10000] – MessitÖzil Nov 07 '18 at 16:19
  • Yes. Generators are evaluated lazily, so you cannot slice them. They take only as much data as they need. – Novak Nov 07 '18 at 17:14
  • @Novak but it gave me TypeError: 'generator' object is not subscriptable, for the line x_val = x_train[:10000]. Can you please show me how can I change the function? Thanks – MessitÖzil Nov 08 '18 at 09:36