As mentioned by @ali_m, that line by itself is only a generator expression. Nothing is computed from the elements of `X_train` until you iterate over the generator. You must be consuming the whole generator and storing the results in memory somewhere later in your code, possibly by calling `list(X_train)`, appending every element of `X_train` to a list, or something similar. That creates a list as long as your original `X_train` (the one from before the generator expression), which causes a memory error if it is too big.
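To make the laziness concrete, here is a minimal sketch. The `transform` function and the tiny `original` list are hypothetical stand-ins, since I don't know what your real expression does. It only shows that the work happens when the generator is consumed, and that `list()` produces a second full-length list:

```python
calls = []

def transform(x):
    # Record each call so we can see when the work actually happens.
    calls.append(x)
    return x * 2

original = [1, 2, 3, 4]  # stand-in for the original X_train

# Creating the generator expression does not call transform() at all.
X_train = (transform(x) for x in original)
calls_after_creation = len(calls)

# list() consumes the generator and builds a second, full-length list.
materialized = list(X_train)
calls_after_list = len(calls)

print(calls_after_creation)
print(calls_after_list)
print(len(materialized) == len(original))
```

With real data, `materialized` is the second large object that has to fit in memory next to `original`.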
The original `X_train` cannot be garbage collected while the generator expression is still being consumed, because the generator holds a reference to it. So by building a list from the new `X_train`, you end up with two huge lists in memory at once, which is probably why it runs out of memory.
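If you want to see that the generator keeps the original data alive, the sketch below is one way to check it. It assumes CPython's reference counting, and `TrackedList` is a hypothetical helper that exists only because built-in lists cannot be weakly referenced:

```python
import gc
import weakref

class TrackedList(list):
    """Plain list subclass; built-in lists cannot be weakly referenced."""

original = TrackedList(range(1000))
probe = weakref.ref(original)  # lets us observe the list without keeping it alive

gen = (x * 2 for x in original)

# Drop our own name for the list; the generator still refers to it.
del original
gc.collect()
alive_while_generator_exists = probe() is not None

# Once the generator itself is gone, nothing references the list any more.
del gen
gc.collect()
freed_after_generator_gone = probe() is None

print(alive_while_generator_exists)
print(freed_after_generator_gone)
```

So rebinding the name `X_train` to a generator does not by itself free the old list.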
In this case, switching to `xrange` won't make your code more efficient, because the generator expression is already lazy. The best thing to do would be to look at how `X_train` is used later in your code and iterate over it (`for _ in X_train`) instead of turning it into a list (`list(X_train)`) wherever possible.
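As a rough illustration of iterating instead of materializing, here is a sketch with a hypothetical `preprocess` function and made-up rows. Your real processing will differ, but the pattern is the same: only one transformed element needs to exist at a time.

```python
def preprocess(row):
    # Hypothetical per-element transformation, for illustration only.
    return [value * 2 for value in row]

raw_rows = [[1, 2], [3, 4], [5, 6]]  # stand-in for the original data
X_train = (preprocess(row) for row in raw_rows)

total = 0
row_count = 0
for row in X_train:
    # Each transformed row is used and then discarded.
    total += sum(row)
    row_count += 1

# A generator can only be consumed once, so nothing is left afterwards.
leftover = list(X_train)

print(total)
print(row_count)
print(leftover)
```

The catch is that a generator is single-pass: if later code needs to go over `X_train` more than once, you have to recreate the generator each time or restructure that code.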