unwrap a generator to use inside model.fit

Question

I have exactly the same problem as the one addressed in this post: I cannot pass a generator as the training input of model.fit in Keras, so I need to unwrap it into arrays first. The solution proposed there:

from platform import python_version_tuple

if python_version_tuple()[0] == '3':
    xrange = range
    izip = zip
    imap = map
else:
    from itertools import izip, imap

import numpy as np

# ..
# other code as in question
# ..

x, y = izip(*(validation_seq[i] for i in xrange(len(validation_seq))))
x_val, y_val = np.vstack(x), np.vstack(y)

is exactly what I am looking for. It works for the ImageDataGenerator() in the original question, but not for my generator, which is defined as follows:

def generator(data, L, D, i_min, i_max, shuffle=False, batch_size=16, step=1):
    if i_max is None:
        i_max = len(data) - D - 1
    i = i_min + L
    while 1:
        if shuffle:
            rows = np.random.randint(i_min + L, i_max, size=batch_size)
        else:
            if i + batch_size >= i_max:
                i = i_min + L
            rows = np.arange(i, min(i + batch_size, i_max))
            i += len(rows)
        samples = np.zeros((len(rows), L // step, data.shape[-1]))
        targets = np.zeros((len(rows),))
        for j, row in enumerate(rows):
            indices = range(row - L, row, step)
            samples[j] = data[indices]
            targets[j] = data[row + D][3]  # where is Q in your data
        yield samples, targets


data = np.random.standard_normal([256,4])
generator = generator(data=data, L=8, D=1, i_min=0, i_max=255, shuffle=False, batch_size=16, step=1)

When I execute izip(*(generator[i] for i in xrange(len(generator)))), I get this error: object of type 'generator' has no len().

I have already tried replacing xrange(len(generator)) with len(list(generator)) and with enumerate(generator), but neither works. How can I fix this problem? Thank you.

PS: I am using Python 3.8 on macOS 10.13.6.

UPDATE: Based on the answer by @couka, I rewrote the generator as a class, but it still does not work.

import math

class batch_gen:
    def __init__(self, data, L, D, min_index, max_index, shuffle, batch_size, step):
        self.data = data
        self.L = L
        self.D = D
        self.min_index = min_index
        self.max_index = max_index
        self.shuffle = shuffle
        self.batch_size = batch_size
        self.step = step

    def __iter__(self):
        if self.max_index is None:
            self.max_index = len(self.data) - self.D - 1
        i = self.min_index + self.L
        while 1:
            if self.shuffle:
                rows = np.random.randint(self.min_index + self.L, self.max_index, size=self.batch_size)
            else:
                if i + self.batch_size >= self.max_index:
                    i = self.min_index + self.L
                rows = np.arange(i, min(i + self.batch_size, self.max_index))
                i += len(rows)
            samples = np.zeros((len(rows), self.L // self.step, self.data.shape[-1]))
            targets = np.zeros((len(rows),))
            for j, row in enumerate(rows):
                indices = range(row - self.L, row, self.step)
                samples[j] = self.data[indices]
                targets[j] = self.data[row + self.D][3]  # where is Q in your data
            yield samples, targets

    def __len__(self):
        return int(math.floor(len(self.data) / float(self.batch_size)))

When I use:

gen_tr = batch_gen(data=data, L=L, D=D,
                  min_index=min(ind_tr), max_index=max(ind_tr),
                  shuffle=True, step=step, batch_size=batch_size)

I get this error: TypeError: 'batch_gen' object is not subscriptable.

Please put the full error message, including the call stack traceback, into your question. Also please update the code in your question so that it is a minimal reproducible example: anyone should be able to paste your code into a file and run it, without adding anything, to see the same problem as you. – DisappointedByUnaccountableMod Jan 01 '21 at 17:16
The error means exactly what it says: batch_gen isn't subscriptable. You don't implement `__getitem__` in your generator class, and there is no reason you should. You can iterate over the items directly instead of indexing by subscript. `izip(*(item for item in gen_tr))` should work. – Adam Acosta Jan 01 '21 at 18:23
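
Iterating over the generator directly only finishes if the iteration is bounded, because the generator in the question loops forever (`while 1`). Below is a minimal sketch of one way to bound it, assuming a fixed number of batches is wanted: `itertools.islice` takes the first `n_batches` items and `np.concatenate` joins them along the batch axis. The names `batch_generator` and `n_batches` are illustrative, and the generator is a simplified, non-shuffling stand-in for the one in the question.

```python
from itertools import islice

import numpy as np


def batch_generator(data, L, D, i_min, i_max, batch_size=16):
    """Simplified, non-shuffling stand-in for the generator in the question."""
    i = i_min + L
    while True:  # never terminates on its own
        if i + batch_size >= i_max:
            i = i_min + L  # wrap around to the start
        rows = np.arange(i, min(i + batch_size, i_max))
        i += len(rows)
        # Each sample is the window of L rows that ends just before `r`.
        samples = np.stack([data[r - L:r] for r in rows])
        # Each target is column 3 of the row D steps after `r`.
        targets = np.array([data[r + D][3] for r in rows])
        yield samples, targets


data = np.random.standard_normal([256, 4])
gen = batch_generator(data, L=8, D=1, i_min=0, i_max=255)

n_batches = 5  # hypothetical choice; an unbounded zip(*gen) would never return
x, y = zip(*islice(gen, n_batches))
x_all = np.concatenate(x)  # join along the batch axis
y_all = np.concatenate(y)
print(x_all.shape, y_all.shape)  # (80, 8, 4) (80,)
```

`np.concatenate` is used instead of `np.vstack` because the targets are one-dimensional: `np.vstack` would stack them into one row per batch rather than one flat array.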

score 1 · Answer 1 · answered Jan 01 '21 at 12:43

First of all, the generator should be a class, not a function.

Then, object of type 'generator' has no len() means that your generator has no __len__(self) method, so you need to add one. As far as I know, that method should return the number of batches in the dataset.

It might look something like this:

def __len__(self):
    # Number of full batches in the dataset (integer division floors the result).
    return len(self.data) // self.batch_size
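
If indexing is wanted, so that the `seq[i] for i in range(len(seq))` idiom from the linked post works unchanged, the class needs `__getitem__` as well as `__len__`. The following is a rough sketch under stated assumptions, not the asker's code: `WindowBatches` is a hypothetical name, batches are taken in order without shuffling, and the target is read from column 3 as in the question. `keras.utils.Sequence` is built around the same two methods.

```python
import numpy as np


class WindowBatches:
    """Hypothetical indexable batch container (no shuffling)."""

    def __init__(self, data, L, D, min_index, max_index, batch_size=16, step=1):
        self.data = data
        self.L = L
        self.D = D
        self.batch_size = batch_size
        self.step = step
        # Every row index that can serve as the end of a look-back window.
        self.rows = np.arange(min_index + L, max_index)

    def __len__(self):
        # Number of batches, counting a final partial batch.
        return int(np.ceil(len(self.rows) / self.batch_size))

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        rows = self.rows[idx * self.batch_size:(idx + 1) * self.batch_size]
        samples = np.stack([self.data[r - self.L:r:self.step] for r in rows])
        targets = np.array([self.data[r + self.D][3] for r in rows])
        return samples, targets


data = np.random.standard_normal([256, 4])
seq = WindowBatches(data, L=8, D=1, min_index=0, max_index=255)

# The unwrapping idiom from the linked post now works, because seq is finite
# and supports both len() and integer indexing.
x, y = zip(*(seq[i] for i in range(len(seq))))
x_val, y_val = np.concatenate(x), np.concatenate(y)
print(len(seq), x_val.shape, y_val.shape)  # 16 (247, 8, 4) (247,)
```

Unlike the generator in the question, this object is finite, so unwrapping it terminates after `len(seq)` batches.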

couka

Thank you for your answer. Would you please see the question's UPDATE? I still can't get it to work. Many thanks. – Basilique Jan 01 '21 at 17:12