I have a specific case where the networks are relatively tiny, and for the sake of convergence and generalization I have to keep batch sizes small (e.g. 256), which leads to hundreds of batches to process per epoch.
Unfortunately, in this scenario batch loading and loss calculation become the bottleneck (according to the timeline tool).
In TensorFlow, you can write something like this to load the data on the GPU:
    import tensorflow as tf

    with tf.device('/gpu:0'):
        train_data = tf.constant(train_data_numpy)  # train_data_numpy: the dataset as a NumPy array
But if I pass train_data to the Keras Model.predict or Model.fit functions, I get the following error:
keras/engine/training.pyc in predict(self, x, batch_size, verbose)
1515 f = self.predict_function
1516 return self._predict_loop(f, ins,
-> 1517 batch_size=batch_size, verbose=verbose)
1518
1519 def train_on_batch(self, x, y,
keras/engine/training.pyc in _predict_loop(self, f, ins, batch_size, verbose)
1129 if verbose == 1:
1130 progbar = Progbar(target=samples)
-> 1131 batches = _make_batches(samples, batch_size)
1132 index_array = np.arange(samples)
1133 for batch_index, (batch_start, batch_end) in enumerate(batches):
keras/engine/training.pyc in _make_batches(size, batch_size)
368 A list of tuples of array indices.
369 """
--> 370 num_batches = int(np.ceil(size / float(batch_size)))
371 return [(i * batch_size, min(size, (i + 1) * batch_size))
372 for i in range(0, num_batches)]
AttributeError: 'Dimension' object has no attribute 'ceil'
This makes sense, since Keras expects only NumPy-like arrays or lists of them.
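To illustrate my reading of the first traceback, here is a minimal sketch that mirrors the arithmetic in `_make_batches`. `make_batches` is my own copy of the helper shown in the traceback, and `FakeDimension` is a hypothetical stand-in for a symbolic shape entry, not a TensorFlow class. I am assuming the failure happens because the division returns a wrapper object instead of a number, which `np.ceil` cannot handle; the exact exception type seems to depend on the NumPy version.

```python
import numpy as np


def make_batches(size, batch_size):
    """Same arithmetic as the Keras helper shown in the traceback."""
    num_batches = int(np.ceil(size / float(batch_size)))
    return [(i * batch_size, min(size, (i + 1) * batch_size))
            for i in range(num_batches)]


class FakeDimension(object):
    """Hypothetical stand-in for a symbolic shape entry (not a TensorFlow class)."""

    def __init__(self, value):
        self.value = value

    def __truediv__(self, other):
        # Division yields another wrapper object rather than a plain number.
        return FakeDimension(self.value / other)

    __div__ = __truediv__  # Python 2 spelling of the same operator


# A plain integer sample count works as expected.
batches = make_batches(1000, 256)

# A wrapper object fails inside np.ceil, before any batch is built.
try:
    make_batches(FakeDimension(1000), 256)
    failure = None
except (AttributeError, TypeError) as exc:
    # Older NumPy raises AttributeError, newer releases raise TypeError.
    failure = type(exc).__name__
```

With a plain integer the helper returns the expected index ranges, so the problem appears to be the type of the sample count rather than the batching logic itself.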
Having said that, I also tried PyCUDA and CuPy arrays, since they claim to be NumPy-like, but those produce the following error:
keras/engine/training.pyc in predict(self, x, batch_size, verbose)
1515 f = self.predict_function
1516 return self._predict_loop(f, ins,
-> 1517 batch_size=batch_size, verbose=verbose)
1518
1519 def train_on_batch(self, x, y,
keras/engine/training.pyc in _predict_loop(self, f, ins, batch_size, verbose)
1139 ins_batch = _slice_arrays(ins, batch_ids)
1140
-> 1141 batch_outs = f(ins_batch)
1142 if not isinstance(batch_outs, list):
1143 batch_outs = [batch_outs]
keras/backend/tensorflow_backend.pyc in __call__(self, inputs)
2266 updated = session.run(self.outputs + [self.updates_op],
2267 feed_dict=feed_dict,
-> 2268 **self.session_kwargs)
2269 return updated[:len(self.outputs)]
2270
tensorflow/python/client/session.pyc in run(self, fetches, feed_dict, options, run_metadata)
893 try:
894 result = self._run(None, fetches, feed_dict, options_ptr,
--> 895 run_metadata_ptr)
896 if run_metadata:
897 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
tensorflow/python/client/session.pyc in _run(self, handle, fetches, feed_dict, options, run_metadata)
1091 feed_handles[subfeed_t] = subfeed_val
1092 else:
-> 1093 np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
1094
1095 if (not is_tensor_handle_feed and
numpy/core/numeric.pyc in asarray(a, dtype, order)
529
530 """
--> 531 return array(a, dtype, copy=False, order=order)
532
533
ValueError: object __array__ method not producing an array
I tried googling this issue, but the only reasonable match is a Chinese blog post that basically suggests patching Keras, which is obviously impractical.
What is the correct way to preload the whole dataset onto the GPU for Keras?
Useful info: I am using Keras 2.0.6 with TF 1.3. Upgrading to the 2.0.8/1.4 stack is not yet possible because of breaking API changes, but I would definitely speed up the upgrade if it solves this issue.