I'm trying to run a simple sequential TensorFlow model with the Keras wrapper on an AMD GPU (AMD Vega 20, TensorFlow 2.2.0, Keras 2.4.3), but I'm running into a strange error when calling fit:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 15 values, but the requested shape has 15976860750
It seems to be taking the batch size as the number of values for the input tensor, and somehow the size of the "requested shape" explodes. The model definition is as follows:
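    # Imports assumed (not shown in the original post); the traceback paths point to tf.keras
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

    def create_model(optimizer='rmsprop', init='glorot_uniform'):
        # create model: 8 inputs -> 12 -> 8 -> 1 (sigmoid for binary classification)
        model = Sequential()
        model.add(Dense(12, input_dim=8, kernel_initializer=init, activation='relu'))
        model.add(Dense(8, kernel_initializer=init, activation='relu'))
        model.add(Dense(1, kernel_initializer=init, activation='sigmoid'))
        # Compile model
        model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
        return model

    modelClf = KerasClassifier(build_fn=create_model, verbose=1, batch_size=15, epochs=9)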
The exact same model works fine if I run it on CPU only, on a machine that has no GPU installed. It also works well with the CUDA11 implementation on another machine running an NVIDIA GPU (with TensorFlow 1.15.3 and Keras 2.3.1).
I have no idea why it would be requesting the GPU memory size as the input size on this later TensorFlow version, and only if an AMD GPU is present. Is there something obvious that I might be getting wrong with the configuration here?
EDIT: In response to the comments below: after some tweaking, it turns out the "requested size" is related to the batch size and not to the GPU memory as I first thought (the match was apparently a coincidence; setting the batch size to 10 gives a "requested size" of 1092616192 instead). The input is just a simple pandas DataFrame with 8 values in each row (matching input_dim, and as mentioned, the same implementation works fine on other machines).
The error occurs during the call to fit() for training. I can see from the output that it gets about 5 epochs in before crashing like this. The traceback is below ("~/rocm/keras" is just the path where I have the Python packages installed for this environment):
      File "~/rocm/keras/tensorflow/python/keras/wrappers/scikit_learn.py", line 223, in fit
        return super(KerasClassifier, self).fit(x, y, **kwargs)
      File "~/rocm/keras/tensorflow/python/keras/wrappers/scikit_learn.py", line 166, in fit
        history = self.model.fit(x, y, **fit_args)
      File "~/rocm/keras/tensorflow/python/keras/engine/training.py", line 66, in _method_wrapper
        return method(self, *args, **kwargs)
      File "~/rocm/keras/tensorflow/python/keras/engine/training.py", line 848, in fit
        tmp_logs = train_function(iterator)
      File "~/rocm/keras/tensorflow/python/eager/def_function.py", line 580, in __call__
        result = self._call(*args, **kwds)
      File "~/rocm/keras/tensorflow/python/eager/def_function.py", line 611, in _call
        return self._stateless_fn(*args, **kwds) # pylint: disable=not-callable
      File "~/rocm/keras/tensorflow/python/eager/function.py", line 2420, in __call__
        return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
      File "~/rocm/keras/tensorflow/python/eager/function.py", line 1665, in _filtered_call
        self.captured_inputs)
      File "~/rocm/keras/tensorflow/python/eager/function.py", line 1746, in _call_flat
        ctx, args, cancellation_manager=cancellation_manager))
      File "~/rocm/keras/tensorflow/python/eager/function.py", line 598, in call
        ctx=ctx)
      File "~/rocm/keras/tensorflow/python/eager/execute.py", line 60, in quick_execute
        inputs, attrs, num_outputs)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 10 values, but the requested shape has 1092616192
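One observation on the number in the batch-size-10 error, offered as a hint rather than a diagnosis: 1092616192 is exactly the IEEE-754 float32 bit pattern of 10.0 (0x41200000) read as a 32-bit integer. That would be consistent with a float value being reinterpreted as an integer shape somewhere on the GPU path, though nothing in the traceback confirms where. The helper name float32_bits_as_int32 below is made up for illustration; it uses only Python's standard struct module.

```python
import struct


def float32_bits_as_int32(value: float) -> int:
    """Pack `value` as a little-endian float32 and read the same 4 bytes as a signed int32."""
    return struct.unpack("<i", struct.pack("<f", value))[0]


# Batch size 10 stored as the float 10.0, then misread as an integer:
print(float32_bits_as_int32(10.0))       # 1092616192
print(hex(float32_bits_as_int32(10.0)))  # 0x41200000
```

The same check gives 1097859072 for 15.0, which does not match the 15976860750 from the batch-size-15 run, so this only accounts for the batch-size-10 figure.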