
I'm trying to train a U-Net model in TensorFlow 2.0 which takes an image as input and a segmentation mask as the target, but I'm getting a ValueError: as_list() is not defined on an unknown TensorShape. The stack trace shows the problem occurs during _get_input_from_iterator(inputs):

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2_utils.py in _prepare_feed_values(model, inputs, mode)
    110     for inputs will always be wrapped in lists.
    111   """
--> 112   inputs, targets, sample_weights = _get_input_from_iterator(inputs)
    113 
    114   # When the inputs are dict, then we want to flatten it in the same order as

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2_utils.py in _get_input_from_iterator(iterator)
    147   # Validate that all the elements in x and y are of the same type and shape.
    148   dist_utils.validate_distributed_dataset_inputs(
--> 149       distribution_strategy_context.get_strategy(), x, y, sample_weights)
    150   return x, y, sample_weights
    151 

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/distribute/distributed_training_utils.py in validate_distributed_dataset_inputs(distribution_strategy, x, y, sample_weights)
    309 
    310   if y is not None:
--> 311     y_values_list = validate_per_replica_inputs(distribution_strategy, y)
    312   else:
    313     y_values_list = None

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/distribute/distributed_training_utils.py in validate_per_replica_inputs(distribution_strategy, x)
    354     if not context.executing_eagerly():
    355       # Validate that the shape and dtype of all the elements in x are the same.
--> 356       validate_all_tensor_shapes(x, x_values)
    357     validate_all_tensor_types(x, x_values)
    358 

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/distribute/distributed_training_utils.py in validate_all_tensor_shapes(x, x_values)
    371 def validate_all_tensor_shapes(x, x_values):
    372   # Validate that the shape of all the elements in x have the same shape
--> 373   x_shape = x_values[0].shape.as_list()
    374   for i in range(1, len(x_values)):
    375     if x_shape != x_values[i].shape.as_list():

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/tensor_shape.py in as_list(self)
   1169     """
   1170     if self._dims is None:
-> 1171       raise ValueError("as_list() is not defined on an unknown TensorShape.")
   1172     return [dim.value for dim in self._dims]
   1173

I've looked through a couple of other Stack Overflow posts (here and here) with this error, but in my case I think the problem arises in the function I pass to my Dataset's map. I pass the process_path function defined below to the map method of a TensorFlow Dataset. It accepts a path to an image and constructs the path to the corresponding segmentation mask, which is a numpy file. The (256, 256) array in the numpy file is then converted to (256, 256, 10) using kerasUtils.to_categorical, where the 10 channels represent the classes. I used the check_shape function below to confirm that the tensor shapes are correct, but when I call model.fit the shape still cannot be derived.

# --------------------------------------------------------------------------------------
# DECODE A NUMPY .NPY FILE INTO THE REQUIRED FORMAT FOR TRAINING
# --------------------------------------------------------------------------------------
def decode_npy(npy):
  # npy is a string tensor holding the path to the mask file; load the
  # (256, 256) mask and one-hot encode it into (256, 256, 10)
  filename = npy.numpy()
  data = np.load(filename)
  data = kerasUtils.to_categorical(data, 10)
  return data

def check_shape(image, mask):
  print('shape of image: ', image.get_shape())
  print('shape of mask: ', mask.get_shape())
  return 0.0

# --------------------------------------------------------------------------------------
# DECODE AN IMAGE (PNG) FILE INTO THE REQUIRED FORMAT FOR TRAINING
# --------------------------------------------------------------------------------------
def decode_img(img):
  # convert the compressed string to a 3D uint8 tensor
  img = tf.image.decode_png(img, channels=3)
  # Use `convert_image_dtype` to convert to floats in the [0,1] range.
  return tf.image.convert_image_dtype(img, tf.float32)

# --------------------------------------------------------------------------------------
# PROCESS A FILE PATH FOR THE DATASET
# input - path to an image file
# output - an input image and output mask
# --------------------------------------------------------------------------------------
def process_path(filePath):
  # derive the mask path from the image path: <convertedMaskDir>/<name>-mask.npy
  parts = tf.strings.split(filePath, '/')
  fileName = parts[-1]
  parts = tf.strings.split(fileName, '.')
  prefix = tf.convert_to_tensor(convertedMaskDir, dtype=tf.string)
  suffix = tf.convert_to_tensor("-mask.npy", dtype=tf.string)
  maskFileName = tf.strings.join((parts[-2], suffix))
  maskPath = tf.strings.join((prefix, maskFileName), separator='/')

  # load the raw data from the file as a string
  img = tf.io.read_file(filePath)
  img = decode_img(img)
  mask = tf.py_function(decode_npy, [maskPath], tf.float32)

  return img, mask

# --------------------------------------------------------------------------------------
# CREATE A TRAINING and VALIDATION DATASETS
# --------------------------------------------------------------------------------------
trainSize = int(0.7 * DATASET_SIZE)
validSize = int(0.3 * DATASET_SIZE)

allDataSet = tf.data.Dataset.list_files(str(imageDir + "/*"))
# allDataSet = allDataSet.map(process_path, num_parallel_calls=AUTOTUNE)
# allDataSet = allDataSet.map(process_path)

trainDataSet = allDataSet.take(trainSize)
trainDataSet = trainDataSet.map(process_path).batch(64)
validDataSet = allDataSet.skip(trainSize)
validDataSet = validDataSet.map(process_path).batch(64)
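
# As an aside (this check is not part of my original pipeline, and assumes a TF 2.x
# build that exposes Dataset.element_spec): printing the element spec shows the
# static shape information tf.data actually has. The mask coming out of
# tf.py_function has a completely unknown shape, which seems to be what
# validate_all_tensor_shapes() later trips over.
print(trainDataSet.element_spec)
# prints something along the lines of:
# (TensorSpec(shape=(None, None, None, 3), dtype=tf.float32, name=None),
#  TensorSpec(shape=<unknown>, dtype=tf.float32, name=None))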

...

# this code throws the error!
model_history = model.fit(trainDataSet, epochs=EPOCHS,
                          steps_per_epoch=stepsPerEpoch,
                          validation_steps=validationSteps,
                          validation_data=validDataSet,
                          callbacks=callbacks)
– CSharp

3 Answers


I had the same problem as you with an image and a mask, and solved it by setting both of their shapes manually in the preprocessing function, in particular for the tensors returned by the py_function called inside the Dataset's map.

def process_path(filePath):
  ...

  # load the raw data from the file as a string
  img = tf.io.read_file(filePath)
  img = decode_img(img)
  mask = tf.py_function(decode_npy, [maskPath], tf.float32)

  # TODO:
  img.set_shape([MANUALLY ENTER THIS])
  mask.set_shape([MANUALLY ENTER THIS])

  return img, mask
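
With the shapes described in the question (a 256x256 RGB image and a (256, 256, 10) one-hot mask), the filled-in calls would look something like this; the image size is an assumption inferred from the mask size, so adjust it to your actual input dimensions:

  # assumed sizes, taken from the question's description of the data
  img.set_shape([256, 256, 3])
  mask.set_shape([256, 256, 10])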
– Elias
  • I tried that at one point but I recall it didn't address the issue. Annoyingly, I'm struggling to remember how I solved this problem now. Thank you for the suggestion though! – CSharp Nov 27 '19 at 13:44
  • Can you remember how you solved it? Struggling with the same. – WPMed Mar 04 '20 at 10:53
  • I'm not sure if this was the actual problem, but when I first ran this code it was on Google Colab, and instead of including TensorFlow like this: `%tensorflow_version 2.x`, I was using `pip install`, which caused problems. Another thing is I removed the call to `tf.py_function(decode_npy, [maskPath], tf.float32)`, as I found a way to execute the process_path function solely on the GPU with TensorFlow functions. – CSharp Mar 16 '20 at 16:33
  • I am on TF 2.3.0-dev20200620. I have the same error, and I also use a tf.data setup with similar processing and transforms, but without any tf.py_function. The thing is that I see this error if I use loss='binary_crossentropy', metrics=['accuracy'], but not loss=SparseCategoricalCrossentropy(), metrics=['sparse_categorical_accuracy'] (I changed my model output accordingly to test both kinds of loss/metrics). – kawingkelvin Jun 25 '20 at 17:24
  • Follow-on: if I replace the metric with BinaryAccuracy instead of accuracy in .fit(...), there's also no such error. I suspect I may be hitting a real bug. – kawingkelvin Jun 25 '20 at 17:37
  • I added the set_shape statement in my py_function but it is still giving the same error. I also tried adding reshape in place of set_shape, but to no avail. Any idea how to solve it? – psj Mar 08 '21 at 11:56
  • I was getting a similar bug when using tf.one_hot to perform a one-hot encoding on the masks as part of a tf.data pipeline, and found that forcing the image and mask sizes as above worked. – Patrice Carbonneau Jan 09 '23 at 11:54

I had a similar issue, and using dataset.padded_batch with explicit padded_shapes solved my problem!
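
A rough sketch of what that looks like for a pipeline shaped like the one in the question (the batch size and the (256, 256, ...) shapes are assumptions taken from the question, not from my original code):

# sketch: replace .batch(64) with padded_batch and explicit per-component shapes
trainDataSet = trainDataSet.map(process_path).padded_batch(
    batch_size=64,
    padded_shapes=([256, 256, 3], [256, 256, 10]))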

– Sarahgk

I was facing the same issue, also with a py_func mapping inside my pipeline.

Turns out the issue was related to compatibility between the loss function and the metrics. I was trying to use the "Categorical Crossentropy" loss with the "accuracy" metric.

Solved my problem by changing the metric to "categorical accuracy".
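
In terms of the compile call, the change amounts to something like this sketch (the optimizer here is an assumption, not the exact code from my project):

# 'adam' is an assumed optimizer; the loss stays the same,
# only the metric changes from 'accuracy' to 'categorical_accuracy'
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])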

– Luciano Dourado