
I cannot get flexible shapes working with an ONNX model I am converting to an MLModel using coremltools 4.0. The source model is from PyTorch, but I cannot use the new unified converter because coremltools does not currently support the reflection_pad2d layer used in the model.

coremltools converts the model without any warnings or errors, and the resulting spec shows that flexible shapes are supported:

input {
  name: "input"
  type {
    imageType {
      width: 1024
      height: 1024
      colorSpace: BGR
      imageSizeRange {
        widthRange {
          lowerBound: 256
          upperBound: -1
        }
        heightRange {
          lowerBound: 256
          upperBound: -1
        }
      }
    }
  }
}
output {
  name: "output"
  type {
    imageType {
      width: 1024
      height: 1024
      colorSpace: RGB
      imageSizeRange {
        widthRange {
          lowerBound: 256
          upperBound: -1
        }
        heightRange {
          lowerBound: 256
          upperBound: -1
        }
      }
    }
  }
}

But running a prediction on the model fails with the message:

MyApp[5773:4974761] [espresso] [Espresso::handle_ex_plan] exception=Invalid X-dimension 1/814 status=-7
MyApp[5773:4974761] [coreml] Error binding image input buffer input: -7
MyApp[5773:4974761] [coreml] Failure in bindInputsAndOutputs.
prediction error: Error Domain=com.apple.CoreML Code=0 "Error binding image input buffer input." UserInfo={NSLocalizedDescription=Error binding image input buffer input.}

Enumerated shapes do work with the model, but they are not adequate here: covering the size range I need would take on the order of 10,000+ enumerated shapes, which is not a practical solution.
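To put a rough number on that claim (the step of 4 and the 256–4096 range are my assumptions, just for illustration), here is the count of (width, height) pairs that enumerated shapes would have to list:

```python
# Rough count of how many enumerated (width, height) pairs would be needed
# to cover a size range. Enumerated shapes require listing every combination
# explicitly, so the count is quadratic in the number of sizes per axis.
def count_enumerated_sizes(lo=256, hi=4096, step=4):
    per_axis = (hi - lo) // step + 1  # distinct sizes along one dimension
    return per_axis * per_axis        # every (width, height) combination

print(count_enumerated_sizes())  # 961 * 961 = 923521 shapes
```

Even with a much coarser step the list stays unmanageably large, which is why a range-based flexible shape is the only reasonable option.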

The model is a fully convolutional network, it does not appear to use any fixed shapes (see the spec output below), and it runs with varying input sizes in PyTorch, so it seems like flexible shapes should be achievable somehow.

I've tried flexible input shapes with image input/output:

import coremltools as ct
from coremltools.converters.onnx import convert

input_names = ['input']
output_names = ['output']
channels = 3
input_shape = ct.Shape(shape=(channels, ct.RangeDim(), ct.RangeDim()))
# also tried:
# input_shape = ct.Shape(shape=(channels, ct.RangeDim(256, 4096), ct.RangeDim(256, 4096)))
# and:
# input_shape = ct.Shape(shape=(channels, ct.RangeDim(256, -1), ct.RangeDim(256, -1)))

model_input = ct.TensorType(shape=input_shape)
mlmodel = convert('torch_model.onnx',
                  [model_input],
                  image_input_names=input_names,
                  image_output_names=output_names,
                  ...
)

from coremltools.models.neural_network import flexible_shape_utils

def add_flexible_shapes(spec):
    img_size_ranges = flexible_shape_utils.NeuralNetworkImageSizeRange(height_range=(256, -1), width_range=(256, -1))
    # also tried:
    # img_size_ranges = flexible_shape_utils.NeuralNetworkImageSizeRange(height_range=(256, 4096), width_range=(256, 4096))
    flexible_shape_utils.update_image_size_range(spec, feature_name=input_names[0], size_range=img_size_ranges)
    flexible_shape_utils.update_image_size_range(spec, feature_name=output_names[0], size_range=img_size_ranges)
    return spec

spec = mlmodel.get_spec()

# tried with and without adding flexible shapes
spec = add_flexible_shapes(spec)

I also tried converting the model with multiarray input/output first, then changing those to image types, then adding the flexible shapes:

import coremltools as ct
import coremltools.proto.FeatureTypes_pb2 as ft

torch.onnx.export(torch_model, example_input, 'torch_model.onnx', input_names=input_names, output_names=output_names, verbose=True)
mlmodel = ct.converters.onnx.convert(model='torch_model.onnx',
                                     ...
)
spec = mlmodel.get_spec()

input = spec.description.input[0]
input.type.imageType.colorSpace = ft.ImageFeatureType.RGB
input.type.imageType.height = 1024
input.type.imageType.width = 1024

output = spec.description.output[0]
output.type.imageType.colorSpace = ft.ImageFeatureType.RGB
output.type.imageType.height = 1024
output.type.imageType.width = 1024

spec = add_flexible_shapes(spec)

I've looked at all the layers in the spec, and none of them appear to use a fixed shape (other than the input/output descriptions):

specificationVersion: 4
description {
  input {
    name: "input"
    type {
      imageType {
        width: 1024
        height: 1024
        colorSpace: RGB
      }
    }
  }
  output {
    name: "output"
    type {
      imageType {
        width: 1024
        height: 1024
        colorSpace: RGB
      }
    }
  }
  metadata {
    userDefined {
      key: "com.github.apple.coremltools.source"
      value: "onnx==1.7.0"
    }
    userDefined {
      key: "com.github.apple.coremltools.version"
      value: "4.0"
    }
  }
}
neuralNetwork {
  layers {
    name: "Pad_0"
    input: "input"
    output: "63"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 4
          endEdgeSize: 4
        }
        borderAmounts {
          startEdgeSize: 4
          endEdgeSize: 4
        }
      }
    }
  }
  layers {
    name: "Conv_1"
    input: "63"
    output: "64"
    convolution {
      outputChannels: 16
      kernelChannels: 3
      nGroups: 1
      kernelSize: 9
      kernelSize: 9
      stride: 1
      stride: 1
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_2"
    input: "64"
    output: "65"
    batchnorm {
      channels: 16
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Relu_3"
    input: "65"
    output: "66"
    activation {
      ReLU {
      }
    }
  }
  layers {
    name: "Pad_4"
    input: "66"
    output: "67"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
      }
    }
  }
  layers {
    name: "Conv_5"
    input: "67"
    output: "68"
    convolution {
      outputChannels: 32
      kernelChannels: 16
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 2
      stride: 2
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_6"
    input: "68"
    output: "69"
    batchnorm {
      channels: 32
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Relu_7"
    input: "69"
    output: "70"
    activation {
      ReLU {
      }
    }
  }
  layers {
    name: "Pad_8"
    input: "70"
    output: "71"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
      }
    }
  }
  layers {
    name: "Conv_9"
    input: "71"
    output: "72"
    convolution {
      outputChannels: 64
      kernelChannels: 32
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 2
      stride: 2
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_10"
    input: "72"
    output: "73"
    batchnorm {
      channels: 64
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Relu_11"
    input: "73"
    output: "74"
    activation {
      ReLU {
      }
    }
  }
  layers {
    name: "Pad_12"
    input: "74"
    output: "75"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
      }
    }
  }
  layers {
    name: "Conv_13"
    input: "75"
    output: "76"
    convolution {
      outputChannels: 64
      kernelChannels: 64
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 1
      stride: 1
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_14"
    input: "76"
    output: "77"
    batchnorm {
      channels: 64
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Relu_15"
    input: "77"
    output: "78"
    activation {
      ReLU {
      }
    }
  }
  layers {
    name: "Pad_16"
    input: "78"
    output: "79"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
      }
    }
  }
  layers {
    name: "Conv_17"
    input: "79"
    output: "80"
    convolution {
      outputChannels: 64
      kernelChannels: 64
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 1
      stride: 1
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_18"
    input: "80"
    output: "81"
    batchnorm {
      channels: 64
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Add_19"
    input: "81"
    input: "74"
    output: "82"
    addBroadcastable {
    }
  }
  layers {
    name: "Pad_20"
    input: "82"
    output: "83"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
      }
    }
  }
  layers {
    name: "Conv_21"
    input: "83"
    output: "84"
    convolution {
      outputChannels: 64
      kernelChannels: 64
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 1
      stride: 1
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_22"
    input: "84"
    output: "85"
    batchnorm {
      channels: 64
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Relu_23"
    input: "85"
    output: "86"
    activation {
      ReLU {
      }
    }
  }
  layers {
    name: "Pad_24"
    input: "86"
    output: "87"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
      }
    }
  }
  layers {
    name: "Conv_25"
    input: "87"
    output: "88"
    convolution {
      outputChannels: 64
      kernelChannels: 64
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 1
      stride: 1
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_26"
    input: "88"
    output: "89"
    batchnorm {
      channels: 64
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Add_27"
    input: "89"
    input: "82"
    output: "90"
    addBroadcastable {
    }
  }
  layers {
    name: "Pad_28"
    input: "90"
    output: "91"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
      }
    }
  }
  layers {
    name: "Conv_29"
    input: "91"
    output: "92"
    convolution {
      outputChannels: 64
      kernelChannels: 64
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 1
      stride: 1
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_30"
    input: "92"
    output: "93"
    batchnorm {
      channels: 64
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Relu_31"
    input: "93"
    output: "94"
    activation {
      ReLU {
      }
    }
  }
  layers {
    name: "Pad_32"
    input: "94"
    output: "95"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
      }
    }
  }
  layers {
    name: "Conv_33"
    input: "95"
    output: "96"
    convolution {
      outputChannels: 64
      kernelChannels: 64
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 1
      stride: 1
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_34"
    input: "96"
    output: "97"
    batchnorm {
      channels: 64
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Add_35"
    input: "97"
    input: "90"
    output: "98"
    addBroadcastable {
    }
  }
  layers {
    name: "Pad_36"
    input: "98"
    output: "99"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
      }
    }
  }
  layers {
    name: "Conv_37"
    input: "99"
    output: "100"
    convolution {
      outputChannels: 64
      kernelChannels: 64
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 1
      stride: 1
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_38"
    input: "100"
    output: "101"
    batchnorm {
      channels: 64
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Relu_39"
    input: "101"
    output: "102"
    activation {
      ReLU {
      }
    }
  }
  layers {
    name: "Pad_40"
    input: "102"
    output: "103"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
      }
    }
  }
  layers {
    name: "Conv_41"
    input: "103"
    output: "104"
    convolution {
      outputChannels: 64
      kernelChannels: 64
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 1
      stride: 1
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_42"
    input: "104"
    output: "105"
    batchnorm {
      channels: 64
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Add_43"
    input: "105"
    input: "98"
    output: "106"
    addBroadcastable {
    }
  }
  layers {
    name: "Pad_44"
    input: "106"
    output: "107"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
      }
    }
  }
  layers {
    name: "Conv_45"
    input: "107"
    output: "108"
    convolution {
      outputChannels: 64
      kernelChannels: 64
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 1
      stride: 1
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_46"
    input: "108"
    output: "109"
    batchnorm {
      channels: 64
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Relu_47"
    input: "109"
    output: "110"
    activation {
      ReLU {
      }
    }
  }
  layers {
    name: "Pad_48"
    input: "110"
    output: "111"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
      }
    }
  }
  layers {
    name: "Conv_49"
    input: "111"
    output: "112"
    convolution {
      outputChannels: 64
      kernelChannels: 64
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 1
      stride: 1
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_50"
    input: "112"
    output: "113"
    batchnorm {
      channels: 64
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Add_51"
    input: "113"
    input: "106"
    output: "114"
    addBroadcastable {
    }
  }
  layers {
    name: "Upsample_52"
    input: "114"
    output: "123"
    upsample {
      scalingFactor: 4
      scalingFactor: 4
    }
  }
  layers {
    name: "Pad_53"
    input: "123"
    output: "124"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
      }
    }
  }
  layers {
    name: "Conv_54"
    input: "124"
    output: "125"
    convolution {
      outputChannels: 32
      kernelChannels: 64
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 2
      stride: 2
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_55"
    input: "125"
    output: "126"
    batchnorm {
      channels: 32
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Relu_56"
    input: "126"
    output: "127"
    activation {
      ReLU {
      }
    }
  }
  layers {
    name: "Upsample_57"
    input: "127"
    output: "136"
    upsample {
      scalingFactor: 4
      scalingFactor: 4
      mode: BILINEAR
    }
  }
  layers {
    name: "Pad_58"
    input: "136"
    output: "137"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
        borderAmounts {
          startEdgeSize: 1
          endEdgeSize: 1
        }
      }
    }
  }
  layers {
    name: "Conv_59"
    input: "137"
    output: "138"
    convolution {
      outputChannels: 16
      kernelChannels: 32
      nGroups: 1
      kernelSize: 3
      kernelSize: 3
      stride: 2
      stride: 2
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  layers {
    name: "InstanceNormalization_60"
    input: "138"
    output: "139"
    batchnorm {
      channels: 16
      computeMeanVar: true
      instanceNormalization: true
      epsilon: 9.999999747378752e-06
      gamma {
      }
      beta {
      }
    }
  }
  layers {
    name: "Relu_61"
    input: "139"
    output: "140"
    activation {
      ReLU {
      }
    }
  }
  layers {
    name: "Pad_62"
    input: "140"
    output: "141"
    padding {
      reflection {
      }
      paddingAmounts {
        borderAmounts {
          startEdgeSize: 4
          endEdgeSize: 4
        }
        borderAmounts {
          startEdgeSize: 4
          endEdgeSize: 4
        }
      }
    }
  }
  layers {
    name: "Conv_63"
    input: "141"
    output: "output"
    convolution {
      outputChannels: 3
      kernelChannels: 16
      nGroups: 1
      kernelSize: 9
      kernelSize: 9
      stride: 1
      stride: 1
      dilationFactor: 1
      dilationFactor: 1
      valid {
        paddingAmounts {
          borderAmounts {
          }
          borderAmounts {
          }
        }
      }
      hasBias: true
      weights {
      }
      bias {
      }
    }
  }
  arrayInputShapeMapping: EXACT_ARRAY_MAPPING
  imageInputShapeMapping: RANK4_IMAGE_MAPPING
}
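As a sanity check on the architecture itself, the layer parameters in the spec above (reflection pads, valid 3x3 stride-2 convolutions, and x4 upsamples) can be used to trace the spatial dimensions by hand. This is just back-of-the-envelope arithmetic, not a claim about Core ML's internals, but it suggests the network only maps an input size back to itself when the size is a multiple of 4, which could matter for range-based flexible shapes:

```python
# Trace one spatial dimension through the layers listed in the spec.
# The residual blocks (reflection pad 1 + 3x3 stride-1 conv) preserve
# size exactly, so they are omitted.
def out_size(h):
    def conv(h, pad, k, s):
        # reflection pad on both sides, then a VALID convolution
        return (h + 2 * pad - k) // s + 1
    h = conv(h, 4, 9, 1)      # Pad_0 + Conv_1 (size-preserving)
    h = conv(h, 1, 3, 2)      # Pad_4 + Conv_5 (downsample x2)
    h = conv(h, 1, 3, 2)      # Pad_8 + Conv_9 (downsample x2)
    h = conv(h * 4, 1, 3, 2)  # Upsample_52 (x4) + Pad_53 + Conv_54
    h = conv(h * 4, 1, 3, 2)  # Upsample_57 (x4) + Pad_58 + Conv_59
    h = conv(h, 4, 9, 1)      # Pad_62 + Conv_63 (size-preserving)
    return h

print(out_size(1024))  # 1024 -> round-trips
print(out_size(814))   # 816  -> output no longer matches the input
```

Whether this mismatch is related to the "Invalid X-dimension 1/814" error is speculation on my part, but it would at least mean a width/height range cannot be honored exactly for sizes that are not multiples of 4.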
Comments:

- I've seen similar reports in the past. It sounds like a bug in Core ML. If you want to get to the bottom of this, you could do a binary-search type thing where you remove the bottom half of the layers from the model and see if it works now. If not, remove half the remaining layers, etc. At some point you may find that the model works again, and you can identify the layer where it goes wrong. – Matthijs Hollemans Nov 06 '20 at 10:16
- Thanks for the tips! I was able to fully reproduce the issue using only two conv2d layers, so something with the ONNX conversion must be fundamentally broken. I posted a bug report with fully reproducible instructions here: https://github.com/apple/coremltools/issues/988 – Jeshua Lacock Nov 06 '20 at 22:41

0 Answers