Exporting TFRecords training patches with Google Earth Engine (kernelSize issues)

Question

I've been using GEE to export Sentinel-2 training patches for use in Python. Following the GEE guide https://developers.google.com/earth-engine/tfrecord, I got it working with the Export.image.toDrive function, and I can parse the exported TFRecord file to reconstruct my tiles.

var image_export_options = {
  'patchDimensions': [366, 366],
  'maxFileSize': 104857600,
  // 'kernelSize': [366, 366],
  'compressed': true
}

Export.image.toDrive({
  image: clipped_img.select(bands.concat(['classes'])),
  description: 'PatchesExport',
  fileNamePrefix: 'Oros_1',
  scale: 10,
  folder: 'myExportFolder',
  fileFormat: 'TFRecord',
  region: export_area,
  formatOptions: image_export_options,  
})

However, when I specify kernelSize in the formatOptions (which, according to the guide, "overlaps adjacent tiles by [kernelSize[0]/2, kernelSize[1]/2]"), the files are exported but the '*mixer.json' doesn't reflect the change in the patches, and I am not able to iterate through the patches afterwards. The following command crashes the Google Colab session:

image_dataset = tf.data.TFRecordDataset(str(path/(file_prefix+'-00000.tfrecord.gz')), compression_type='GZIP')
first = next(iter(image_dataset))
first

The weird thing is that the problem happens only when I add kernelSize to the formatOptions.

score 0 · Answer 1 · edited Jul 16 '20 at 16:48

After some time trying to overcome this issue, I noticed a poorly documented behavior when kernelSize is used to export patches from GEE. Bundled with the exported TFRecord, there is a JSON file called the mixer ('*mixer.json'). It doesn't matter if we use:

'patchDimensions': [184, 184],
'kernelSize': [1, 1],  #default for no overlapping

or

'patchDimensions': [184, 184],
'kernelSize': [184, 184],  #half patch overlapping

The mixer file remains the same and makes no mention of the kernel/overlap size:

{'patchDimensions': [184, 184],
 'patchesPerRow': 8,
 'projection': {'affine': {'doubleMatrix': [10.0,
    0.0,
    493460.0,
    0.0,
    -10.0,
    9313540.0]},
  'crs': 'EPSG:32724'},
 'totalPatches': 40}
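
As a quick check of what the mixer does and does not tell us, here is a minimal sketch that loads the mixer content shown above with Python's json module and derives the patch grid. The inline string stands in for the real '*mixer.json' file, and the variable names are my own.

```python
import json

# Inline copy of the mixer shown above; in practice, read the '*mixer.json' file.
mixer_text = """
{"patchDimensions": [184, 184],
 "patchesPerRow": 8,
 "projection": {"affine": {"doubleMatrix": [10.0, 0.0, 493460.0,
                                            0.0, -10.0, 9313540.0]},
                "crs": "EPSG:32724"},
 "totalPatches": 40}
"""
mixer = json.loads(mixer_text)

# Number of patch rows and columns covering the export region.
patches_per_row = mixer["patchesPerRow"]
rows = mixer["totalPatches"] // patches_per_row
grid = (rows, patches_per_row)

# No top-level key records the kernel size, so it has to be tracked separately.
has_kernel_info = any("kernel" in key.lower() for key in mixer)

print(grid, mixer["patchDimensions"], has_kernel_info)
```

So the mixer gives the grid layout and the nominal patch size, but the kernelSize used at export time has to be remembered on the Python side.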

In the second case, if we try to parse the patches using tf.io.parse_single_example(example_proto, image_features_dict), where image_features_dict equals something like:

{'B2': FixedLenFeature(shape=[184, 184], dtype=tf.float32, default_value=None),
 'B3': FixedLenFeature(shape=[184, 184], dtype=tf.float32, default_value=None),
 'B4': FixedLenFeature(shape=[184, 184], dtype=tf.float32, default_value=None)}

it will raise the error:

_FallbackException: This function does not handle the case of the path where all inputs are not already EagerTensors.
Can't parse serialized Example. [Op:ParseExampleV2]

Instead, to parse records exported with kernelSize > 1, we have to treat patchDimensions + kernelSize as the actual patch size, even though the mixer.json file says otherwise. In this example, the patch size would be 368 (original patch size + kernelSize). Be aware that for odd kernel sizes, the number to add to the original patch size is kernelSize - 1.
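
The size rule can be written as a small helper. This is only a sketch based on the sizes I observed: the function name padded_patch_shape is my own, and the expression patch + 2 * (kernel // 2) is just a compact way of saying "add kernelSize when it is even, kernelSize - 1 when it is odd".

```python
def padded_patch_shape(patch_dims, kernel_size):
    """Return the per-band shape actually stored in each exported record.

    Assumption (taken from the behavior described in this answer, not from
    the GEE documentation): each dimension grows by kernelSize when it is
    even and by kernelSize - 1 when it is odd, i.e. by 2 * (kernelSize // 2).
    """
    return [p + 2 * (k // 2) for p, k in zip(patch_dims, kernel_size)]


print(padded_patch_shape([184, 184], [184, 184]))  # [368, 368]
print(padded_patch_shape([184, 184], [1, 1]))      # [184, 184]
```

The returned list can then be used as the shape of each tf.io.FixedLenFeature in image_features_dict, in place of the patchDimensions reported by the mixer.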
