
I'm training an object detection model (EfficientDet-Lite) with TensorFlow Lite Model Maker in Colab and I'd like to use a Cloud TPU. All the images are in a GCS bucket and I provide a CSV file describing them. When I call object_detector.create I get the following error:

/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py in shape(self)
   1196         # `_tensor_shape` is declared and defined in the definition of
   1197         # `EagerTensor`, in C.
-> 1198         self._tensor_shape = tensor_shape.TensorShape(self._shape_tuple())
   1199       except core._NotOkStatusException as e:
   1200         six.raise_from(core._status_to_exception(e.code, e.message), None)

InvalidArgumentError: Unsuccessful TensorSliceReader constructor: Failed to get matching files on /tmp/tfhub_modules/db7544dcac01f8894d77bea9d2ae3c41ba90574c/variables/variables: Unimplemented: File system scheme '[local]' not implemented (file: '/tmp/tfhub_modules/db7544dcac01f8894d77bea9d2ae3c41ba90574c/variables/variables')

It looks like something is trying to read local files from the Cloud TPU, which can't work since the TPU workers have no access to the Colab VM's local file system...

The gist of what I'm doing is:

tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
train_data, validation_data, test_data = object_detector.DataLoader.from_csv(
    drive_dir + csv_name,
    images_dir="images" if not tpu else None,
    cache_dir=drive_dir + "cub_cache",
)
spec = MODEL_SPEC(tflite_max_detections=10, strategy='tpu', tpu=tpu.master(), gcp_project="xxx")
model = object_detector.create(train_data=train_data,
                               model_spec=spec,
                               validation_data=validation_data,
                               epochs=epochs,
                               batch_size=batch_size,
                               train_whole_model=True)

I can't find any example with Model Maker that uses Cloud TPU.

Edit: the error seems to occur when the EfficientDet model gets loaded, so somehow Model Maker must be pointing to a local file that the Cloud TPU can't access?

TvE
  • This problem is very common when you try to load a dataset from the local file system. Read this article for a possible solution to the file system scheme 'local' error. It isn't about Model Maker, but it may give you some hints: https://farmaker47.medium.com/fine-tune-a-bert-model-with-the-use-of-colab-tpu-34cf29067357 – Farmaker Jul 26 '21 at 17:53
  • 1
    Thanks for the link, but there's nothing really there that I'm not doing or that just doesn't apply to model maker... – TvE Jul 26 '21 at 18:15
  • I've opened a github issue @tensorflow: https://github.com/tensorflow/tensorflow/issues/50965, I can repro the problem with a minimally modified stock tutorial in Colab: https://gist.github.com/tve/615f4b51fa88dc643358176c86d6537e – TvE Jul 27 '21 at 00:58
  • Nice! I will follow as I am interested – Farmaker Jul 27 '21 at 03:53

2 Answers


Yeah, the error is happening in TF Hub, and it seems to be a well-known issue: TF Hub caches downloaded modules in a local directory, which the Cloud TPU workers can't access (and which Colab doesn't share with them). Check out https://github.com/tensorflow/hub/issues/604, which should get you past this error.
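The workaround discussed in that issue can be sketched as below; the bucket path is a placeholder, and the key point is that the environment variable must be set before tensorflow_hub (or anything that imports it, like Model Maker) loads a model:

```python
import os

# Point TF Hub's module cache at a GCS location the TPU workers can read,
# instead of the default local directory under /tmp/tfhub_modules.
# "gs://your-bucket/tfhub-cache" is a placeholder -- use your own bucket.
os.environ["TFHUB_CACHE_DIR"] = "gs://your-bucket/tfhub-cache"

# Only import TF Hub / Model Maker *after* setting the cache location,
# so the downloaded module lands in GCS where the TPU can find it.
```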

Allen Wang
  • Thanks much for the pointer! I'm not sure what exactly I have to do to get ModelMaker to play with this, but I guess I'll have to play around a bunch... – TvE Jul 27 '21 at 20:22
  1. Download the model you would like to train from TF Hub (replace X with a value 0 <= X <= 4): https://tfhub.dev/tensorflow/efficientdet/liteX/feature-vector/1
  2. Extract the archive (twice) until you reach the folder containing "keras_metadata.pb", "saved_model.pb" and the "variables" directory
  3. Upload these files and folders to a Google Cloud Storage bucket
  4. Pass the uri argument to model_spec.get (https://www.tensorflow.org/lite/tutorials/model_maker_object_detection), pointing it at the bucket folder (in gs:// format)
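The steps above can be sketched as follows; the bucket path and model name are placeholders, and the uri override follows this answer's suggestion (the import is kept local so the path logic runs anywhere, while tflite_model_maker is assumed to be installed in the Colab):

```python
# Placeholder GCS folder that must contain saved_model.pb,
# keras_metadata.pb and the variables/ directory from steps 1-3.
bucket = "gs://your-bucket"
model_dir = f"{bucket}/efficientdet-lite0"

def get_spec(uri):
    """Build a Model Maker spec pointing at the GCS copy of the model (step 4)."""
    # Local import: tflite_model_maker is only needed inside the notebook.
    from tflite_model_maker import model_spec
    return model_spec.get("efficientdet_lite0", uri=uri)

# In the Colab you would then do:
#   spec = get_spec(model_dir)
#   model = object_detector.create(train_data=train_data, model_spec=spec, ...)
```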
balu