
I have 100k pictures, and they don't fit into RAM, so I need to read them from disk while training.

def extract_fn(x):
    x = tf.read_file(x)
    x = tf.image.decode_jpeg(x, channels=3)
    x = tf.image.resize_images(x, [64, 64])
    return x

# in_pics is the list of image file paths
dataset = tf.data.Dataset.from_tensor_slices(in_pics)
dataset = dataset.map(extract_fn)

But when I try to train, I get this error:

File system scheme '[local]' not implemented (file: '/content/anime-faces/black_hair/danbooru_2629248_487b383a8a6e7cc0e004383300477d66.jpg')

Can I work around it somehow? I also tried the TFRecord API and got the same error.

had

1 Answer


The Cloud TPU you use in this scenario is not colocated on the same VM where your Python runs, so it cannot read files from the VM's local file system. The easiest fix is to stage your data in a Google Cloud Storage (GCS) bucket and point the TPU at it with a gs:// URI.
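For example, a minimal sketch of the same pipeline reading from GCS (the bucket name and path pattern are hypothetical; substitute your own):

import tensorflow as tf

# Hypothetical GCS location -- replace with your own bucket/path.
GCS_PATTERN = "gs://your-bucket/anime-faces/*/*.jpg"

def extract_fn(path):
    x = tf.read_file(path)                   # tf.read_file understands gs:// URIs
    x = tf.image.decode_jpeg(x, channels=3)
    x = tf.image.resize_images(x, [64, 64])
    return x

filenames = tf.gfile.Glob(GCS_PATTERN)       # list the files stored on GCS
dataset = tf.data.Dataset.from_tensor_slices(filenames)
dataset = dataset.map(extract_fn)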

To optimize performance when reading from GCS, add prefetch(AUTOTUNE) to your tf.data pipeline, and for small (<50 GB) datasets add cache() so the data is only pulled from GCS once.
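A sketch of those two additions, assuming a TF 1.x version where tf.data.experimental.AUTOTUNE is available (the batch size and shuffle buffer below are placeholders):

AUTOTUNE = tf.data.experimental.AUTOTUNE

dataset = (dataset
           .cache()                          # small datasets: read each image from GCS only once
           .shuffle(1024)
           .batch(64, drop_remainder=True)   # TPUs require fixed batch shapes
           .prefetch(AUTOTUNE))              # overlap GCS reads/decoding with TPU steps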

michaelb
Ami F
  • That's odd; GCS storage should be the fastest way to get data to a TPU. Perhaps try increasing the replication of your stored data? (E.g. global or multi-regional storage instead of zonal.) – Ami F Dec 14 '18 at 15:36
  • I will try. For now I have only compared Colab VM RAM vs. Google Cloud Storage, and the second option is 3 times slower. – had Dec 15 '18 at 01:09
  • Tested with multi-regional; it's faster now, but still slower than Colab RAM. – had Jan 17 '19 at 08:46