That seems to be a sensible approach in general for a large dataset.
From the TensorFlow docs: https://www.tensorflow.org/performance/performance_guide
Reading large numbers of small files significantly impacts I/O
performance. One approach to get maximum I/O throughput is to
preprocess input data into larger (~100MB) TFRecord files. For smaller
data sets (200MB-1GB), the best approach is often to load the entire
data set into memory. The document Downloading and converting to
TFRecord format includes information and scripts for creating
TFRecords and this script converts the CIFAR-10 data set into
TFRecords.
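For illustration, here is a minimal sketch of such a conversion step, assuming a set of encoded image files plus integer labels (the feature names and the arguments are placeholders; the `tf.io` API shown is from newer TensorFlow releases, older versions use `tf.python_io.TFRecordWriter`):

```python
import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def convert_to_tfrecord(image_paths, labels, output_path):
    # Bundle many small files into one larger TFRecord file (aim for ~100MB per shard).
    with tf.io.TFRecordWriter(output_path) as writer:
        for path, label in zip(image_paths, labels):
            with open(path, 'rb') as f:
                encoded_image = f.read()  # raw encoded bytes, e.g. PNG/JPEG
            example = tf.train.Example(features=tf.train.Features(feature={
                'image_raw': _bytes_feature(encoded_image),
                'label': _int64_feature(label),
            }))
            writer.write(example.SerializeToString())
```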
Whether this will improve training performance (as in speed) may depend on your setup, in particular for a local setup with a GPU (see Matan Hugi's answer). (I haven't done any performance tests myself.)
The preprocessing only needs to happen once, and you could run it in the cloud if necessary. I/O is more likely to become a bottleneck when your GPU gets faster, e.g. when you train via Google's ML Engine with a more powerful GPU (unless you have access to a faster GPU yourself), or when I/O gets slower (e.g. when it involves the network).
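To reduce that bottleneck on the reading side, the TFRecord files can then be consumed through a `tf.data` pipeline with parallel parsing and prefetching. A rough sketch, assuming the feature names from the writer above, fixed-size still images, and placeholder file names (`tf.data.AUTOTUNE` needs a reasonably recent TensorFlow, older versions use `tf.data.experimental.AUTOTUNE`):

```python
import tensorflow as tf

feature_spec = {
    'image_raw': tf.io.FixedLenFeature([], tf.string),
    'label': tf.io.FixedLenFeature([], tf.int64),
}

def parse_example(serialized):
    parsed = tf.io.parse_single_example(serialized, feature_spec)
    # Assumes still images that all have the same size, so they can be batched.
    image = tf.io.decode_image(parsed['image_raw'], channels=3)
    return image, parsed['label']

tfrecord_files = ['train-00000.tfrecord', 'train-00001.tfrecord']  # placeholder names
dataset = (tf.data.TFRecordDataset(tfrecord_files)
           .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
           .shuffle(10000)
           .batch(32)
           .prefetch(tf.data.AUTOTUNE))  # overlaps reading/parsing with training on the GPU
```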
In summary, some advantages:
- preprocessing is only done once
- preprocessing can be run in the cloud
- reduces bottleneck (if there is any)
The downside is that you have that additional preprocessing step.
In your case though, 20 × 28 MB should easily fit into memory.
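In that case you could skip TFRecords entirely and feed the data from memory, roughly along these lines (`load_all_files_into_numpy` is a hypothetical helper; replace it with whatever reads your file format):

```python
import tensorflow as tf

# Hypothetical helper: reads all 20 files (~28 MB each, ~560 MB total)
# into two NumPy arrays; adapt this to your own file format.
images, labels = load_all_files_into_numpy()

dataset = (tf.data.Dataset.from_tensor_slices((images, labels))
           .shuffle(len(labels))
           .batch(32)
           .prefetch(tf.data.AUTOTUNE))
```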