I am training a deep learning model using Python 3.9, PyTorch 1.12.1, and NumPy 1.23.1. My data is stored in NPZ files; the dataloader reads several arrays from each file, one of which has shape 192x256x1x12. After the data is created (always with np.savez), the model runs for several epochs, passing over the same data several times without errors, and then fails, at a different point each time, with:

  File "[...]/venv/lib/python3.9/site-packages/numpy/lib/npyio.py", line 245, in __getitem__
    return format.read_array(bytes,
  File "[...]/venv/lib/python3.9/site-packages/numpy/lib/format.py", line 777, in read_array
    data = _read_bytes(fp, read_size, "array data")
  File "[...]/venv/lib/python3.9/site-packages/numpy/lib/format.py", line 906, in _read_bytes
    r = fp.read(size - len(data))
  File "/usr/lib/python3.9/zipfile.py", line 924, in read
    data = self._read1(n)
  File "/usr/lib/python3.9/zipfile.py", line 1014, in _read1
    self._update_crc(data)
  File "/usr/lib/python3.9/zipfile.py", line 942, in _update_crc
    raise BadZipFile("Bad CRC-32 for file %r" % self.name)
zipfile.BadZipFile: Bad CRC-32 for file 'label.npy'

It seems the files are getting corrupted while the model runs, because loading works several times (e.g. 20 or 30) without any issue before it fails, but I don't know why or where.
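
One way to tell whether the files on disk are actually corrupted, or whether the failure only happens while reading, is to re-check the archive after the error. An NPZ file is a zip archive, so the standard library can verify each member's CRC-32. This is only a minimal sketch: `check_npz` is a hypothetical helper, and the temporary file and array names stand in for my real data.

```python
import os
import tempfile
import zipfile

import numpy as np


def check_npz(path):
    """Return the name of the first member that fails its CRC/header check,
    or None if every member of the archive reads back cleanly."""
    with zipfile.ZipFile(path) as zf:
        return zf.testzip()


# Demo on a freshly written file (placeholder data, not the real dataset).
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "sample.npz")
np.savez(demo_path, label=np.zeros((192, 256, 1, 12), dtype=np.float32))

first_bad_member = check_npz(demo_path)
print(first_bad_member)
```

If this returns None for a file that just raised BadZipFile during training, the file itself is probably intact and the problem would be in how it is read rather than in what is stored.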

I need to use NPZ, so unfortunately I cannot switch to NPY format.

The data are created using THE SAME versions of Python, numpy, torch etc. as the ones the model runs with.
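
For context, the write and read steps look roughly like the sketch below. It is an illustration rather than my exact code: `save_sample`, `load_sample`, and the `image` key are hypothetical names (only `label` appears in the traceback), and it assumes each item is loaded by opening the NPZ file, reading the arrays, and closing it again.

```python
import os
import tempfile

import numpy as np


def save_sample(path, image, label):
    # np.savez writes an uncompressed zip archive with one .npy member per keyword.
    np.savez(path, image=image, label=label)


def load_sample(path):
    # Open the archive for this item only and close it when done,
    # so no file handle is kept open between reads.
    with np.load(path) as npz:
        return npz["image"], npz["label"]


# Round trip on placeholder data with the shape mentioned above.
sample_dir = tempfile.mkdtemp()
sample_path = os.path.join(sample_dir, "sample.npz")
image = np.random.rand(192, 256, 1, 12).astype(np.float32)
label = np.random.rand(192, 256, 1, 12).astype(np.float32)

save_sample(sample_path, image, label)
loaded_image, loaded_label = load_sample(sample_path)
```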

Is there any reason why this might be happening?

Thanks.
