I am getting broken pipe errors while transferring a Google Drive directory into Google Storage on GCP. The steps are as follows. First, a compute instance creates and attaches a disk. Then it scans the Google Drive root directory and requests, file by file, all the subdirectories and files within it. It does this using the standard Google libraries google-api-client and google-cloud-storage, and also the client library apiclient. Finally, the downloaded data is zipped and uploaded to Google Storage.

The download takes a couple of hours and eventually crashes with [Errno 32] Broken pipe, at a different file every time, about one or two hours into the step. To debug this, I am catching the errors with try / except BrokenPipeError statements across the script. When I do so, it crashes with a different error instead: [Errno 16] Device or resource busy: '/data/'. For reference, /data/ is the path where the attached disk mentioned above is mounted. This error, like the previous one, happens at a different file every time, which is very confusing. My understanding is that the two errors are somehow related, although I do not fully understand how. Here is a sketch of how this works:
import io

import os

from googleapiclient.http import MediaIoBaseDownload

# logger and MIMETYPE_CONVERSOR are defined elsewhere in the script.

def download_files(service, item, dfilespath):
    name = item['name']
    target = dfilespath + "/" + name
    # Skip files that already exist on the attached disk.
    if os.path.isfile(target):
        return
    if item['mimeType'] == 'application/vnd.google-apps.shortcut':
        logger.log('Not supported')
        return
    elif item['mimeType'] in MIMETYPE_CONVERSOR:
        # exported content is limited to 10MB
        request = service.files().export_media(
            fileId=item['id'], mimeType=MIMETYPE_CONVERSOR[item['mimeType']])
    else:
        request = service.files().get_media(fileId=item['id'])
    # The whole file is buffered in memory before it is written to disk.
    fh = io.BytesIO()
    # Debug marker: this is logged for every file, not only on failure.
    suspicious_item = item['name']
    logger.log(f'io.BytesIO() failed for {suspicious_item}')
    downloader = MediaIoBaseDownload(fh, request)
    done = False
    try:
        while not done:
            status, done = downloader.next_chunk()
            logger.log("Download %d%%." % int(status.progress() * 100))
    except Exception:
        logger.log('Failed to download file ' + name)
        return  # do not write a partial file to disk
    with open(target, 'wb') as f:
        f.write(fh.getvalue())
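Since the broken pipe appears at a different file each time, one way to tell a transient connection drop from a persistent fault is to retry the failing call a few times with exponential backoff. The following is only a minimal sketch under that assumption; call_with_retries and all of its parameters are hypothetical names introduced for illustration and are not part of any Google library.

```python
import time


def call_with_retries(func, retries=5, base_delay=1.0,
                      retry_on=(BrokenPipeError, ConnectionError, TimeoutError),
                      sleep=time.sleep):
    """Call func(), retrying on connection-type errors with exponential backoff.

    Hypothetical helper: the name, defaults, and exception list are assumptions.
    The last failure is re-raised once the retries are used up.
    """
    for attempt in range(retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise
            # Wait 1x, 2x, 4x, ... base_delay seconds before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

In the download loop, this could wrap the chunk request, for example status, done = call_with_retries(downloader.next_chunk). MediaIoBaseDownload.next_chunk also accepts a num_retries argument, which may be enough on its own for some failures.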
Increasing the disk size or changing the disk type does not fix the issue. I do not know whether this is caused by I/O problems with the disk or by the connection to Google Drive itself. I am inclined to think it is an I/O problem, since the error mentions a busy resource, but this is only a conjecture. The largest files it has to download are about 1-3 GB each. However, the error happens at a different file every time, and not necessarily at the large ones (at least as far as I can tell from the logs).
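Because the sketch above buffers each file fully in an io.BytesIO before writing it, a 1-3 GB file is held in memory for the whole download. A possible alternative, shown here only as a sketch and not as a confirmed fix, is to stream chunks straight to a temporary file on the disk and rename it once the download completes, so a failed download never leaves a file that the os.path.isfile check would later skip. download_to_path, make_downloader, and the .part suffix are hypothetical names introduced for illustration.

```python
import os


def download_to_path(make_downloader, dest_path):
    """Stream a download to dest_path via a temporary '.part' file.

    make_downloader is a hypothetical callable that takes an open binary file
    and returns an object with next_chunk() -> (status, done).
    """
    tmp_path = dest_path + ".part"
    try:
        with open(tmp_path, "wb") as fh:
            downloader = make_downloader(fh)
            done = False
            while not done:
                _status, done = downloader.next_chunk()
        # Rename only after the file is complete and closed.
        os.replace(tmp_path, dest_path)
    except Exception:
        # Remove the partial file so a later run does not mistake it for done.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With the Drive client this might be called as download_to_path(lambda fh: MediaIoBaseDownload(fh, request), dfilespath + "/" + name), assuming the request object is built as in the sketch above.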