
I am getting broken pipe errors while transferring a Google Drive directory into Google Cloud Storage on GCP. The steps are as follows. First, a compute instance creates and attaches a disk. Then it scans the Google Drive root directory and requests, file by file, all the subdirectories and files within it. It does this using the standard Google libraries google-api-client and google-cloud-storage, and also the apiclient client library. Finally, the downloaded data is zipped and uploaded to Google Cloud Storage.

The download takes a couple of hours and eventually crashes with [Errno 32] Broken pipe, every time at a different file, about one or two hours into the step. To debug this I am catching the errors with try / except BrokenPipeError statements across the script. If I do so, it crashes with a different error instead: [Errno 16] Device or resource busy: '/data/'. For reference, /data/ is where the attached disk mentioned above is mounted. This error, like the previous one, happens at a different file every time, which is very confusing.

My understanding is that these two errors are somehow related, although I do not fully understand how. Here is a sketch of how this works:

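import io
import os

from googleapiclient.http import MediaIoBaseDownload

# logger and MIMETYPE_CONVERSOR are defined elsewhere in the script.
def download_files(service, item, dfilespath):
    name = item['name']
    if os.path.isfile(dfilespath + "/" + name):
        return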
    if item['mimeType'] == 'application/vnd.google-apps.shortcut':
        logger.log('Not supported')
        return
    elif item['mimeType'] in MIMETYPE_CONVERSOR:
        # exported content is limited to 10MB
        request = service.files().export_media(fileId=item['id'], mimeType=MIMETYPE_CONVERSOR[item['mimeType']])
    else:
        request = service.files().get_media(fileId=item['id'])
    # Set up the in-memory buffer and downloader for both branches above.
    fh = io.BytesIO()
    suspicious_item = item['name']
    logger.log(f'io.BytesIO() failed for {suspicious_item}')
    downloader = MediaIoBaseDownload(fh, request)
    done = False
    try:
        while done is False:
            status, done = downloader.next_chunk()
            logger.log("Download %d%%." % int(status.progress() * 100))
    except Exception:
        logger.log('Failed to download file ' + name)
    with io.open(dfilespath + "/" + name, 'wb') as f:
        fh.seek(0)
        f.write(fh.read())

Increasing the disk size or changing the disk type does not fix the issue. I do not know whether this arises from I/O problems with the disk or from the connection to Google Drive itself. I am inclined to say it is an I/O problem, since the error mentions a busy resource, but this is just a conjecture. The largest files it has to download are about 1-3 GB. But again, the error seems to happen at different files every time and not necessarily at large ones (at least as far as I can tell from the logs).

Bardigan
  • The ```[Errno 32] Broken pipe``` error may be caused by a failure in network connection (which might happen at some point if the program takes a lot of time). Probably a simple application would do the job in this case. I would also put a ```time.sleep()``` between retries. – Oriol Castander Jul 12 '22 at 10:05
  • That makes sense. When I do not catch the broken pipe errors, the download logs (one new log entry for each file that starts downloading) stop for about 20-30 minutes, and then the script crashes with this error. But even if I catch them, I still get the resource busy error, which is different. How do I solve a resource busy error? – Bardigan Jul 12 '22 at 10:54
  • That is because you are trying to write into the file (last line) **after** the try / except logic. Move the with statement into the try clause, that way it will only attempt to write into the file if it has not failed. – Oriol Castander Jul 20 '22 at 12:43
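
The two suggestions in the comments (retry with a pause, and only write the file when the download succeeded) could be combined roughly as follows. This is a minimal sketch, not a confirmed fix: `download_to_file`, `make_downloader`, `max_attempts`, `delay_s` and the `.part` suffix are hypothetical names introduced here for illustration. It assumes the failure surfaces as an `OSError` (`BrokenPipeError` is a subclass); other exceptions, such as the client library's `HttpError`, would need to be added to the `except` clause. It also streams chunks straight to disk rather than into `io.BytesIO()`, which is a design change from the original sketch.

```python
import os
import time


def download_to_file(make_downloader, dest_path, max_attempts=3, delay_s=5.0):
    """Download to dest_path, retrying on OSError. Returns True on success.

    make_downloader is a hypothetical factory: it takes an open binary file
    and returns an object whose next_chunk() returns (status, done), which is
    the interface MediaIoBaseDownload exposes.
    """
    tmp_path = dest_path + ".part"
    for attempt in range(1, max_attempts + 1):
        try:
            # Write chunks to a temporary file instead of buffering in memory.
            with open(tmp_path, "wb") as fh:
                downloader = make_downloader(fh)
                done = False
                while not done:
                    _status, done = downloader.next_chunk()
            # Only reached when every chunk arrived: publish the final file.
            os.replace(tmp_path, dest_path)
            return True
        except OSError as err:  # BrokenPipeError is a subclass of OSError
            print(f"attempt {attempt} for {dest_path} failed: {err}")
            # Discard the partial file so a later run does not skip it.
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
            if attempt < max_attempts:
                time.sleep(delay_s)
    return False
```

With the real client this might be called as `download_to_file(lambda fh: MediaIoBaseDownload(fh, request), dfilespath + "/" + name)`, assuming the same `request` object can be reused to start a fresh download on each attempt. Because the final path only appears after a complete download, the `os.path.isfile` check at the top of `download_files` would not mistake a truncated file for a finished one.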
