I am trying to upload files to Google Drive through the Google Drive API using the following code:

from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

try:
    import argparse
    flags = argparse.ArgumentParser(parents=[tools.argparser]).parse_args()
except ImportError:
    flags = None

SCOPES = ['https://www.googleapis.com/auth/drive',
          'https://www.googleapis.com/auth/drive.file',
          'https://www.googleapis.com/auth/drive.appdata',
          'https://www.googleapis.com/auth/drive.apps.readonly']

store = file.Storage('scope.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store, flags) if flags else tools.run(flow, store)

# Build the Drive v3 service with the authorized credentials.
DRIVE = discovery.build('drive', 'v3', http=creds.authorize(Http()))

FILES = (
    ('/home/vkm/mayur/Demo_Google_API.zip', 'application/vnd.google-apps.document'),
)

for filename, mimeType in FILES:
    metadata = {'name': filename}
    if mimeType:
        metadata['mimeType'] = mimeType
    res = DRIVE.files().create(body=metadata, media_body=filename).execute()
    if res:
        print('Uploaded "%s" (%s)' % (filename, res['mimeType']))

I am able to upload small files, but when I try it with an 8 GB file it raises a MemoryError. Here is the traceback:

Traceback (most recent call last):
  File "demo.py", line 46, in <module>
    res = DRIVE.files().create(body=metadata, media_body=filename).execute()
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery.py", line 853, in method
    payload = media_upload.getbytes(0, media_upload.size())
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/http.py", line 482, in getbytes
    return self._fd.read(length)
MemoryError
Vikram Singh Chandel
  • Your machine is running out of memory. Increase the available memory (if only it was that easy). Consider file compression. Or split the file up into multiple files and send each separately. – mhawke Feb 18 '18 at 10:42
  • @mhawke thanks for the reply.. is there any other way to do this.... – Vikram Singh Chandel Feb 18 '18 at 10:53

3 Answers


Vikram's comment revealed a problem in mhawke's answer: next_chunk() needs to be called on the return value of:

request = DRIVE.files().create(body=metadata, media_body=media)

not on the return value of request.execute().

Here is a snippet of Python code that I verified as working for uploads of files up to 10 MB to my Google Drive account:

from googleapiclient.http import MediaFileUpload

# drive_service is assumed to be an authorized Drive v3 service object,
# built the same way as DRIVE in the question.

# Upload some file that just happens to be binary (we
# don't care about metadata, just upload it without
# translation):
the_file_to_upload = 'some_binary_file'
metadata = {'name': the_file_to_upload}
# Note the chunksize restrictions given in
# https://developers.google.com/api-client-library/python/guide/media_upload
media = MediaFileUpload(the_file_to_upload,
                        chunksize=1024 * 1024,
                        # Not sure whether or not this mimetype is necessary:
                        mimetype='text/plain',
                        resumable=True)
request = drive_service.files().create(body=metadata, media_body=media)
response = None
while response is None:
    status, response = request.next_chunk()
    if status:
        print("Uploaded %d%%." % int(status.progress() * 100))
print("Upload of {} is complete.".format(the_file_to_upload))

Here is a snippet of Python code that downloads the same file to a different filename, so that sha1sum can verify that the file was not altered by the round trip through Google Drive.

import io
from googleapiclient.http import MediaIoBaseDownload

# Verify downloading works without translation:
request = drive_service.files().get_media(fileId=response['id'])
# Use io.FileIO. Refer to:
# https://google.github.io/google-api-python-client/docs/epy/googleapiclient.http.MediaIoBaseDownload-class.html
out_filename = the_file_to_upload + ".out"
fh = io.FileIO(out_filename, mode='wb')
downloader = MediaIoBaseDownload(fh, request, chunksize=1024 * 1024)
done = False
while done is False:
    status, done = downloader.next_chunk()
    if status:
        print("Download %d%%." % int(status.progress() * 100))
print("Download Complete!")
bgoodr
  • This appears to be partially working for me. It prints the progress until it goes to call `next_chunk` on the last chunk. Then it returns a 413. My best guess is that `MediaIoBaseUpload` (which I'm using instead of `MediaFileUpload`) implements the request chunking incorrectly. Either that or there is a newly imposed file size limit that isn't well documented. – Grant Robert Smith Oct 31 '18 at 01:39
  • Good answer, this works for me. However, how do I get back the fields using this approach? For example, I want the id of the uploaded file, in a standard upload I would just specify `fields="id"` inside the `.create()` and run `.execute()` on that which would give it back. – KillerKode Nov 26 '19 at 21:33
  • @KillerKode Not really sure where you would get the "id". It's been too long ago when I posted this, but I do see `response['id']` might be it. Even if that were the case, it might not help you if you hadn't first uploaded the file and had access to the `response` object. Maybe someone else would know how to obtain that info from some other API (e.g., from searching Drive for filenames and such). – bgoodr Nov 27 '19 at 01:21
  • I thought so too but got an exception when trying it lol. Google documentation is never great. Thanks anyway, I don't really need the ID but wanted to put it into my reusable functions for the future. Main thing is my 100 MB uploads are working a treat now :). – KillerKode Nov 27 '19 at 09:13

You could upload the file using a resumable media upload. This will send the file in chunks and should not max out your memory, which I assume is happening because your client is trying to send the whole file at once.

To do this, pass a MediaFileUpload object, with its resumable flag set to True, to the create() method. Optionally, you can also set the chunksize.

from googleapiclient.http import MediaFileUpload

metadata = {'name': filename}
media = MediaFileUpload(filename, mimetype=mimeType, resumable=True)

request = DRIVE.files().create(body=metadata, media_body=media)
response = None
while response is None:
    status, response = request.next_chunk()
    if status:
        print("Uploaded %d%%." % int(status.progress() * 100))
print("Upload Complete!")

Try reducing the chunksize if needed.
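
When tuning it, note that resumable uploads appear to expect the chunk size to be a multiple of 256 KB (as far as I can tell from Google's media upload documentation, linked in the answer above), so keep it aligned; a sketch reusing filename and mimeType from the question's loop:

# 1 MB chunks, i.e. a multiple of 256 * 1024 bytes.
media = MediaFileUpload(filename, mimetype=mimeType,
                        chunksize=1024 * 1024, resumable=True)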

mhawke
  • thanks for the ans but I am getting `AttributeError: 'dict' object has no attribute 'next_chunk'` error... – Vikram Singh Chandel Feb 27 '18 at 17:24
  • @VikramSinghChandel I ran into the very same error you did. I have concluded that either mhawke's answer was wrong to begin with, or that it was correct at one point in time but the API maintainers invalidated it. See my modification of his code in my answer at https://stackoverflow.com/a/49483101/257924 – bgoodr Mar 26 '18 at 02:28
  • This answer is correct if you remove the `execute()` call. – Grant Robert Smith Oct 31 '18 at 01:40

The easiest way to upload large files to Google Drive with Python is simply to add resumable=True:

from googleapiclient.http import MediaFileUpload

media = MediaFileUpload(filename, resumable=True)
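
With resumable=True, even the plain execute() call from the question streams the upload chunk by chunk internally instead of reading the whole file into memory, so the rest of the question's code can stay unchanged. A minimal sketch, reusing DRIVE, metadata, and filename from the question:

request = DRIVE.files().create(body=metadata, media_body=media)
response = request.execute()  # execute() performs the chunked upload itself
print('Uploaded "%s" (%s)' % (filename, response['mimeType']))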