3

I'm trying to download a large folder with 50000 images from my Google Drive onto a local server using Python. The code below fails with a download-limit error. Are there any alternative solutions?

import gdown
url = 'https://drive.google.com/drive/folders/135hTTURfjn43fo4f?usp=sharing'  # I'm showing a fake token
gdown.download_folder(url)

Failed to retrieve folder contents:

The gdrive folder with url: https://drive.google.com/drive/folders/135hTTURfjn43fo4f?usp=sharing has at least 50 files, gdrive can't download more than this limit, if you are ok with this, please run again with --remaining-ok flag.

Vahid the Great
  • It looks like you need to use something like this: `gdown.download_folder(url, remaining_ok=True)`. – kite Oct 26 '21 at 13:42
  • This won't solve the problem. It will just download the first 50 files in the folder and ignore the rest! – Vahid the Great Oct 26 '21 at 17:12

4 Answers

3

As kite mentioned in the comments, use it with the remaining_ok flag.

gdown.download_folder(url, remaining_ok=True)

This isn't mentioned on https://pypi.org/project/gdown/, so there might be some confusion.

No references on remaining_ok are available aside from the warning and this GitHub code.

EDIT:

It seems gdown is strictly limited to 50 files, and I haven't found a way of circumventing it.

If something other than gdown is an option, then see the code below.

Script:

import io
import os
import os.path
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload
from google.oauth2 import service_account

credential_json = {
    ### Create a service account and use its JSON key content here ###
    ### https://cloud.google.com/docs/authentication/getting-started#creating_a_service_account
    ### credentials.json looks like this:
    "type": "service_account",
    "project_id": "*********",
    "private_key_id": "*********",
    "private_key": "-----BEGIN PRIVATE KEY-----\n*********\n-----END PRIVATE KEY-----\n",
    "client_email": "service-account@*********.iam.gserviceaccount.com",
    "client_id": "*********",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://oauth2.googleapis.com/token",
    "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
    "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/service-account%40*********.iam.gserviceaccount.com"
}

credentials = service_account.Credentials.from_service_account_info(credential_json)
drive_service = build('drive', 'v3', credentials=credentials)

folderId = '### Google Drive Folder ID ###'
outputFolder = 'output'

# Create folder if not existing
if not os.path.isdir(outputFolder):
    os.mkdir(outputFolder)

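# Page through the folder's contents (1000 files per page) using nextPageToken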
items = []
pageToken = ""
while pageToken is not None:
    response = drive_service.files().list(q="'" + folderId + "' in parents", pageSize=1000, pageToken=pageToken,
                                          fields="nextPageToken, files(id, name)").execute()
    items.extend(response.get('files', []))
    pageToken = response.get('nextPageToken')

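# Download each listed file, streaming it in chunks with MediaIoBaseDownload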
for file in items:
    file_id = file['id']
    file_name = file['name']
    request = drive_service.files().get_media(fileId=file_id)
    ### Saves all files under outputFolder
    fh = io.FileIO(os.path.join(outputFolder, file_name), 'wb')
    downloader = MediaIoBaseDownload(fh, request)
    done = False
    while not done:
        status, done = downloader.next_chunk()
    fh.close()
    print(f'{file_name} downloaded completely.')
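
Because files().list is paged via nextPageToken, this approach is not capped at 50 files. Note that the query only lists the folder's direct children (subfolders would need to be walked separately), and the folder must be accessible to the service account, e.g. shared with its client_email.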

NightEye
  • This won't solve the problem. It will just download the first 50 files in the folder and ignore the rest! – Vahid the Great Oct 26 '21 at 16:29
  • Are you limited to a gdown solution, or will anything that works in Python do? As far as I checked, gdown is strictly limited to 50 files. @VahidGhafouri – NightEye Oct 26 '21 at 17:34

1

The download limit is set in ../gdown/download_folder.py
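
If editing or patching the installed package is acceptable, here is a minimal sketch of raising that limit at runtime. It assumes the constant is named MAX_NUMBER_FILES (as noted in the comments below) and that it lives in gdown's download_folder module; the internals may differ between gdown versions.

import sys

import gdown  # importing the package also loads the gdown.download_folder submodule

# Override the module-level 50-file cap before calling download_folder
# (constant name assumed from the comments below)
sys.modules['gdown.download_folder'].MAX_NUMBER_FILES = 100000

url = 'https://drive.google.com/drive/folders/<your folder id>'  # placeholder
gdown.download_folder(url)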

Olin
  • Please elaborate your answer with additional details, e.g. how the user can apply the above information, the changes required, the build steps, and maybe a test run to show that it really works. – Azhar Khan Feb 16 '23 at 02:57
  • This is the best answer. Just change MAX_NUMBER_FILES = 50 to a larger number and you should be good after that. – realanswers Aug 23 '23 at 14:23
0

This is a workaround that I used to download individual files by URL using gdown:

  • Go to the Drive folder from which you need to download the files
  • Select all the files with Ctrl/Cmd+A, click on Share, and copy all the links
  • Now use the following Python script to do the job
import re
import os

urls = '<copied_urls>'  # paste the comma-separated share links here
url_list = urls.split(', ')
pat = re.compile(r'https://drive\.google\.com/file/d/(.*)/view\?usp=sharing')
for url in url_list:
    m = pat.match(url)
    file_id = m.group(1)
    down_url = f'https://drive.google.com/uc?id={file_id}'
    os.system(f'gdown {down_url}')

Note: This solution isn't ideal for 50000 images, as the copied-URLs string will be too large. If your string is huge, copy it into a file and process the file instead of using a variable. In my case I had to copy 75 large files.
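
For example, a minimal sketch of that file-based variant, assuming the links were pasted into a hypothetical links.txt with one share link per line:

import os
import re

# links.txt is a placeholder name: one copied share link per line
pat = re.compile(r'https://drive\.google\.com/file/d/(.*)/view\?usp=sharing')
with open('links.txt') as f:
    for line in f:
        m = pat.match(line.strip())
        if m:
            os.system(f'gdown https://drive.google.com/uc?id={m.group(1)}')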

-2
!pip uninstall --yes gdown # After running this line, restart Colab runtime.
!pip install gdown -U --no-cache-dir
import gdown

url = r'https://drive.google.com/drive/folders/1sWD6urkwyZo8ZyZBJoJw40eKK0jDNEni'
gdown.download_folder(url)
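
This simply reinstalls the latest gdown release (bypassing pip's cache) and retries the folder download; the 50-file folder limit discussed above may still apply.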