I'm using another developer's Colab notebook and I previously ran this code block successfully. But now when I try, the download doesn't complete and it stops at roughly 1.9K, when it should be more like 1.1 GB. Is there a small change to the code I can make to prevent this truncation?

Code:

#@title
experiment_type = 'ffhq_encode'
def get_download_model_command(file_id, file_name):
    """ Get wget download command for downloading the desired model and save to directory pretrained_models. """
    current_directory = os.getcwd()
    save_path = os.path.join(os.path.dirname(current_directory), CODE_DIR, "pretrained_models")
    if not os.path.exists(save_path):
        os.makedirs(save_path)
    url = r"""wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id={FILE_ID}' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id={FILE_ID}" -O {SAVE_PATH}/{FILE_NAME} && rm -rf /tmp/cookies.txt""".format(FILE_ID=file_id, FILE_NAME=file_name, SAVE_PATH=save_path)
    return url    

MODEL_PATHS = {
    "ffhq_encode": {"id": "1cUv_reLE6k3604or78EranS7XzuVMWeO", "name": "e4e_ffhq_encode.pt"},
    "cars_encode": {"id": "17faPqBce2m1AQeLCLHUVXaDfxMRU2QcV", "name": "e4e_cars_encode.pt"},
    "horse_encode": {"id": "1TkLLnuX86B_BMo2ocYD0kX9kWh53rUVX", "name": "e4e_horse_encode.pt"},
    "church_encode": {"id": "1-L0ZdnQLwtdy6-A_Ccgq5uNJGTqE7qBa", "name": "e4e_church_encode.pt"}
}

path = MODEL_PATHS[experiment_type]
download_command = get_download_model_command(file_id=path["id"], file_name=path["name"]) 

!wget {download_command}

Output:

--2022-04-28 21:36:10--  http://wget/
Resolving wget (wget)... failed: Name or service not known.
wget: unable to resolve host address ‘wget’
--2022-04-28 21:36:10--  https://docs.google.com/uc?export=download&confirm=&id=1cUv_reLE6k3604or78EranS7XzuVMWeO
Resolving docs.google.com (docs.google.com)... 172.253.63.113, 172.253.63.100, 172.253.63.138, ...
Connecting to docs.google.com (docs.google.com)|172.253.63.113|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘/content/encoder4editing/pretrained_models/e4e_ffhq_encode.pt’

/content/encoder4ed     [ <=>                ]   1.95K  --.-KB/s    in 0s      

2022-04-28 21:36:10 (30.7 MB/s) - ‘/content/encoder4editing/pretrained_models/e4e_ffhq_encode.pt’ saved [1993]

FINISHED --2022-04-28 21:36:10--
Total wall clock time: 0.1s
Downloaded: 1 files, 1.9K in 0s (30.7 MB/s)

What the actual download should look like (notice the file size at the bottom): [screenshot]

Azurespot
  • Is the file you are downloading from Google Drive yours or another person's? – raspiduino Apr 29 '22 at 05:09
  • @raspiduino I got the code base notebook from someone else, but I made a copy of it to my own Drive first before executing the copy. – Azurespot Apr 30 '22 at 22:11
  • @raspiduino also... the original author posted a link to the full file which I downloaded, so I tried to replace the downloaded file (that's only 1.2K) with the full file and tried to skip the code block shown above, but then got some other error. So I wondered if that code block does something else besides just download a file. Here's the original code base. https://github.com/bycloudai/StyleCLIP-e4e-colab – Azurespot Apr 30 '22 at 22:31

1 Answer

Note that the line

Length: unspecified [text/html]

means that wget is downloading an HTML file rather than the model, so what you saved is an error page returned by Google. You can open that HTML file to see the error.
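A quick way to confirm this from a notebook cell is to look at the first bytes of the saved file. The following is only a minimal sketch: the helper name `looks_like_html` and the 512-byte probe size are assumptions made for illustration, not part of the original notebook.

```python
def looks_like_html(path, probe_bytes=512):
    """Return True if the file starts like an HTML page (hypothetical helper).

    A real model checkpoint is binary data, so a file that begins with an
    HTML doctype or <html> tag is almost certainly an error page.
    """
    with open(path, "rb") as f:
        head = f.read(probe_bytes)  # read only the start, not the whole 1.1 GB
    head = head.lstrip().lower()
    return head.startswith((b"<!doctype html", b"<html"))
```

For example, calling `looks_like_html("/content/encoder4editing/pretrained_models/e4e_ffhq_encode.pt")` on the 1.9K file from the output above would be expected to return `True` if Google sent back a page instead of the model.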

Using wget with the Google Drive cookie trick to bypass the large-file confirmation is not always a good idea. Sometimes Google returns an error as an HTML page instead of your file, and that HTML page is usually much smaller than the file you expected.

Google Colab has a built-in module for accessing files on Google Drive. Just create a code cell, then paste and run the following code:

from google.colab import drive
drive.mount('/content/drive')

After you run it, Google Colab will ask you to sign in to your Drive. Once it finishes without errors, your files will be in /content/drive/My Drive. You can use cp or mv to copy or move a file wherever you want. You can also change the mount path in the drive.mount call above.
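If you prefer to stay in Python rather than call cp, the copy step might look like the sketch below. The function name `copy_model_from_drive` is hypothetical, and it assumes you already know where the model file sits inside your mounted Drive.

```python
import os
import shutil

def copy_model_from_drive(drive_file, save_dir):
    """Copy a file from the mounted Drive into save_dir (hypothetical helper).

    Returns the path of the copied file.
    """
    if not os.path.isfile(drive_file):
        raise FileNotFoundError(f"No such file on the mounted Drive: {drive_file}")
    os.makedirs(save_dir, exist_ok=True)  # create pretrained_models if missing
    destination = os.path.join(save_dir, os.path.basename(drive_file))
    return shutil.copy(drive_file, destination)
```

Assuming the model was saved at the top level of your Drive, a call such as `copy_model_from_drive("/content/drive/My Drive/e4e_ffhq_encode.pt", "/content/encoder4editing/pretrained_models")` would place it where the notebook's download step would have saved it.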

If it's someone else's file, you can make a copy of it in your own Google Drive, then access it.

You can also view, modify, and upload your files from the file browser in the toolbar on the left.

Alternatively, you can use gdrive to download your file. It can also download from a shared file ID, which fits your code fairly well. Use os.system if you want to run it from Python.

raspiduino
  • Thanks, I can't wait to try this. But are you saying, all I have to do is mount my drive? Then using the `wget` command will work? You mentioned Google cookies. I don't know what is in that command (wasn't my code and not experienced with writing commands like that), but after I mount my drive, should I still run that same `wget` command or try something different? – Azurespot Apr 30 '22 at 22:10
  • No, if you mounted Google Drive using the command above, you can just browse your Drive files in `/content/drive/My Drive`. Use `os.chdir` to change to that directory. Also note that every change you make to files in the mounted directory will be synced to your Google Drive. For example, if you create a file in that directory, a new file will appear in Google Drive – raspiduino May 01 '22 at 15:43
  • Also, please avoid using `wget` to download files from Google Drive, since it cannot handle the errors that Drive returns – raspiduino May 01 '22 at 15:45
  • Thanks, I will try this soon. I did visit the website as you suggested and I see a message about how it won't scan for viruses and expects a click on the download button. Yeah, `wget` won't handle that one. – Azurespot May 02 '22 at 18:49
  • It can actually handle that sort of thing, but it requires more confirmation code, and the worst part is that Google changes the page's code over time, so it becomes much more difficult – raspiduino May 03 '22 at 09:30