6

I want to download the sign language dataset from Kaggle to my Colab.

So far I have always used wget with the direct link to the zip file, for example:

!wget --no-check-certificate \
    https://storage.googleapis.com/laurencemoroney-blog.appspot.com/rps.zip \
    -O /tmp/rps.zip

However, when I right-click the download button on Kaggle and select "Copy link", the link I get is:

https://www.kaggle.com/datamunge/sign-language-mnist/download

When I open this link in my browser, I am asked to download a file named 3258_5337_bundle_archive.zip.

So I tried:

!wget --no-check-certificate \
        https://www.kaggle.com/datamunge/sign-language-mnist/download3258_5337_bundle_archive.zip  \
        -O /tmp/kds.zip

and also tried:

 !wget --no-check-certificate \
            https://www.kaggle.com/datamunge/sign-language-mnist/download3258_5337_bundle_archive.zip  \
            -O /tmp/kds.zip

I get as output:

exa

So it does not work. Either the file cannot be found, or the returned zip archive is only a few KB instead of 101 MB. Unzipping it also fails.

How can I download this file into my Colab (ideally directly with wget)?

rchurt
Stat Tistician
  • Just to clarify, do you think this is a Colab-specific problem, or would you have the same problem if you tried to do the same locally? – rchurt Jul 01 '20 at 14:50
  • @rchurt I don't know, I only used Colab. – Stat Tistician Jul 01 '20 at 21:03
  • If you instead need just some chosen file of the dataset, and not the whole dataset file, see [How to load just one chosen file of a way too large Kaggle dataset from Kaggle into Colab](https://stackoverflow.com/questions/67713193/how-to-load-just-one-chosen-file-of-a-way-too-large-kaggle-dataset-from-kaggle-i). – questionto42 Jun 01 '21 at 13:34

2 Answers

9

Kaggle recommends using their own API instead of wget or rsync.
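
A likely reason the wget attempt in the question yields only a few KB is that Kaggle expects a signed-in session and returns a web page instead of the archive. That is an assumption, not something confirmed here. A minimal Python sketch for checking what was actually saved (the helper name describe_download is hypothetical):

```python
import zipfile


def describe_download(path):
    """Return 'zip' for a valid archive, 'html' if it looks like a web page, else 'unknown'."""
    if zipfile.is_zipfile(path):
        return "zip"
    # Peek at the first bytes to see whether a web page was saved instead.
    with open(path, "rb") as f:
        head = f.read(512).lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"
```

Calling describe_download("/tmp/kds.zip") on the file from the question would show whether a real archive or something else was downloaded.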

First, create an API token for Kaggle. On Kaggle's website, go to "My Account", scroll to the API section, and click "Create New API Token". This downloads a kaggle.json file to your machine.

Then run the following in Google Colab:

from google.colab import files
files.upload() # Browse for the kaggle.json file that you downloaded

# Make a directory named .kaggle in the home directory (-p avoids an error if it already exists),
# copy kaggle.json there, and restrict the file's permissions to the owner.
! mkdir -p ~/.kaggle
! cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json

# You can check if everything's okay by running this command.
! kaggle datasets list

# Download and unzip sign-language-mnist dataset into '/usr/local'
! kaggle datasets download -d datamunge/sign-language-mnist --path '/usr/local' --unzip

Used info from here: https://www.kaggle.com/general/74235
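
The mkdir, cp and chmod steps can also be done from Python, which makes the cell safe to re-run. This is only a sketch: the helper name install_kaggle_token and its home parameter are hypothetical, and it assumes kaggle.json is already in the working directory (for example after files.upload()).

```python
import os
import shutil
from pathlib import Path


def install_kaggle_token(token_path, home=None):
    """Copy kaggle.json into <home>/.kaggle/ and restrict it to the owner."""
    home = Path(home) if home is not None else Path.home()
    target_dir = home / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    target = target_dir / "kaggle.json"
    shutil.copyfile(token_path, target)
    os.chmod(target, 0o600)  # same effect as `chmod 600`
    return target
```

In Colab this would be called as install_kaggle_token("kaggle.json") before running any kaggle command.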

rchurt
1

This is the simplest way I have found to do it (if you are taking part in a competition, change datasets to competitions and -d to -c):

import os

os.environ['KAGGLE_USERNAME'] = "xxxx"

os.environ['KAGGLE_KEY'] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

!kaggle datasets download -d iarunava/happy-house-dataset
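
To avoid pasting the key into the notebook, the same two environment variables can be filled from the downloaded kaggle.json, which stores them under the keys "username" and "key". A minimal sketch, with a hypothetical helper name load_kaggle_credentials:

```python
import json
import os


def load_kaggle_credentials(token_path):
    """Read kaggle.json and export its values as environment variables."""
    with open(token_path) as f:
        token = json.load(f)
    os.environ["KAGGLE_USERNAME"] = token["username"]
    os.environ["KAGGLE_KEY"] = token["key"]
    return token["username"]
```

After load_kaggle_credentials("kaggle.json"), the !kaggle command above can be run unchanged.
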
Vega
Seb.code