
Here is the website link I am gathering the data from:

https://www.kaggle.com/c/talkingdata-adtracking-fraud-detection/data

Essentially, I'd like to get the train dataset and read it directly into my Data Science Experience notebook, since my local system can't handle its size. I can use !wget to download the zip file, but when I try to unzip it I get the following message:

Archive:  train.csv.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of train.csv.zip or
        train.csv.zip.zip, and cannot find train.csv.zip.ZIP, period.

Here are the contents of my directory:

a_hv9j8u_anything.log  model.h5.base64  watsoniotp.broken.pickle
data               rklib.py     watsoniotp.healthy.pickle
MNIST_data         rklib.pyc
model.h5           train.csv.zip

Any help would be much appreciated.

madsthaks

1 Answer


I assume you are running something like:

!wget https://www.kaggle.com/c/8540/download/test_supplement.csv.zip

Once the file is downloaded, you will see that its size is only 8 KB:

!ls -l test_supplement.csv.zip

The downloaded file is indeed not a valid zip file; it is an HTML page asking you to log in to Kaggle. Running !cat test_supplement.csv.zip will show the HTML content.
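If you want to check this from Python instead of eyeballing the !cat output, a minimal sketch could look like the following. It assumes only the standard library: zipfile.is_zipfile detects a real archive, and otherwise the first bytes are sniffed for typical HTML markers. The function name inspect_download is hypothetical, and the HTML check is a heuristic, not an exhaustive one.

```python
import os
import zipfile


def inspect_download(path: str) -> str:
    """Classify a downloaded file as 'zip', 'html', or 'unknown'.

    Hypothetical helper: a genuine archive is recognized by zipfile.is_zipfile;
    otherwise the first bytes are checked for common HTML openings, which is
    what an unauthenticated request answered with a login page looks like.
    """
    if zipfile.is_zipfile(path):
        return "zip"
    with open(path, "rb") as f:
        # Skip leading whitespace and compare case-insensitively.
        head = f.read(512).lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"


if __name__ == "__main__":
    # Example usage in a notebook cell after the !wget call above.
    name = "test_supplement.csv.zip"
    if os.path.exists(name):
        print(name, os.path.getsize(name), "bytes ->", inspect_download(name))
```

A result of "html" together with a file size of a few KB is the same symptom that unzip reports as "End-of-central-directory signature not found".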

Kaggle datasets can only be downloaded once you are authenticated, so a plain wget or curl request without authentication will not work.

You have two options. You can download the dataset from the web page while logged in, upload it to whatever system you are using, and work with it there. (Please note Kaggle's policy on the use of these datasets before redistributing them.)
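Since the original goal is to avoid handling the full file locally, once train.csv.zip is uploaded you do not necessarily need to unzip it first. A minimal sketch, assuming the archive contains a member named train.csv (an assumption; check zf.namelist() for the real name), that streams rows straight out of the zip with the standard library. The function name iter_zip_csv is hypothetical.

```python
import csv
import io
import zipfile
from typing import Dict, Iterator


def iter_zip_csv(zip_path: str, member: str = "train.csv") -> Iterator[Dict[str, str]]:
    """Yield rows of a CSV stored inside a zip archive, one dict per row.

    Hypothetical helper: the member is decompressed on the fly, so the full
    CSV is never extracted to disk or loaded into memory all at once.
    The default member name "train.csv" is an assumption.
    """
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member) as raw:
            # zf.open returns a binary stream; wrap it to get text for csv.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            yield from csv.DictReader(text)


if __name__ == "__main__":
    import os

    if os.path.exists("train.csv.zip"):
        # Count rows without materializing the whole file.
        print(sum(1 for _ in iter_zip_csv("train.csv.zip")))
```

If pandas is available, pd.read_csv("train.csv.zip", chunksize=...) is an alternative: pandas infers zip compression from the .zip extension when the archive holds a single file, and chunksize keeps memory use bounded.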

or

Try using the official Kaggle API: https://github.com/Kaggle/kaggle-api

Here is the notebook where I have shown how to install and use the API from the link above.
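In a hosted notebook the Kaggle API needs its credentials file before any download works. It looks for kaggle.json in ~/.kaggle (or in the directory named by the KAGGLE_CONFIG_DIR environment variable), and the username and key come from the token file created with "Create New API Token" on your Kaggle account page. A minimal sketch that writes that file with owner-only permissions; write_kaggle_credentials is a hypothetical helper name, and the values you pass are placeholders for your own token.

```python
import json
import os
import stat
from typing import Optional


def write_kaggle_credentials(username: str, key: str,
                             config_dir: Optional[str] = None) -> str:
    """Write kaggle.json so the kaggle CLI/API can authenticate.

    Hypothetical helper. Defaults to $KAGGLE_CONFIG_DIR if set, else ~/.kaggle.
    The file is restricted to the owner (0600) because the key is a secret;
    the kaggle tool warns when the file is readable by other users.
    """
    if config_dir is None:
        config_dir = os.environ.get(
            "KAGGLE_CONFIG_DIR", os.path.join(os.path.expanduser("~"), ".kaggle"))
    os.makedirs(config_dir, exist_ok=True)
    path = os.path.join(config_dir, "kaggle.json")
    with open(path, "w") as f:
        json.dump({"username": username, "key": key}, f)
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # owner read/write only
    return path
```

After calling it with your own values, the download could then be run from a cell along these lines (the exact flags differ between kaggle versions, so check kaggle competitions download -h, and accept the competition rules on the website first or the request will be refused):

!pip install kaggle
!kaggle competitions download -c talkingdata-adtracking-fraud-detection -f train.csv.zip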

Thanks, Charles.

charles gomes