I'm trying to download a small part of the YouTube-8M dataset. It is a dataset of video features and labels that you can use to train your own classification model.

The command that is supposed to download the dataset is:

curl storage.googleapis.com/data.yt8m.org/download_fix.py | shard=1,100 partition=2/frame/train mirror=us python

This didn't work at all, and it produced the following error:

'shard' is not recognized as an internal or external command, operable program or batch file.

I found a forum post that suggests adding 'set' before the variables, which seems to partially fix my problem:

curl storage.googleapis.com/data.yt8m.org/download_fix.py | set shard=1,100 partition=2/video/train mirror=us python

The download seemed to start for a split second, and then an error popped up. The error is now (23) Failed writing body.

So what is the correct command line for downloading the dataset?

5Volts

1 Answer

I'd try using the Kaggle API instead. You can install the API using:

pip install kaggle

Then download your credentials (step-by-step guide here). Finally, you can download the dataset like so:

kaggle competitions download -c youtube8m
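If that command fails with an authentication error, the credentials are probably not where the CLI looks for them. Here is a minimal Python sketch of a check, assuming the usual locations: the KAGGLE_USERNAME and KAGGLE_KEY environment variables, or a kaggle.json file in ~/.kaggle (or in the folder named by KAGGLE_CONFIG_DIR). The function name is my own, and some versions of the tool may look in other places too.

```python
import os
from pathlib import Path


def find_kaggle_credentials(environ=None, home=None):
    """Return where Kaggle credentials were found, or None if nothing was found.

    Hypothetical helper: it only checks the commonly documented locations.
    """
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)

    # Option 1: both values supplied through environment variables.
    if environ.get("KAGGLE_USERNAME") and environ.get("KAGGLE_KEY"):
        return "environment variables"

    # Option 2: a kaggle.json token file in the config directory.
    config_dir = Path(environ.get("KAGGLE_CONFIG_DIR", home / ".kaggle"))
    token_file = config_dir / "kaggle.json"
    if token_file.is_file():
        return str(token_file)

    return None
```

Calling find_kaggle_credentials() with no arguments checks your real environment and home folder; a result of None suggests the token file still needs to be put in place.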

If you only want part of the dataset, you can first list all the downloadable files:

kaggle competitions files -c youtube8m

And then only download the file(s) you want:

kaggle competitions download -c youtube8m -f name_of_your_file.extension
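If you would rather keep using the original script: the shard=... partition=... mirror=... python form is Unix shell syntax for passing environment variables to a single command, and the Windows command prompt does not understand it. With 'set' in front, the whole rest of the line most likely ends up as the value of shard and python never starts, so curl has nothing to write to, which would explain the (23) error. A rough Python sketch of the same idea is below. It assumes the script is still served at that address over plain http (curl's default when no scheme is given) and that it runs under your Python version; I have not checked either. The function names are my own.

```python
import os
import subprocess
import sys
import urllib.request

# Assumption: same address as in the question, with curl's default http scheme.
SCRIPT_URL = "http://storage.googleapis.com/data.yt8m.org/download_fix.py"


def build_env(shard, partition, mirror, base_env=None):
    """Copy the environment and add the three variables the script reads."""
    env = dict(os.environ if base_env is None else base_env)
    env.update({"shard": shard, "partition": partition, "mirror": mirror})
    return env


def run_downloader(shard="1,100", partition="2/frame/train", mirror="us"):
    """Fetch the download script and run it with the variables set."""
    with urllib.request.urlopen(SCRIPT_URL) as response:
        script = response.read()
    # "python -" reads the program from standard input, like the curl pipe did.
    return subprocess.run(
        [sys.executable, "-"],
        input=script,
        env=build_env(shard, partition, mirror),
        check=True,
    )
```

Calling run_downloader() with the defaults mirrors the first command in the question; pass partition="2/video/train" for the video-level features instead.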

Hope that helps! :)

Rachael Tatman