
I am using AWS for the first time for my data science projects, and I am trying to download a dataset from AWS Data Exchange. This is the link to the dataset: https://aws.amazon.com/marketplace/pp/prodview-jyodtbskm2fuu?fbclid=IwAR0M_bNUg2_PQFADEfVeTe9joERy9SUA6BUlT_Tejjra6h2G6wfNEMpc5sk#overview

Unfortunately, my experience is quite limited, and I do not have much time to spend on this, since I am not interested in AWS itself.

I tried the following command:

aws s3 ls --no-sign-request s3://greenwichhr-covidjobimpacts/

This gives me all the files:

2022-11-02 06:02:49     715891 geography.csv.part_00000
2022-11-02 06:02:49   17428149 geography_industry.csv.part_00000
2022-11-02 06:02:49     105428 ghr_data_specs_covid_public.pdf
2022-11-02 06:02:49     262557 industry.csv.part_00000
2022-11-02 06:02:49    7221882 industry_job_family.csv.part_00000
2022-11-02 06:02:49     357094 job_family.csv.part_00000
2022-11-02 06:02:49   31910685 job_family_role.csv.part_00000
2022-11-02 06:02:49      10233 overall.csv.part_00000
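Note that aws s3 ls does not show each object's storage class, so it cannot tell you in advance which files are archived. A minimal sketch for checking this, assuming the AWS CLI is installed and that anonymous listing works for this bucket (as the aws s3 ls call above suggests); list_storage_classes and archived_keys are hypothetical helper names, not part of the CLI:

```shell
# Bucket name taken from the question.
BUCKET="greenwichhr-covidjobimpacts"

# Print "<key><TAB><storage class>" for every object in the bucket.
# Uses the ListObjectsV2 API, which reports StorageClass per object.
list_storage_classes() {
  aws s3api list-objects-v2 \
    --bucket "$BUCKET" \
    --no-sign-request \
    --query 'Contents[].[Key, StorageClass]' \
    --output text
}

# Read "<key><TAB><class>" lines on stdin and print only the keys whose
# storage class needs a restore before a GetObject/download can succeed.
# (GLACIER_IR, Glacier Instant Retrieval, is readable directly, so it is not listed.)
archived_keys() {
  awk -F '\t' '$2 == "GLACIER" || $2 == "DEEP_ARCHIVE" { print $1 }'
}

# Usage:
#   list_storage_classes | archived_keys
```

If every key comes back from archived_keys, none of the files can be downloaded until someone restores them.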

I tried to download one file using this command:

aws s3 cp s3://greenwichhr-covidjobimpacts/geography.csv.part_00000 --no-sign-request mylocalfile.csv

This is the error I got:

warning: Skipping file s3://greenwichhr-covidjobimpacts/geography.csv.part_00000. Object is of storage class GLACIER. Unable to perform download operations on GLACIER objects. You must restore the object to be able to perform the operation. See aws s3 download help for additional parameter options to ignore or force these transfers.

Next, I tried adding --force-glacier-transfer:

aws s3 cp s3://greenwichhr-covidjobimpacts/geography.csv.part_00000 --no-sign-request mylocalfile.csv --force-glacier-transfer

This is what I got:

download failed: s3://greenwichhr-covidjobimpacts/geography.csv.part_00000 to .\ceva.csv An error occurred (InvalidObjectState) when calling the GetObject operation: The operation is not valid for the object's storage class
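InvalidObjectState indicates that the object is still archived and has no restored copy; --force-glacier-transfer only helps once a restore has completed. One way to check the restore state is head-object: its Restore field is absent until a restore is requested, then reports ongoing-request="true" while the restore runs and ongoing-request="false" (with an expiry-date) once the temporary copy is available. A sketch, assuming anonymous HEAD requests are permitted on this bucket; restore_field and restore_status are hypothetical names:

```shell
BUCKET="greenwichhr-covidjobimpacts"

# Print the raw Restore value for one key. In text output the CLI
# prints "None" when the field is absent (no restore requested).
restore_field() {
  aws s3api head-object \
    --bucket "$BUCKET" \
    --key "$1" \
    --no-sign-request \
    --query Restore \
    --output text
}

# Classify a Restore value read from stdin:
#   not-restored -> no restore requested, downloads will fail
#   in-progress  -> restore requested, data not yet readable
#   restored     -> temporary copy readable until expiry-date
restore_status() {
  value="$(cat)"
  case "$value" in
    *'ongoing-request="true"'*)  echo "in-progress" ;;
    *'ongoing-request="false"'*) echo "restored" ;;
    *)                           echo "not-restored" ;;
  esac
}

# Usage:
#   restore_field geography.csv.part_00000 | restore_status
```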

After spending some time reading, I tried to restore the files using this command:

aws s3api restore-object --bucket greenwichhr-covidjobimpacts --key geography.csv.part_00000 --restore-request Days=25,GlacierJobParameters={"Tier"="Standard"} --no-sign-request

This returned:

An error occurred (AccessDenied) when calling the RestoreObject operation: Access Denied
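AccessDenied is most likely expected here: restore-object has to be signed with credentials that hold s3:RestoreObject on the bucket, and an anonymous (--no-sign-request) caller normally does not have that permission. For reference, a sketch of the call the bucket owner could run, passing the restore request as JSON to avoid shell-specific quoting of braces and double quotes (shown for a POSIX shell such as bash; Windows cmd and PowerShell quote differently). The helpers restore_request_json and request_restore are hypothetical:

```shell
BUCKET="greenwichhr-covidjobimpacts"

# Build the RestoreRequest JSON for a given number of days and retrieval tier.
restore_request_json() {
  days="$1"   # how long the restored temporary copy stays readable
  tier="$2"   # Standard, Bulk, or Expedited (Expedited is not offered for DEEP_ARCHIVE)
  printf '{"Days":%s,"GlacierJobParameters":{"Tier":"%s"}}' "$days" "$tier"
}

# Request a restore for one key. This must run with the owner's signed
# credentials, so there is deliberately no --no-sign-request here.
request_restore() {
  aws s3api restore-object \
    --bucket "$BUCKET" \
    --key "$1" \
    --restore-request "$(restore_request_json 25 Standard)"
}

# Usage (bucket owner only):
#   request_restore geography.csv.part_00000
```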

What exactly are the steps to download these files?

Fredi
    It looks like that dataset has been archived and is no longer available for download. Only the owner of the S3 bucket can restore it (because a restoration would cost the owner money). You would have to reach out to the dataset owner https://www.greenwich.hr/ to find out why it has been archived, and how you could gain access. Unfortunately, in my experience a lot of the public datasets on AWS get published and then are not maintained at all, and stop working after a while. – Mark B Apr 30 '23 at 14:39
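If the owner does restore the objects, downloading them could look like this sketch. It assumes the bucket name from the question and that each table has a single part file, as in the listing. Restored objects keep their GLACIER storage class, so a recursive aws s3 cp still skips them unless --force-glacier-transfer is given. local_name and download_all are hypothetical helpers:

```shell
BUCKET="greenwichhr-covidjobimpacts"

# Map an S3 key such as geography.csv.part_00000 to geography.csv.
# Keys without a ".part_" suffix (e.g. the PDF) are returned unchanged.
local_name() {
  printf '%s\n' "${1%.part_*}"
}

# Copy the whole (restored) bucket into a local folder, then rename
# every part file to a plain name. Assumes one part per table.
download_all() {
  dest="${1:-./data}"
  mkdir -p "$dest"
  aws s3 cp "s3://$BUCKET/" "$dest" \
    --recursive \
    --no-sign-request \
    --force-glacier-transfer
  for f in "$dest"/*.part_*; do
    [ -e "$f" ] || continue   # glob matched nothing
    mv "$f" "$dest/$(local_name "$(basename "$f")")"
  done
}

# Usage:
#   download_all ./covidjobimpacts
```

The rename step would overwrite data if a table were split across several part files; in that case the parts would need to be concatenated instead.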
