The URL https://ml-cloud-dataset.s3.amazonaws.com/Airlines_data.txt
tells you that:
- The bucket name is ml-cloud-dataset
- There is an object called Airlines_data.txt
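Because the host name has the form BUCKET.s3.amazonaws.com, you can turn such a URL into the s3:// URI that the AWS CLI expects. Below is a minimal sketch using bash parameter expansion. The function name `s3_url_to_uri` is hypothetical, and the sketch assumes the global-endpoint URL form shown above with a key after the host. Regional or path-style URLs are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert https://BUCKET.s3.amazonaws.com/KEY into s3://BUCKET/KEY.
# Assumes the global endpoint form; regional hosts (BUCKET.s3.REGION.amazonaws.com)
# and path-style URLs (s3.amazonaws.com/BUCKET/KEY) are NOT handled by this sketch.
s3_url_to_uri() {
  local url="$1"
  local rest="${url#https://}"               # BUCKET.s3.amazonaws.com/KEY
  local host="${rest%%/*}"                   # BUCKET.s3.amazonaws.com
  local key="${rest#*/}"                     # KEY (everything after the first slash)
  local bucket="${host%.s3.amazonaws.com}"   # BUCKET (dots in the name are kept)
  printf 's3://%s/%s\n' "$bucket" "$key"
}

# Example:
#   s3_url_to_uri https://ml-cloud-dataset.s3.amazonaws.com/Airlines_data.txt
#   prints s3://ml-cloud-dataset/Airlines_data.txt
```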
Fortunately, it is a publicly accessible bucket, so you can list the contents with the AWS CLI:
$ aws s3 ls ml-cloud-dataset
2020-03-06 23:32:55 10237044 Airlines_data.txt
2020-03-06 23:33:15 84 dept
2020-03-06 23:33:15 218 employee
2020-03-06 23:33:15 1666 hive_key.cer
2020-03-06 23:33:15 22628 u.user
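The third column of the listing is the object size in bytes, which tells you how much data a full copy would transfer. Below is a minimal sketch that totals that column with awk. The function name `summarize_listing` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `aws s3 ls` output on stdin, count the objects and
# sum their sizes (3rd field). Prefix lines such as "PRE folder/" have only two
# fields and are skipped by the NF >= 4 condition.
summarize_listing() {
  awk 'NF >= 4 { total += $3; count++ }
       END { printf "%d objects, %d bytes\n", count, total }'
}

# Example usage against the public bucket:
#   aws s3 ls s3://ml-cloud-dataset/ --no-sign-request | summarize_listing
```

For the listing shown above this prints `5 objects, 10261640 bytes`. The `--no-sign-request` option makes the CLI send anonymous requests, which is useful for a public bucket when you have no credentials configured. Alternatively, `aws s3 ls s3://ml-cloud-dataset/ --recursive --summarize` reports the object count and total size itself.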
You can copy the object to your own bucket using:
aws s3 cp s3://ml-cloud-dataset/Airlines_data.txt s3://your-bucket/
To copy ALL the objects, use:
aws s3 sync s3://ml-cloud-dataset/ s3://your-bucket/
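`aws s3 sync` also accepts `--exclude`/`--include` filters and a `--dryrun` flag, so you can preview a copy or copy only part of the bucket. Below is a minimal sketch. It assumes a destination bucket you own (`your-bucket` is a placeholder) and uses a hypothetical helper named `copy_selected`. Setting `DRYRUN=1` only prints what would be copied.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy only the named keys from the public bucket.
# Usage: copy_selected DEST_BUCKET KEY [KEY ...]
# The AWS CLI applies filters in order, with later filters taking precedence,
# so excluding "*" first and then including each key copies just those keys.
copy_selected() {
  local dest="$1"; shift
  local filters=(--exclude "*")
  local key
  for key in "$@"; do
    filters+=(--include "$key")
  done
  # ${DRYRUN:+--dryrun} adds --dryrun only when DRYRUN is set and non-empty.
  aws s3 sync "s3://ml-cloud-dataset/" "s3://${dest}/" "${filters[@]}" ${DRYRUN:+--dryrun}
}

# Example: preview, then copy for real.
#   DRYRUN=1 copy_selected your-bucket Airlines_data.txt u.user
#   copy_selected your-bucket Airlines_data.txt u.user
```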
However, if you are using Hive within AWS, you might not even need to download the files. You could reference the data directly at s3://ml-cloud-dataset/Airlines_data.txt.
You could also access it from Amazon Athena using that same path.
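Before you write a Hive or Athena table definition, it helps to check the file's layout, such as its delimiter and whether it has a header row. Below is a minimal sketch that streams the first few lines of the object to the terminal without saving a local copy. It relies on `aws s3 cp URI -`, which writes the object to standard output. The helper name `peek_object` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first N lines (default 5) of an S3 object.
# "aws s3 cp URI -" streams the object to stdout; head stops after N lines.
peek_object() {
  local uri="$1" n="${2:-5}"
  aws s3 cp "$uri" - | head -n "$n"
}

# Example:
#   peek_object s3://ml-cloud-dataset/Airlines_data.txt 3
```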