I'm trying to access the Common Crawl news S3 bucket, but I keep getting a "fatal error: Unable to locate credentials" message. Any suggestions for how to get around this? As far as I know, Common Crawl doesn't even require credentials.

Jen

1 Answer

From News Dataset Available – Common Crawl:

You can access the data even without a AWS account by adding the command-line option --no-sign-request.

I tested this by launching a new Amazon EC2 instance (without an IAM role) and issuing the command:

aws s3 ls s3://commoncrawl/crawl-data/CC-NEWS/

It gave me the error: Unable to locate credentials

I then ran it with the additional parameter:

aws s3 ls s3://commoncrawl/crawl-data/CC-NEWS/ --no-sign-request

It successfully listed the directories.
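
If you run this often, one option is to wrap the flag in a small shell function so it is not forgotten. This is only a sketch: cc_ls is a hypothetical helper name, not part of the AWS CLI, and it assumes the aws command is installed and on your PATH.

```shell
# Hypothetical helper (not part of the AWS CLI): list a key prefix in the
# public commoncrawl bucket without signing the request, so no credentials
# are looked up.
cc_ls() {
    # $1 is a key prefix inside the bucket, e.g. crawl-data/CC-NEWS/
    aws s3 ls "s3://commoncrawl/$1" --no-sign-request
}
```

Calling cc_ls crawl-data/CC-NEWS/ then runs the same command as above.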

John Rotenstein