We are using Amazon EMR and Common Crawl to perform crawling. EMR writes the output to Amazon S3 in a binary-like format. We'd like to copy that output to a local machine as raw text.

How can we achieve that? What's the best way?

Normally we could use hadoop fs -copyToLocal, but we can't access Hadoop directly and the data is on S3.

aladagemre

0 Answers