
I need to count the number of lines that match a set of patterns across S3 buckets. The command I am using is:

s3cmd ls --recursive s3://mys3.com/bucket1/ | awk '{print $4}' | grep '\.lzo$' | xargs -I@ s3cmd get @ - | zgrep 'my-pattern-of-interest-1' | zgrep 'my-pattern-of-interest-2' | wc -l

But this still physically downloads the files. Is there an external utility (built on boto, for example) that can do the same thing without downloading the files? I need to scan through 4-5 months of data, so I want to avoid downloading at all costs.

Ed Morton
ekta

1 Answer


There really isn't any way to analyze the contents of objects in S3 without GETting them. You could fire up an EC2 instance or two and do the processing there, so you don't have to copy the data to your local machine; that would certainly be faster. Going forward, you might be able to use AWS Lambda to do the processing whenever new files are uploaded to the bucket, but I'm not aware of any way to get Lambda to process all of the existing objects in S3.
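
For the EC2 route, something along these lines might work as a starting point. It is only a rough sketch: it still issues a GET for every object, but it streams each one through memory rather than saving it to disk. It assumes boto3 is installed, that the `lzop` command-line tool is on the PATH, and that the files are UTF-8 text once decompressed. The helper names (`count_matching_lines`, `iter_lzo_lines`, `count_in_bucket`) are made up for illustration.

```python
import re
import subprocess
import threading


def count_matching_lines(lines, patterns):
    """Count the lines that match every regex in `patterns`."""
    compiled = [re.compile(p) for p in patterns]
    return sum(1 for line in lines if all(c.search(line) for c in compiled))


def iter_lzo_lines(body, chunk_size=1 << 20):
    """Yield decompressed text lines from a streamed .lzo object.

    Assumption: the external `lzop` tool is installed; `body` is the
    StreamingBody returned by boto3's get_object.
    """
    proc = subprocess.Popen(
        ["lzop", "-dc"], stdin=subprocess.PIPE, stdout=subprocess.PIPE
    )

    def feed():
        # Feed compressed chunks from a separate thread so that reading
        # lzop's output below cannot deadlock on a full pipe.
        try:
            for chunk in body.iter_chunks(chunk_size):
                proc.stdin.write(chunk)
        finally:
            proc.stdin.close()

    feeder = threading.Thread(target=feed, daemon=True)
    feeder.start()
    for raw in proc.stdout:
        yield raw.decode("utf-8", errors="replace")
    feeder.join()
    proc.wait()


def count_in_bucket(bucket, prefix, patterns):
    """Sum the matching-line counts over every .lzo object under `prefix`."""
    import boto3  # imported here so the pure helper above has no AWS dependency

    s3 = boto3.client("s3")
    total = 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if not obj["Key"].endswith(".lzo"):
                continue
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"]
            total += count_matching_lines(iter_lzo_lines(body), patterns)
    return total
```

A call would look something like `count_in_bucket("my-bucket", "logs/", ["my-pattern-of-interest-1", "my-pattern-of-interest-2"])`, where the bucket name and prefix are placeholders. Running it on an instance in the same region as the bucket keeps the transfer inside AWS, which is where the speedup would come from.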

garnaat