3

I have been using AWK to run this command, but it is very slow.

There must be a faster way to process the object list from an S3 `ls` command:

s5cmd ls s3://bucket-name/* | awk -v AWS_BUCKET="bucket-name" '{cmd="aws s3api put-object-acl --access-control-policy file:///access_policy.json --bucket " AWS_BUCKET " --key "$5; system(cmd); print $5}'

This lists every object in the bucket, then applies an ACL to each one.

Any ideas?

  • 2
    IMHO (I am no expert in the GNU `parallel` command), you could use it to kick off the commands in parallel. You could also add that tag to this question if you are OK with it. – RavinderSingh13 Nov 07 '19 at 17:23
  • This is likely slow because you pass the output of one command to another, and for each line of that output `system()` spawns a child shell to run yet another command. – RavinderSingh13 Nov 07 '19 at 17:39
  • That is exactly why it is slow. Any ideas for an alternative? – Joshua G. Edwards Nov 07 '19 at 17:53
  • 1
    I agree with Ravinder: I would delete the amazon-web-services tag (or one of the others) and replace it with gnu-parallel; that way people who specialize in this tool will look at your question. This assumes your multiple tasks will not interfere with each other. Good luck. – shellter Nov 07 '19 at 19:25
  • 1
    And this isn't awk's fault. Unless you have a programming language that can run/manage multiple programs in parallel, you're going to be time-bound by executing each program in sequence, each waiting for the previous one to finish (you probably know that ;-) ). You could use Glenn Jackman's idea but add a `&` at the end of each cmd string; then all will run in the background and in parallel, but it will not be managed well. Good luck. – shellter Nov 07 '19 at 19:27
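
For illustration, the `&` approach from the last comment might look roughly like the sketch below. `run_all_bg` is a hypothetical helper name, not part of any tool. It starts every command line read from stdin as a background job and then waits for all of them; nothing caps the number of simultaneous jobs, which is the "not managed well" part.

```shell
#!/bin/sh
# Sketch only: run_all_bg is a hypothetical helper.
# Reads one shell command per line on stdin, launches each in the
# background, then blocks until every job has finished.
run_all_bg() {
  while IFS= read -r cmd; do
    sh -c "$cmd" &   # no limit on concurrent jobs
  done
  wait               # wait for all background jobs started above
}
```

With thousands of objects this would start thousands of processes at once, so a tool that limits the job count is usually the safer choice.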

2 Answers

4

The answer for me was a combination of two answers:

s5cmd ls s3://bucket-name/* |
  awk -v AWS_BUCKET="bucket-name" '{
    printf "aws s3api put-object-acl --access-control-policy file:///access_policy.json --bucket %s --key %s\n", AWS_BUCKET, $5
  }' |
  parallel -j 32

This sped up the command significantly. Thanks, Glenn Jackman and Mark Setchell.
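
If GNU Parallel is not installed, `xargs -P` can approximate the same fan-out. This is a swapped-in technique, not what I actually ran, so treat it as a sketch. `apply_acl_keys` and the `DRY_RUN` variable are hypothetical names made up for this example; the function expects one object key per line on stdin.

```shell
#!/bin/sh
# Sketch only: apply_acl_keys and DRY_RUN are hypothetical names.
# Usage: <keys on stdin> | apply_acl_keys BUCKET [JOBS]
# Set DRY_RUN=1 to print the aws commands instead of running them.
apply_acl_keys() {
  bucket="$1"
  jobs="${2:-32}"
  # -P (max parallel processes) is a GNU/BSD xargs extension, not POSIX.
  # -n 1 appends exactly one key to each aws invocation, after --key.
  xargs -P "$jobs" -n 1 ${DRY_RUN:+echo} aws s3api put-object-acl \
    --access-control-policy file:///access_policy.json \
    --bucket "$bucket" --key
}
```

It could be fed with something like `s5cmd ls s3://bucket-name/* | awk '{print $5}' | apply_acl_keys bucket-name 32`. As with the awk `$5` approach, keys containing whitespace would be split and need different handling.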

  • 3
    Well done and thank you for sharing back with the Stack Overflow community. Spend 10 minutes on the **GNU Parallel** man-page and tutorials - well worth it in these days of multi-core CPUs. Also check `--progress`, `--bar` and `--eta` switches. – Mark Setchell Nov 08 '19 at 10:04
3

Instead of doing system(cmd) for each line, you might want to just print all the commands, then pipe the output into sh to execute them.

s5cmd ls s3://bucket-name/* |
  awk -v AWS_BUCKET="bucket-name" '{
    printf "aws s3api put-object-acl --access-control-policy file:///access_policy.json --bucket %s --key %s\n", AWS_BUCKET, $5
  }' |
  sh

I dropped the stray -v that preceded the awk program in your command. I assume it was a typo or you removed something sensitive.
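
One side benefit of printing the commands first is that they can be inspected before anything runs. A minimal sketch, assuming (as in the commands above) that the object key is field 5 of each listing line; `gen_acl_cmds` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch only: gen_acl_cmds is a hypothetical helper.
# Reads `s5cmd ls`-style lines on stdin and prints one aws command per
# object, taking the key from field 5. Nothing is executed here.
gen_acl_cmds() {
  awk -v bucket="$1" '{
    printf "aws s3api put-object-acl --access-control-policy file:///access_policy.json --bucket %s --key %s\n", bucket, $5
  }'
}
```

For example, `s5cmd ls s3://bucket-name/* | gen_acl_cmds bucket-name | head -n 5` previews the first few commands; replacing `head -n 5` with `sh` executes them.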

glenn jackman