
I have a cronjob that runs every minute and uses the awscli `s3 sync` command to sync my website with an S3 bucket.

It seems the command sometimes runs for a very long time for no apparent reason:

    20613 bitnami   20   0  191876  48668   9756 R 30.3  2.4   1:22.43 /usr/bin/python3 /home/bitnami/.local/bin/aws s3 sync --delete /opt/bitnami/apps/wordpress/htdocs s3://nutriti-code

In this example, the files are already synced as there is nothing new on the source to transfer to the S3 bucket.

I cannot see anything in /var/log/syslog except the confirmation that the command executed successfully.

Also, my files are synced correctly.

Why would the command run for as long as 1m22s or more if there is nothing to sync?

I am wondering why the command takes so much time when there is nothing to sync: the documentation says an S3 object is only downloaded if its size differs from that of the local file, its last modified time is newer than the local file's, or it does not exist in the local directory. After a download, the local file's last modified time is set to that of the S3 object.
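For reference, a crontab entry along these lines reproduces the setup (the `flock -n` guard is an optional addition, not part of the original crontab; it prevents a new run from starting while the previous one is still going):

    # run every minute; flock -n makes the job exit immediately if the previous sync still holds the lock
    * * * * * /usr/bin/flock -n /tmp/s3sync.lock /home/bitnami/.local/bin/aws s3 sync --delete /opt/bitnami/apps/wordpress/htdocs s3://nutriti-code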

wlarcheveque

  • How can it determine if there is nothing new to sync? – Michael Hampton Mar 13 '19 at 18:22
  • Increasing parallelism may help (`aws configure set default.s3.max_concurrent_requests 50`) but if there are a lot of files (thousands/millions) it's always going to take a while to figure out *what* to sync. – ceejayoz Mar 13 '19 at 18:22
  • 1
    How many files are being sync'd? Are you using an S3 gateway, which could reduce latency? Why do you need your website sync'd every minute? If you're trying for a low RPO there are other ways of doing it. What risks or threats are you trying to protect your website from with sync's this often? I think a different approach is needed. – Tim Mar 13 '19 at 18:29
  • @MichaelHampton Doesn't the `sync` command only sync the difference between source and target by comparing timestamps? Maybe I did not understand correctly. @ceejayoz Thanks, I will look into this. There are not so many files (3000 max). @Tim I am not using an S3 Gateway; I will look into this. I changed the cronjob to 10 minutes. I am using a writer node to the S3 bucket, and the instances created by my Auto Scaling Group are reader nodes that do the opposite sync to get the application files. I want to make sure my instances have up-to-date application files. You are right, one minute is overkill. – wlarcheveque Mar 13 '19 at 18:54
  • @ceejayoz I changed the max_concurrent_requests and my problem still persists. The sync command runs for a long time and multiple sync commands run at once until the host becomes unhealthy for the ELB. – wlarcheveque Mar 14 '19 at 18:36
  • I also suppose it is not normal that I have 4-5 different `aws s3 sync` processes running the same command at the same time. – wlarcheveque Mar 14 '19 at 18:59

1 Answer


I suggest you have User Data run a sync command when instances launch, as well as a command to update the operating system (e.g. `yum -y update`). Rather than syncing at intervals, you might be better off with a manual or semi-automated approach.
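A minimal User Data sketch of that idea (the bucket and paths are taken from the question; swap the update line for your distribution's package manager):

    #!/bin/bash
    # update the OS once, at launch (adjust for your package manager)
    yum -y update
    # pull the current application files down from S3 at boot
    /home/bitnami/.local/bin/aws s3 sync --delete s3://nutriti-code /opt/bitnami/apps/wordpress/htdocs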

One easy way to do this, once you have User Data set up, is simply to increase the desired size of your auto scaling group to double the size you really need, wait 15 minutes, then change the desired size back down; a scripted version of this follows the list below. The default termination policy for auto scaling terminates instances in this order:

  1. The Availability Zone with the most instances
  2. For groups with a mixed instances policy, whichever instance best realigns the on-demand/spot allocation strategy
  3. The oldest launch configuration (or launch template)
  4. The instance closest to the next billing hour (random if there is still a tie)
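Scripted, the resize trick might look something like this (the group name and sizes are placeholders):

    # double the desired capacity so fresh instances launch with current code
    aws autoscaling set-desired-capacity --auto-scaling-group-name my-asg --desired-capacity 4
    # wait ~15 minutes for the new instances to come into service
    sleep 900
    # scale back down; the termination policy above decides which instances are killed
    aws autoscaling set-desired-capacity --auto-scaling-group-name my-asg --desired-capacity 2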

This will probably kill your older instances, but not definitely. You'd be better off using a custom termination policy that kills the oldest instances first (`OldestInstance`). Information about that is on the same page.
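Switching to that policy is a single CLI call (the group name is a placeholder):

    # terminate the longest-running instances first on scale-in
    aws autoscaling update-auto-scaling-group --auto-scaling-group-name my-asg --termination-policies OldestInstance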

Tim
  • When my instances are launched, I sync the application code with the one I have in my S3 bucket using a cronjob. I do not see what this has to do with performing common automated configuration tasks. I need the application code within my instances to be in sync with my S3 bucket using the following cronjob every few minutes: `/home/bitnami/.local/bin/aws s3 sync --delete /opt/bitnami/apps/wordpress/htdocs s3://nutriti-code` I may not understand your answer clearly... – wlarcheveque Mar 14 '19 at 00:57
  • It's very unusual to need to reconfigure your application every few minutes. Any changes are typically done by deployments. I think you need to rethink your requirements and approach; I gave you one possible alternative. – Tim Mar 14 '19 at 19:23