I'm part way through uploading about 200,000 files (each is ~1MB max) to an S3 bucket from an EC2 instance (both in Europe West).

From monitoring the EC2 with CloudWatch (looking at the NetworkOut metric), there seems to be a drop-off in the upload transfer over time:

[Chart: NetworkOut metric for the EC2 instance]

I'm uploading the files in several tranches and the drop-off seems consistent, usually after four or five hours (but it sometimes occurs more quickly).

The files are uploaded with a Python script, which:

  1. Downloads a .zip from a third party server
  2. Extracts about 25 files from the .zip and gzips each file
  3. Uploads the .gzip files to the bucket

I've tried two ways of uploading the .gzip files...

  • Sequentially, using boto3: boto3.client("s3").upload_file(file.gz, bucket, file.gz)
  • Running the AWS CLI as a subprocess to upload 25 .gzip files at a time

...But I saw the same drop-off with each method.
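
The pipeline described above can be sketched roughly as follows. This is an illustrative sketch only, not the original script: the function names (`gzip_zip_members`, `process_archive`, `make_s3_uploader`) are hypothetical, the download step is left out, and it swaps in `upload_fileobj` on in-memory bytes where the question uses `upload_file` on files on disk.

```python
import gzip
import io
import zipfile
from typing import Callable, Iterator, Tuple


def gzip_zip_members(zip_bytes: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (key, gzipped_bytes) for every file inside a .zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, keep only real files
            # gzip.compress is the CPU-heavy step of the pipeline
            yield info.filename + ".gz", gzip.compress(archive.read(info))


def process_archive(zip_bytes: bytes, upload: Callable[[str, bytes], None]) -> int:
    """Gzip each member, hand it to `upload`, and return the number of files sent."""
    count = 0
    for key, payload in gzip_zip_members(zip_bytes):
        upload(key, payload)
        count += 1
    return count


def make_s3_uploader(bucket: str) -> Callable[[str, bytes], None]:
    """Build an uploader that writes to S3 (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    s3 = boto3.client("s3")  # one client, reused for every upload

    def upload(key: str, payload: bytes) -> None:
        s3.upload_fileobj(io.BytesIO(payload), bucket, key)

    return upload
```

Passing the uploader in as a callable keeps the extract-and-gzip work separate from the network transfer, which makes it easier to time the two stages independently when looking for the bottleneck.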

What could be causing this? Or what information should I collect to debug it?

Edit

Here's a chart for the same period, showing the BurstBalance metric (the EC2 instance is a t2.small):

[Chart: BurstBalance metric]

Here's CPUCreditBalance:

[Chart: CPUCreditBalance metric]

user2950747

1 Answer

My best guess is it's your EBS I/O credits. Monitor this with the BurstBalance CloudWatch metric. Please check, post a graph, and if it's not that I'll think some more.
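
If you'd rather pull the numbers than read them off the console, something like the following should work. It is a minimal sketch, assuming boto3 is installed and credentials are configured; `fetch_metric` and `print_credit_metrics` are names made up for this example, and the instance and volume IDs are placeholders you would supply. BurstBalance is reported in the `AWS/EBS` namespace per `VolumeId`, while CPUCreditBalance is in `AWS/EC2` per `InstanceId`.

```python
from datetime import datetime, timedelta, timezone


def fetch_metric(cloudwatch, namespace, metric_name, dimension_name,
                 dimension_value, hours=6, period_seconds=300):
    """Return [(timestamp, average)] for one CloudWatch metric, oldest first."""
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace=namespace,
        MetricName=metric_name,
        Dimensions=[{"Name": dimension_name, "Value": dimension_value}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=period_seconds,
        Statistics=["Average"],
    )
    # CloudWatch does not guarantee ordering, so sort by timestamp
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return [(p["Timestamp"], p["Average"]) for p in points]


def print_credit_metrics(instance_id, volume_id):
    """Example usage; the IDs are placeholders supplied by the caller."""
    import boto3  # imported lazily so fetch_metric can be used with any client

    cw = boto3.client("cloudwatch")
    for args in (
        ("AWS/EBS", "BurstBalance", "VolumeId", volume_id),
        ("AWS/EC2", "CPUCreditBalance", "InstanceId", instance_id),
    ):
        for timestamp, value in fetch_metric(cw, *args):
            print(args[1], timestamp.isoformat(), value)
```

A balance that trends down to zero at about the time the upload rate drops is the pattern to look for.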

Update - that third graph I asked you to add shows that you've run out of CPU credits. Your CPU is being throttled. You can either accept the slower performance or temporarily change to a more suitable instance type.

This looks quite CPU intensive. You could move to a t2.large and get about three times the CPU credit allowance, or I'd probably move to a general purpose m4 instance for a while. Changing instance type is easy - stop the instance, right click, change instance type, then start it again.
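
To get a feel for how quickly credits drain, here is a rough back-of-the-envelope sketch. It assumes the published T2 rules as I understand them: one CPU credit is one vCPU at 100% for one minute, and a t2.small earns 12 credits per hour up to a maximum balance of 288. The function name `hours_until_throttled` is made up for this example, and the starting balance and utilisation are guesses, not measurements from your instance.

```python
import math


def hours_until_throttled(balance, earn_per_hour, utilisation, vcpus=1):
    """Estimate hours until the CPU credit balance reaches zero.

    utilisation is the average CPU load per vCPU as a fraction (1.0 = 100%).
    """
    # One credit = one vCPU-minute at 100%, so a busy vCPU spends 60 per hour
    spend_per_hour = 60.0 * utilisation * vcpus
    net_drain = spend_per_hour - earn_per_hour
    if net_drain <= 0:
        return math.inf  # earning at least as fast as spending
    return balance / net_drain


# Assumed t2.small figures: full balance of 288 credits, 12 earned per hour
print(hours_until_throttled(balance=288, earn_per_hour=12, utilisation=1.0))
```

With a full balance and the CPU pinned at 100%, that works out to 6 hours, which is in the same ballpark as the drop-off after four or five hours described in the question; a lower starting balance would shorten it.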

Tim
  • That's a solid possibility. I hadn't thought of that. – EEAA Jan 11 '17 at 22:31
  • Thanks, never heard of I/O credits. Have updated the question. – user2950747 Jan 11 '17 at 22:44
  • Ok, so it's not that, but the answer is worth keeping as it might help others. Can you please add a graph showing CPU credits? – Tim Jan 11 '17 at 23:00
  • Makes sense! The upload is a one-off process so I might well temporarily upgrade to speed it up. Thanks again. – user2950747 Jan 11 '17 at 23:20
  • The same goes for the EC2 instance itself. On an instance type with a limited CPU burst length (see https://www.ec2instances.info, where the vCPUs column shows "for a Xh Xm burst"), the machine slows down significantly after a long CPU burst. – xhafan May 22 '19 at 08:59