
My ingestion pipeline is the following:

queue -> ec2 -> rds
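
For context, the consumer on each EC2 instance looks roughly like the sketch below; the queue URL, table name, and credentials are placeholders, not the actual production code.

    # Simplified sketch of the SQS -> RDS consumer loop (assumes MySQL RDS).
    import json

    import boto3
    import pymysql

    sqs = boto3.client("sqs", region_name="us-east-1")  # region is an assumption
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/ingest-queue"  # placeholder

    conn = pymysql.connect(host="my-rds-endpoint", user="ingest",
                           password="...", database="ingest")  # placeholder credentials

    while True:
        # Long-poll SQS for up to 10 messages at a time.
        resp = sqs.receive_message(QueueUrl=QUEUE_URL,
                                   MaxNumberOfMessages=10,
                                   WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            body = json.loads(msg["Body"])
            with conn.cursor() as cur:
                # Hypothetical table; the real schema differs.
                cur.execute("INSERT INTO events (id, payload) VALUES (%s, %s)",
                            (body["id"], json.dumps(body)))
            conn.commit()
            # Delete the message only after the row is committed.
            sqs.delete_message(QueueUrl=QUEUE_URL,
                               ReceiptHandle=msg["ReceiptHandle"])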

I have observed the following: when I first turn on my EC2 instances to ingest from SQS, the writes/second on RDS are really fast. But after about 4 hours the SQL write time starts to increase.

Here are two graphs for illustration:

Write per second https://i.stack.imgur.com/XavU0.jpg

Queue Depth https://i.stack.imgur.com/vXkGr.jpg

There are still a lot of messages in the queue, so I verified that a lack of data from SQS is not the source of the problem. I tried logging the SQL write time, from update to commit, and found that the write latency has increased tenfold, so it's probably not a CPU credit issue on EC2. But if I turn the EC2 ingestion off and turn it back on a day later, I see great performance again.
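
The write-time logging is essentially a timer around the statement and the commit; a simplified sketch (the real statement and error handling are omitted):

    import time

    def timed_write(conn, sql, params):
        """Execute one write and return the update-to-commit latency in seconds."""
        start = time.perf_counter()
        with conn.cursor() as cur:
            cur.execute(sql, params)
        conn.commit()
        elapsed = time.perf_counter() - start
        # Roughly 0.002 s when healthy, rising to about 0.04 s when degraded.
        print(f"sql write took {elapsed:.4f}s")
        return elapsed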

Here is what I verified:

  1. RDS CPU credits did not drop much, barely at all.
  2. EC2 CPU credits did drop from 150 to 6, but the latency increase in RDS starts around 2 AM while EC2 only exhausts its CPU credits around 9 AM, and the bottleneck is still the SQL write time, which increased from 0.002 s to 0.04 s (the sketch after this list shows one way to pull these credit balances).
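
The credit balances above come from CloudWatch; a minimal boto3 sketch of pulling them (instance identifiers are placeholders):

    from datetime import datetime, timedelta

    import boto3

    cw = boto3.client("cloudwatch", region_name="us-east-1")

    def cpu_credit_balance(namespace, dim_name, resource_id):
        """Fetch 24h of CPUCreditBalance datapoints for one EC2 or RDS resource."""
        resp = cw.get_metric_statistics(
            Namespace=namespace,                       # "AWS/EC2" or "AWS/RDS"
            MetricName="CPUCreditBalance",
            Dimensions=[{"Name": dim_name, "Value": resource_id}],
            StartTime=datetime.utcnow() - timedelta(hours=24),
            EndTime=datetime.utcnow(),
            Period=300,
            Statistics=["Average"],
        )
        return sorted(resp["Datapoints"], key=lambda d: d["Timestamp"])

    ec2_credits = cpu_credit_balance("AWS/EC2", "InstanceId", "i-0123456789abcdef0")
    rds_credits = cpu_credit_balance("AWS/RDS", "DBInstanceIdentifier", "my-rds-instance")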

I am using two t2.micro EC2 instances and one t2.medium RDS instance.

I suspect there is some network bandwidth limit that I need to change, or maybe the CPU credits on EC2 somehow increase the latency of SQL writes?

Can someone point me in the right direction?

  • I don't think this has anything to do with bandwidth. You state you're exhausting your CPU credit, which would generally make SQL writes bog down, as they require some CPU to process. Try it on a non-bursting instance type and see what happens. – ceejayoz Apr 25 '18 at 16:47
  • The timing doesn't line up, though: the EC2 CPU credit exhaustion is around 9 AM, but the sharp increase in latency is around 2 AM. – J L Apr 25 '18 at 16:51
  • 1
    There is not, to my knowledge, any sort of bandwidth cap (beyond the underlying limitations of however many gigabits are physically available to the host's networking cards) on EC2 instances. It'd be silly for Amazon to put them in place, given that we pay for bandwidth... – ceejayoz Apr 25 '18 at 16:53
  • A noisy neighbor is conceivably possible, which you could test by moving the instance to another host by stopping it, waiting a few minutes, and starting it back up. – ceejayoz Apr 25 '18 at 16:54
  • OK, I will test that. One interesting thing I observe is that if I wait for a day, the latency is down again; that is why I suspect some sort of throttling going on. – J L Apr 25 '18 at 17:14
  • 1
    @JL take a look at Queue Depth and Write Latency in RDS. If these are going up when performance goes down, it suggests that you are exhausting the burst capacity of your RDS instance's underlying EBS volume. – Michael - sqlbot Apr 25 '18 at 18:41
  • @Michael-sqlbot good point, I took a look at the queue depth but don't see a 2 AM increase in it. – J L Apr 25 '18 at 19:04
  • 2
    @JL it's also not entirely clear what you're saying about the ec2 credit exhaust and the timing. The performance slows down before you actually hit a 0 balance, to prevent credit exhaust from being a harsh brick wall. Switch the t2 into Unlimited mode and you can easily rule that out. The cost for adding unlimited on a micro is capped at $0.05/hr, less depending on the workload. – Michael - sqlbot Apr 25 '18 at 22:55
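
For anyone following up: the RDS-side metrics suggested in the comments (DiskQueueDepth, WriteLatency, and, for gp2 storage, BurstBalance) can be pulled the same way; the DB identifier below is a placeholder.

    from datetime import datetime, timedelta

    import boto3

    cw = boto3.client("cloudwatch", region_name="us-east-1")

    for metric in ("DiskQueueDepth", "WriteLatency", "BurstBalance"):
        resp = cw.get_metric_statistics(
            Namespace="AWS/RDS",
            MetricName=metric,
            Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "my-rds-instance"}],
            StartTime=datetime.utcnow() - timedelta(hours=24),
            EndTime=datetime.utcnow(),
            Period=300,
            Statistics=["Average"],
        )
        points = sorted(resp["Datapoints"], key=lambda d: d["Timestamp"])
        # A BurstBalance draining toward 0 while WriteLatency climbs points at
        # exhausted gp2/EBS burst credits rather than CPU or network limits.
        print(metric, [round(p["Average"], 4) for p in points[-5:]])

To rule out the EC2 credit theory, the t2 instances can also be switched to Unlimited mode via the EC2 console or the ModifyInstanceCreditSpecification API, as suggested above.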

0 Answers