
tl;dr: Why is the time taken to fetch the results of a SELECT statement 10,000 times from RDS so uneven when requested from an EC2 instance?

Update: results for small and medium RDS instances have been added to the question.

While experimenting with AWS to measure the time taken to fetch the result of a SQL query, I got the following very uneven results:

I wrote a PHP script to report the time taken to fetch the result of a SELECT query n times from the server:

// $con is an open mysqli connection, $q the SELECT query under test,
// and $n the number of repetitions
$flag = 0;
$total = 0;
while ($flag < $n) {
    $t = microtime(true);           // wall-clock time before the query
    $result = $con->query($q);
    $total += microtime(true) - $t; // accumulate the elapsed time
    $flag++;
}

Environment:

  • All transactions were done inside a private vpc in AWS
  • All servers are in different zones
  • MySQL configuration: version 5.6 on EC2, 5.5 on RDS; query_cache_size = 16777206, query_cache_type = ON (see the snippet after this list).
  • Database: a large database, ~5 GB; the table being queried had ~20,000 rows.
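
Since the same SELECT is repeated n times, the query cache matters here: with the cache on, most iterations after the first should be served from the cache, so the loop largely measures network round-trips plus cache-lookup overhead. A minimal snippet to confirm the cache settings from PHP, assuming the same $con connection as in the timing loop:

    // Check the query cache settings on the server under test
    foreach (array('query_cache_type', 'query_cache_size') as $var) {
        $res = $con->query("SHOW VARIABLES LIKE '$var'");
        $row = $res->fetch_assoc();
        printf("%s = %s\n", $row['Variable_name'], $row['Value']);
    }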

Servers :

  • EC2 A, Availability zone : us-east-1e, Type : t2.micro.
  • EC2 B, Availability zone : us-east-1b, Type : t2.micro.
  • RDS Availability zone : us-east-1c, Type : db.t2.micro, db.t2.small (updated), db.t2.medium (updated)

Results :

Time taken for 10,000 loops of performing a SELECT query :

  • Requesting server B, Database server B

Results for 5 trials were : 20, 21, 20, 20, 21 (All in sec)

  • Requesting server A, Database server B

Results for 5 trials were : 33, 33, 33, 33, 3 (All in sec)

  • Requesting server A, Database server RDS (micro)

Results for 11 trials were : 272, 709, 49, 48, 711, 593, 47, 316, 153, 47, 636 (All in sec)

  • Requesting server A, Database server RDS (small)

Results for 5 trials were : 53, 54, 53, 158, 698 (All in sec)

  • Requesting server A, Database server RDS (medium)

Results for 4 trials were : 96, 123, 579, 252 (All in sec)

Why is the time taken by RDS over 10,000 loops of the SELECT statement so uneven? And why is it so much higher than for the EC2 servers?

[I don't think it's due to the network, because when I ran the experiment with fewer loops (1,000), the readings for EC2 -> RDS were 4, 5, 5, 5, 4.]

When I logged the time for each fetch request (a sketch of the logging code follows these figures), I noticed the following:

  • For the case when it took 153 seconds for 10,000 loops on RDS:

Average time taken by each query: 0.015419 s

Number of queries that took more than the average time, out of 10,000: 1644

Total time taken by the queries that took more than the average time: 119.364 s (78% of total time)

  • For the case when it took 636 seconds for 10,000 loops on RDS:

Average time taken by each query: 0.063605 s

Number of queries that took more than the average time, out of 10,000: 8629

Total time taken by the queries that took more than the average time: 628.6426 s (98.8% of total time)
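
For reference, a minimal sketch of how these per-query figures can be logged and summarized, assuming the same $con, $q, and $n as in the loop above:

    // Log the elapsed time of every query, then summarize the slow ones
    $times = array();
    for ($i = 0; $i < $n; $i++) {
        $t = microtime(true);
        $con->query($q);
        $times[] = microtime(true) - $t;
    }

    $total = array_sum($times);
    $avg   = $total / count($times);

    // Queries slower than the average, and the time they account for
    $slow = array_filter($times, function ($t) use ($avg) { return $t > $avg; });
    printf("Average time taken by each query: %f s\n", $avg);
    printf("Queries above average: %d of %d\n", count($slow), count($times));
    printf("Time in above-average queries: %.3f s (%.1f%% of total)\n",
           array_sum($slow), 100 * array_sum($slow) / $total);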

Edit 1:

I logged the time for each fetch request within the cycle of 10,000 requests and noticed that after some number of requests, the time per request jumps from ~0.003 s to ~0.07 s. This jump happens after a random number of requests: sometimes after ~8,000 requests, sometimes after ~3,000. What could be the reason? Also, when the 10,000 requests take ~45 s, CPU utilization on RDS is around 5%, while when they take >100 s, CPU is around 10-15%.
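
A minimal sketch of how the jump can be spotted within the same loop; the 0.03 s threshold is an illustrative value chosen between the two observed levels:

    // Report the first request whose latency crosses the threshold
    $threshold = 0.03; // illustrative: between ~0.003 s and ~0.07 s
    for ($i = 0; $i < $n; $i++) {
        $t = microtime(true);
        $con->query($q);
        $elapsed = microtime(true) - $t;
        if ($elapsed > $threshold) {
            printf("Latency jumped to %.4f s at request %d\n", $elapsed, $i);
            break; // requests after this point stayed slow in the observed runs
        }
    }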

Edit 2:

I upgraded the RDS instance from db.t2.micro to db.t2.small and then to db.t2.medium. Again the performance was uneven; the trial results are the ones listed above for RDS (small) and RDS (medium).


Edit 3:

I switched the RDS instance to another availability zone. Now the readings seem to be consistent. Maybe the problem was CPU stealing by some noisy neighbor.

  • Requesting server A, Database server RDS (small, in a different zone)

Results for 6 trials were : 156, 151, 151, 151, 151, 302 (All in sec)

Sumit Sinha
  • Is it not simply because these are virtual instances on physical servers, and so other systems are running on them too? The slower one must either be on lesser hardware or have other users running at higher load? – i-CONICA Dec 15 '14 at 13:21
  • Hi! I experimented on a quiet server; no other requests were being served by it during the experiment. That's why I find the results to be very unexpected. – Sumit Sinha Dec 15 '14 at 13:31
  • But surely the load averages you see are for your virtualised instance, not the load that the host machine is under? I don't know AWS specifically though, so just speculating. – i-CONICA Dec 15 '14 at 13:33
  • If that were the case, then there should have been similar anomalies when the EC2 server was queried, since it is also a virtual server. – Sumit Sinha Dec 15 '14 at 14:17
  • Which might be under significantly less load... – i-CONICA Dec 15 '14 at 14:56

1 Answer


I noticed you're using a db.t2.micro instance. As with EC2, micro instances are designed to be budget-friendly, but at a cost of performance. You'll get much worse performance when putting these types of servers under load versus the normal instances, because CPU time is given to the instance "last" compared to other instances sharing the same hardware.

To prove the point, run your tests again against a db.medium instance and you'll find it much more consistent.
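
One way to observe this kind of hypervisor-level throttling from inside a Linux guest is the "steal" counter in /proc/stat, which is the figure `top` reports as %st. A minimal sketch, assuming a Linux EC2 instance; RDS offers no shell access, so this only illustrates the mechanism on the client side:

    // Sample the CPU "steal" percentage from /proc/stat (Linux guests)
    function cpuCounters() {
        $lines  = file('/proc/stat');
        // First line: "cpu user nice system idle iowait irq softirq steal ..."
        $fields = preg_split('/\s+/', trim($lines[0]));
        return array_map('intval', array_slice($fields, 1, 8));
    }

    $a = cpuCounters();
    sleep(5); // arbitrary sampling interval
    $b = cpuCounters();

    $deltas = array_map(function ($x, $y) { return $y - $x; }, $a, $b);
    $steal  = $deltas[7]; // the 8th counter after the "cpu" label is steal
    printf("steal: %.1f%%\n", 100 * $steal / array_sum($deltas));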

Nathan C
  • +1 - your use of the "micro" class of instance is the issue here. – Craig Watson Dec 15 '14 at 13:19
  • I logged the time for each fetch request in the cycle of 10,000 requests and noticed that after some number of requests, the time per request jumps from ~0.003 s to ~0.07 s. This jump happens after a random number of requests: sometimes after ~8,000 requests, sometimes after ~3,000. What could be the reason? Also, when the 10,000 requests take ~45 s, CPU utilization on RDS is around 5%, while when they take >100 s, CPU is around 10-15%. – Sumit Sinha Dec 15 '14 at 13:19
  • @SumitSinha The micro class instances are designed for burst. Due to this, you'll get "high" performance for some period of time... then it'll get throttled back. In EC2 you can see this by watching `top` output while loading the server's CPU with some activity - the "steal" CPU usage increases as a form of throttling. – Nathan C Dec 15 '14 at 13:24
  • @SumitSinha Also, RDS micro instances have very unpredictable performance because of their burst nature, so benchmarking them may prove to be useless. – Nathan C Dec 15 '14 at 13:25
  • Thanks a lot! *Moves to find resources to read about this behavior in AWS instances* (links to such resources appreciated :) ) – Sumit Sinha Dec 15 '14 at 13:28
  • @NathanC I upgraded the server from micro to small and then further to medium. But again the results are inconsistent: Requesting server A, Database server RDS (small): 53, 54, 53, 158, 698 (all in sec); Requesting server A, Database server RDS (medium): 96, 123, 579, 252 (all in sec). – Sumit Sinha Dec 16 '14 at 08:09
  • The results are inconsistent because there are other users on the machines doing other, random things beyond your scope or control. You can't do reliable software benchmarking on hardware you don't have exclusive access to. – i-CONICA Dec 16 '14 at 08:54
  • @i-CONICA Seems you were right. This time I created a small RDS instance in a different zone, and my readings are consistent now. Maybe the problem lies with noisy neighbors. [Results: Requesting server A, Database server RDS (small, in a different zone): 156, 151, 151, 151, 151 (all in sec)] – Sumit Sinha Dec 16 '14 at 10:37
  • Getting 4 consecutive readings within a second of each other is pretty impressive on any system. :) – i-CONICA Dec 16 '14 at 10:41