
I am building a "surge traffic" app that can go from 0 to 50,000 clients at once within minutes. I thought I'd ask you guys for some help understanding whether I am doing something wrong here.

Currently I am testing with loader.io; my configuration for this load test is 0 to 10,000 clients over 1 minute. The only thing the tester does is load the login page, nothing more, not even logging in. The page size is 793KB, which amounts to a ~400ms load time in a real browser.

[01-Mar-2018 09:57:48] WARNING: [pool app.com] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 4244 idle, and 4607 total children
[01-Mar-2018 09:57:49] WARNING: [pool app.com] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 16 children, there are 4216 idle, and 4615 total children
[01-Mar-2018 09:57:50] WARNING: [pool app.com] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 4211 idle, and 4631 total children
[01-Mar-2018 09:57:52] WARNING: [pool app.com] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 4179 idle, and 4663 total children
[01-Mar-2018 09:57:54] WARNING: [pool app.com] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 4181 idle, and 4695 total children
[01-Mar-2018 09:57:57] WARNING: [pool app.com] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 4244 idle, and 4727 total children
[01-Mar-2018 09:57:58] WARNING: [pool app.com] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 4412 idle, and 4759 total children

My php-fpm configuration is as follows:

pm = dynamic
pm.max_children = 3000
pm.max_requests = 200
pm.start_servers = 1500
pm.min_spare_servers = 300
pm.max_spare_servers = 1500

Host Server specs:

AMD Opteron(tm) Processor 6344

Core Name: Abu Dhabi
# of Cores: 12
# of Threads: 12
Operating Frequency: 2.6 GHz
HyperTransport: 6.40 GT/s
L2 Cache: 6 x 2MB
L3 Cache: 2 x 8MB
Manufacturing Tech: 32 nm

50GB RAM (allocated to the container); the host server has 64GB total.
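As a quick back-of-the-envelope check, a pool of 3,000 children is tight in 50GB. The per-worker memory below is a hypothetical ~30MB RSS; the real value has to be measured against the actual pool:

```python
# Hypothetical per-worker memory; measure the real average with e.g.:
#   ps -o rss= -C php-fpm | awk '{s+=$1} END {print s/NR/1024 " MB"}'
worker_rss_mb = 30        # assumed, not measured
max_children = 3000       # from the pool config above

total_gb = max_children * worker_rss_mb / 1024
print(f"{total_gb:.1f} GB needed at full pool")  # ~87.9 GB vs 50 GB available
```

If the real RSS is anywhere near that figure, the pool can never actually reach `pm.max_children` without swapping.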

And oddly, even with this configuration, I am still getting "seems busy" messages!

Here is where things die: (screenshot of the load-test latency graph)

The most important thing to remember here is that we normally have little to no traffic, but we expect up to 50,000 users within minutes. Can you help me with this error, please? Thanks!

1 Answer


You need to keep in mind that spinning up those additional workers takes time, and as you can see in the logs, php-fpm is spawning at most 32 children at a time. The easiest "fix" would be to set the min and max spare servers to the same value; that way you don't need to wait for workers to spin up.
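A sketch of that, reusing the pool size from the question (the simplest form is a static pool, which forks every worker at startup so none have to be spawned mid-surge; the value itself is illustrative and must fit in available RAM):

```ini
; static pool: all workers forked at startup, none during the surge
pm = static
pm.max_children = 3000
```

The equivalent with `pm = dynamic` is setting `pm.start_servers`, `pm.min_spare_servers`, and `pm.max_spare_servers` all to the same value.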

Also, if possible, instead of trying to scale the worker pool like that, try to implement caching; serving from cache will always be faster.
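For example, if nginx is fronting php-fpm, even a 1-second microcache on the login page collapses thousands of identical requests into a single PHP hit. This is a sketch with hypothetical paths, zone name, and socket; adapt to the real vhost:

```nginx
# hypothetical cache path and zone name; tune sizes to taste
fastcgi_cache_path /var/cache/nginx levels=1:2 keys_zone=microcache:10m max_size=256m;

server {
    location = /login {
        fastcgi_cache microcache;
        fastcgi_cache_key "$scheme$request_method$host$request_uri";
        fastcgi_cache_valid 200 1s;   # 1s microcache absorbs the surge
        fastcgi_pass unix:/run/php-fpm/app.sock;  # assumed socket path
        include fastcgi_params;
    }
}
```

This only works for responses that are identical across users; anything personalized (e.g. after login) needs a cache key or bypass rule that accounts for the session.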

Gothrek
  • Shouldn't I have plenty of servers, though? pm.start_servers is 1500, and according to the image in my question, latency goes crazy even at 1500. Do you think it's really the cache? It's pretty heavily cached as it is. – Gordon Snappleweed Mar 01 '18 at 21:03
  • If it were properly cached, you wouldn't need to support 50k users on the backend, unless every user is served unique data. And if I'm reading your graph correctly, you start getting drop-offs at 1.5k users. – Gothrek Mar 01 '18 at 21:05
  • The other issue is that you have 125 workers per core, which must result in heavy context switching in the OS. Check vmstat and its running/blocked values. – Gothrek Mar 01 '18 at 21:11
  • Stats: https://cl.ly/3e3f0Y2d1E3m – Gordon Snappleweed Mar 01 '18 at 21:14
  • According to that screenshot you are spending 97% of CPU time in IO wait (the wa value under CPU), so that's probably part of the issue. Have you done any tuning of the IP stack? If not, that might help you some, but it won't remove all your issues. Here's a good guide on Linux network tuning; it's an oldie but a goldie: https://access.redhat.com/sites/default/files/attachments/20150325_network_performance_tuning.pdf – Gothrek Mar 01 '18 at 21:19
  • Would this still be applicable to a container? (This is virtualized.) – Gordon Snappleweed Mar 01 '18 at 21:20
  • Yeah, but then you need to make sure you aren't hitting bottlenecks at higher levels, like the VM host. – Gothrek Mar 01 '18 at 21:23
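The network-tuning suggestion above usually starts with the connection-burst limits. These are illustrative values, not a tested recipe (the Red Hat guide covers them in detail, and inside a container some sysctls are namespaced or controlled by the host):

```shell
# raise accept-queue and SYN-backlog limits for connection bursts
sysctl -w net.core.somaxconn=65535
sysctl -w net.ipv4.tcp_max_syn_backlog=65535
# widen the ephemeral port range for many concurrent clients
sysctl -w net.ipv4.ip_local_port_range="1024 65535"
```

Note that php-fpm has its own per-pool `listen.backlog` setting, which caps the accept queue regardless of `somaxconn`, so both need raising together.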