
We're running a threaded Ruby server (Puma) and have seen serious performance issues with our Sinatra app. Specifically, something as simple as `Thread.pass` can take over 2s. How can a server with only 16 threads take over 2s to return control to a thread? Is the Ruby scheduler that bad, or is there something we can do to fix this?

Details:

  • Ruby implementation: MRI 2.1
  • Sinatra App
  • Running on Heroku 1x dynos
  • Puma server, running 16 threads, 1 process
  • Some routes are doing fairly heavy work, but even routes doing almost no work are affected
  • Over 100MB of free memory
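One way to put numbers on the delay is to time `Thread.pass` directly while other threads keep the interpreter busy. The following is only a minimal sketch, not the code from our app: the helper name `measure_thread_pass` and its `busy_threads` and `samples` parameters are made up for illustration, and the busy loops merely stand in for the heavy routes.

```ruby
# Hypothetical helper (not from the app): times how long Thread.pass takes
# to return while CPU-bound threads compete for the interpreter lock.
def measure_thread_pass(busy_threads: 4, samples: 5)
  stop = false
  workers = Array.new(busy_threads) do
    Thread.new do
      x = 0
      x += 1 until stop # CPU-bound busy loop standing in for heavy routes
    end
  end

  # Each entry is the wall-clock time (in seconds) one Thread.pass call took.
  Array.new(samples) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    Thread.pass
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
ensure
  # Always stop and reap the workers, even if timing raised.
  stop = true
  workers.each(&:join) if workers
end

latencies = measure_thread_pass
puts format("max Thread.pass latency: %.1f ms", latencies.max * 1000)
```

Running the same script on the dyno and on a local machine gives a rough comparison; the absolute numbers depend on the Ruby version and on how much CPU the host actually provides.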

Thanks in advance!

scosman
  • FYI: the answer ended up being Heroku's virtualization layer starving us of CPU. We were not even getting the equivalent of 1 dedicated CPU (they advertise 4). Dedicated hosts fixed the issue. – scosman May 16 '16 at 18:42

2 Answers


How long `Thread.pass` takes is unspecified: it may take 10s, or it might not pass at all (i.e. execution continues immediately).

`Thread.pass` is more of a hint or a suggestion.

nort
  • `Thread.pass` being a hint/suggestion isn't the issue; I am definitely seeing it pass. The performance of the scheduler is the issue. I'm not looking for a realtime guarantee, but frequent 2+ second delays with only 16 threads is incredibly poor (over 125ms per thread, since it's round robin). – scosman Jun 29 '15 at 18:11
  • Are the threads performing approximately 125ms of work? Are they saturated / CPU bound? – nort Jun 30 '15 at 05:27
  • Some are performing more than 125ms, but the system time slice is 10ms. The theoretical max for a non-blocking round-robin pass (like `Thread.pass`) should be 160ms for 16 threads, and I'm seeing > 10x that. – scosman Jun 30 '15 at 13:08

Long story short: it's the Heroku virtual machine.

Sometimes your whole virtual machine pauses, so the program (in whatever language) just stops responding for a few seconds. Running on dedicated boxes 100% resolved this issue. Heroku 1x/2x dynos don't seem reliable for applications where multi-second pauses are unacceptable. I get that sharing resources is needed, but completely pausing the world for multiple seconds is too much. It seems like their scheduling could use some work.
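A rough way to check for this kind of pause without involving the Ruby scheduler is to sleep for short intervals in a single-threaded script and record how far each sleep overshoots. This is only a sketch: the helper name `sleep_overshoots` and its parameters are made up for illustration, and it assumes the monotonic clock keeps advancing while the VM is stalled, which may depend on the hypervisor.

```ruby
# Hypothetical helper: sleeps for short intervals and records how much
# longer than requested each sleep took, using the monotonic clock.
def sleep_overshoots(interval: 0.01, samples: 100)
  Array.new(samples) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    sleep(interval)
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    elapsed - interval # seconds of extra delay beyond the requested sleep
  end
end

overshoots = sleep_overshoots
puts format("max overshoot: %.1f ms", overshoots.max * 1000)
```

Since nothing else runs in the process, large overshoots here would point at the host or its CPU allocation rather than at thread scheduling inside the app.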

scosman