Strange problem. We have 6 nodes behind a load balancer. They are high-spec VPSes running Ubuntu. Redis runs on a separate node, and MySQL runs on further nodes. The whole LAMP setup hosts Magento.

While transitioning from a file-based cache to a central Redis cache, we started switching the Magento nodes over one by one to use Redis through Cm_Cache_Backend_Redis. With two servers using Redis, everything runs fine, so we decided to switch the remaining 4 servers too. But then performance tanks. New Relic confirms a regression of as much as 300%: app response time goes from a reasonable 900-1200 ms to 3,000+ ms. Page load time becomes horrible, increasing by at least 2 seconds and often more. Under heavy-ish peak load (200 users spread across 6 servers), the regression is even more pronounced.
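To put numbers on that, here is a quick sketch of the arithmetic in Python. It uses only the rough figures quoted above (900-1200 ms before, and 3,000 ms as the lower bound afterwards); the `slowdown` helper is just an illustrative name, not something from New Relic.

```python
def slowdown(before_ms, after_ms):
    """Return (ratio, percent_increase) for a change in response time."""
    ratio = after_ms / before_ms
    return ratio, (ratio - 1) * 100


# Compare both ends of the "before" range against the 3,000 ms floor.
for before in (900, 1200):
    ratio, pct = slowdown(before, 3000)
    print(f"{before} ms -> 3000 ms: {ratio:.1f}x slower, +{pct:.0f}%")
```

So even at the 3,000 ms floor, responses are roughly 2.5x to 3.3x slower than before, and the real figure is higher whenever responses exceed 3,000 ms.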

In the traces, we start seeing that all is not well with Redis.

Category    Slowest components                                  Count   Duration     %
Custom      Varien_Simplexml_Element::asNiceXml                 578     19,200 ms    33%
Custom      Varien_Simplexml_Element::extendChild               673     10,200 ms    18%
Custom      Cm_RedisSession_Model_Session::read                 1       5,070 ms     9%
Custom      Varien_Simplexml_Element::extend                    76      4,380 ms     8%
Custom      Varien_Simplexml_Element::hasChildren               69      2,690 ms     5%
Custom      Mage_Core_Model_Config::loadModulesConfiguration    1       2,270 ms     4%
Remainder   Remainder                                           1       13,700 ms    24%
Total time                                                              57,500 ms    100%
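As a rough sanity check on that trace, the sketch below (Python, durations copied from the table above) groups the components into XML/config work versus the Redis session read. The grouping by class name prefix is my own assumption about which rows belong together.

```python
# Durations in ms, copied from the New Relic trace above.
trace = {
    "Varien_Simplexml_Element::asNiceXml": 19200,
    "Varien_Simplexml_Element::extendChild": 10200,
    "Cm_RedisSession_Model_Session::read": 5070,
    "Varien_Simplexml_Element::extend": 4380,
    "Varien_Simplexml_Element::hasChildren": 2690,
    "Mage_Core_Model_Config::loadModulesConfiguration": 2270,
    "Remainder": 13700,
}

total_ms = sum(trace.values())

# Assumed grouping: everything XML- or config-related versus the session read.
xml_config_ms = sum(
    ms for name, ms in trace.items()
    if name.startswith(("Varien_Simplexml_Element", "Mage_Core_Model_Config"))
)
session_ms = trace["Cm_RedisSession_Model_Session::read"]

print(f"XML/config work: {xml_config_ms} ms ({xml_config_ms / total_ms:.0%})")
print(f"Redis session read: {session_ms} ms ({session_ms / total_ms:.0%})")
```

By this grouping, about two thirds of the traced time goes to XML and config merging, while the Redis session read itself is under a tenth. That might mean the config is being rebuilt rather than served from cache, but that is only a guess on our part.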

XML module and core config loading become dead slow, and Redis sessions, which are normally fast, suddenly become slow as well. The whole thing grinds to a crawl.

The Redis server is a default Ubuntu install that we don't directly control right now. We do control the client side on the 6 nodes. Right now it uses the built-in standalone Credis client, which we intend to swap out for the phpredis PECL client; that should give something of a performance boost.

Everything else is default as per https://github.com/colinmollenhour/Cm_Cache_Backend_Redis

Hopefully the client swap will make all the difference, but in the meantime we're keen to hear further suggestions. Why would 2 nodes work fine and fast, while things start choking at 6? Does this sound like client-side or server-side trouble to you?
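One way we could try to separate the two is to time raw round trips to Redis from each web node, bypassing PHP entirely. Below is a minimal sketch in Python using only the standard library. It assumes the Redis server accepts unauthenticated inline commands on the default port 6379; the hostname in the usage comment is a placeholder, and all function names are illustrative.

```python
import socket
import statistics
import time


def time_calls(op, samples=100):
    """Call op() repeatedly and return per-call latencies in milliseconds."""
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        op()
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies


def summarize(latencies):
    """Return median, nearest-rank 95th percentile, and max latency."""
    ordered = sorted(latencies)
    # Nearest-rank percentile, computed with integer arithmetic.
    p95_index = (95 * len(ordered) + 99) // 100 - 1
    return {
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[p95_index],
        "max_ms": ordered[-1],
    }


def make_ping(host, port=6379, timeout=2.0):
    """Open one connection and return a function that sends an inline PING."""
    sock = socket.create_connection((host, port), timeout=timeout)

    def ping():
        sock.sendall(b"PING\r\n")
        reply = sock.recv(64)
        if not reply.startswith(b"+PONG"):
            raise RuntimeError(f"unexpected reply: {reply!r}")

    return ping


# Usage (hostname is a placeholder, not our real server):
#   ping = make_ping("redis.example.internal")
#   print(summarize(time_calls(ping)))
```

If the round-trip times stay low with all 6 nodes active while Magento is still slow, that would point more towards the client side or the cache configuration; if they climb as nodes are added, the server or network looks more suspect.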

Your thoughts are very welcome.

JayMcTee
Hi Jay, I know that this question comes from a long time ago, but did you find the root cause of the problems? – davidselo Dec 09 '16 at 12:29

0 Answers