I'm deploying a Rails app through Passenger and nginx (using an Elasticsearch server running on the same machine) on an Ubuntu system. This works perfectly for about twelve hours. Then the server's only response is a 503 message. Restarting nginx fixes the problem.

I have already looked through the Rails, nginx, and Elasticsearch logs, but couldn't find any clues about this "crash". Only some generic routing errors are visible.

Is there any other place I could check? How can I effectively debug this behaviour?

panmari
  • What _do_ you have in nginx's error log? If a 503 was served, then _something_ should be there. – Michael Hampton Jun 01 '13 at 19:25
  • Last thing before the crash from the error log was just some passenger stdout: `[ 2013-06-01 06:44:01.5532 11950/7f56449c4700 Pool2/SmartSpawner.h:301 ]: Preloader for [deployment dir] started on PID 12335, listening on unix:/tmp/passenger.1.0.11945/generation-0/backends/preloader.12335`. Then nothing until I restarted the nginx. Do I have to turn up the verbosity of the logs somehow? – panmari Jun 01 '13 at 19:39

2 Answers

503 means Service Unavailable: the server can't respond, usually because it is down for maintenance or overloaded.

Assuming all packages related to the deployment are up to date and no external (D)DoS attack is under way, does the app or Elasticsearch open many outbound connections? Try having a look at the output of netstat.
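One way to read netstat output is to tally connections by state for a given port. The Python sketch below is only an illustration: the function name `count_tcp_states` and the sample rows are made up, and it assumes the usual six-column layout printed by `netstat -tan`. A large number of `TIME_WAIT` entries is normal after many short-lived connections and is usually harmless on its own; a steadily growing count of `ESTABLISHED` or `CLOSE_WAIT` entries would be more suspicious.

```python
from collections import Counter


def count_tcp_states(netstat_output, port=None):
    """Tally TCP connection states in `netstat -tan`-style text.

    If `port` is given, only rows whose local or foreign address
    ends in that port are counted.
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed row layout: proto recv-q send-q local foreign state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if port is not None:
            suffix = ":" + str(port)
            if not (local.endswith(suffix) or foreign.endswith(suffix)):
                continue
        counts[state] += 1
    return counts


# Made-up sample resembling the rows quoted in the comments.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 localhost:36455 localhost:9200 TIME_WAIT
tcp 0 0 localhost:36456 localhost:9200 TIME_WAIT
tcp 0 0 localhost:9200 localhost:36460 ESTABLISHED
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN
"""

for state, n in count_tcp_states(SAMPLE, port=9200).most_common():
    print(state, n)
```

To use it on a live machine, the text passed in could come from running `netstat -tan` and capturing its output.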

fsoppelsa
  • Not sure how to interpret the output of netstat. I see a lot of these: `tcp 0 0 localhost:36455 localhost:9200 TIME_WAIT`. I assume they are from elasticsearch? Could these cause problems? – panmari Jun 01 '13 at 11:35
  • According to Elasticsearch documentation https://github.com/elasticsearch/elasticsearch the default port of Elasticsearch is 9200. – fsoppelsa Jun 01 '13 at 11:54
  • ok... so what exactly should I be looking for in netstat? – panmari Jun 01 '13 at 19:23

After further investigation, I was able to solve the problem. After running for a while, Elasticsearch hogs all the CPU time, leaving none for nginx/Passenger. Once enough requests have piled up there, nginx just dies.

So the problem lay in Elasticsearch. I tried multiple configurations, but none of them seemed to change anything.

Following advice from a post on http://elasticsearch-users.115913.n3.nabble.com/, I tried running Elasticsearch on Oracle's JVM instead of OpenJDK. This did the trick; the system has been stable since.
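For anyone wanting to spot this kind of CPU starvation, a rough sketch in Python follows. It is not what I actually ran: the helper names `top_cpu` and `snapshot` and the sample numbers are hypothetical. It ranks processes by the `%CPU` column of `ps -eo pid,pcpu,comm`; Elasticsearch runs on the JVM, so it would normally show up as `java`. Note that `ps` reports CPU usage averaged over the process lifetime, so a tool like `top` gives a better momentary picture.

```python
import subprocess


def top_cpu(ps_output, n=5):
    """Return the n busiest processes as (pcpu, pid, command) tuples."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) == 3:
            pid, pcpu, comm = parts
            rows.append((float(pcpu), int(pid), comm.strip()))
    return sorted(rows, reverse=True)[:n]


def snapshot():
    """Capture the current process table (Linux procps `ps` assumed)."""
    return subprocess.check_output(["ps", "-eo", "pid,pcpu,comm"], text=True)


# Made-up sample output, for illustration only.
SAMPLE_PS = """\
  PID %CPU COMMAND
 1201 97.3 java
 1350  1.2 nginx
 1422  0.4 ruby
"""

for pcpu, pid, comm in top_cpu(SAMPLE_PS, n=2):
    print(pid, comm, pcpu)
```

On the server itself, `top_cpu(snapshot())` could be run periodically (for example from cron) and logged, so the culprit is visible before the 503s start.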

panmari