I have a very strange problem with my rails app on Heroku that I haven't been able to solve for nearly a month now. It's a problem that only occurs on the production server and I can't replicate in development and for which the logs report no errors accept H18 errors.
Here's what is happening. The application runs fine for about 12 hours, then at a certain point the number of requests spike a bit and Heroku starts reporting regular H18 errors
At this point the application doesn't completely fail, but all requests that invoke some kind of open-uri requests fail (basically request to an external webservice), return a 500 error. Normal requests that simply display a static view still seem to continue to work.
The logs are not particularly helpful.
Nearly every H18 error is associated with a request for /robots.txt but at least one error is associated with an assets request: "/assets/application-38a4580edd72e30f34ea76583ab7e1b1b5654c72a6313ece935177d23b0398d3.css"
Below is an excerpt
Oct 10 21:09:32 lombardpress-web heroku/router: sock=backend at=error code=H18 desc="Server Request Interrupted" method=GET path="/robots.txt" host=scta.lombardpress.org request_id=e0e344d1-0349-4b0a-8db0-3c6e3ad3e99f fwd="157.55.39.188" dyno=web.1 connect=0ms service=209ms status=503 bytes= protocol=http
Oct 10 21:09:41 lombardpress-web heroku/router: sock=backend at=error code=H18 desc="Server Request Interrupted" method=GET path="/assets/application-38a4580edd72e30f34ea76583ab7e1b1b5654c72a6313ece935177d23b0398d3.css" host=scta.lombardpress.org request_id=fdd88ca2-140e-4051-9011-4d81ca218f19 fwd="157.55.39.206" dyno=web.1 connect=0ms service=252ms status=503 bytes= protocol=http
Oct 10 21:33:55 lombardpress-web heroku/router: sock=backend at=error code=H18 desc="Server Request Interrupted" method=GET path="/robots.txt" host=scta.lombardpress.org request_id=5aa463f5-43ff-4b74-a2de-e944aa9d2387 fwd="46.229.168.147" dyno=web.1 connect=0ms service=254ms status=503 bytes= protocol=http
Oct 10 21:38:23 lombardpress-web heroku/router: sock=backend at=error code=H18 desc="Server Request Interrupted" method=GET path="/robots.txt" host=scta.lombardpress.org request_id=2300949d-8998-4ed0-a4ab-bf1975e93cc6 fwd="216.244.66.197" dyno=web.1 connect=0ms service=220ms status=503 bytes= protocol=http
Once the errors start, I simply have to restart the app and everything works fine again for another 12 or so errors. Even requests to robots.txt. But after approximately 12 hours the same problem occurs.
Due to web crawlers, I would estimate the application is getting hit a with a request every 2-4 seconds. Otherwise it is not a particularly high traffic site.
Heroku provided only the smallest feedback
The backend socket, belonging to your app’s web process was closed before the backend returned an HTTP response. This is happening at the application or web server level so there is not much additional insight we can provide. You can see that the request spends some time in your application service=1118ms before returning a status 500. I wonder if a middleware (maybe you have one making external requests?) is failing before it hits your actual rails stack. I would suggest starting by looking there.
The code is open source and available here if anyone is interested in poking around: https://github.com/lombardpress/lombardpress-web
I'd be grateful for any thoughts or suggestions. I've been struggling with this for a while now and I'm not sure how to solve the problem.