
We have developed a Rails app on Heroku with around 3 web dynos and 2–3 worker dynos. We have some exporting and importing functionality that uses a lot of our worker dynos; when that happens, everything crashes and we get an App Error on the website.

Sentry tells us it is due to a timeout. We are trying to find out which part of our software is taking up so much worker time. The problem is that it affects all of our users, including some who only use web-layer functionality.

But I was wondering: is there a way to isolate our worker dyno problems from the web dynos' work? That is, is there a way to keep our site from crashing when one user exports a large amount of data and saturates the workers?
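(For reference, here is the kind of setup I mean. Heroku process types are declared in the Procfile, so one pattern would be to run a separate worker process bound to a dedicated queue for heavy exports, leaving the default queue free. This is only a sketch: the process names, queue names, and the `QUEUES` variable usage assume delayed_job's named-queue support, not our actual config.)

```
web: bundle exec puma -C config/puma.rb
worker: bundle exec rake jobs:work QUEUES=default
export_worker: bundle exec rake jobs:work QUEUES=exports
```

With something like this, exports saturating the `exports` queue would only back up the `export_worker` dynos.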

Thanks in advance!

Regards, Gonzalo

  • I guess the workers aren't async? – jvillian May 05 '18 at 00:34
  • It may be a db issue, not a Heroku issue; maybe the import/export is locking writes to the db. I am guessing, of course. – sethi May 05 '18 at 12:22
  • Surprising that a worker dyno could cause a web dyno to throw an Application Error. Certainly there must be something in the web dyno's logs when this occurs. Have you investigated the logs to see what's happening before the crash? – Charlie Schliesser May 05 '18 at 14:58

1 Answer


Thank you for the answers; let me give you some related info:
- We use delayed_job for the workers, which is async.
- We previously had a DB that supported 120 connections, and we never saw it completely busy. The current AWS RDS instance only reached 24% utilization, and we saw only 28 concurrent connections on the day of the crash.
- New Relic did not indicate any delay in the DB.
- The web dynos start to time out across many functionalities; if the crash is not related to the workers, it may be caused by some functionality that does not run in jobs.

Update:
- We have set limits on our exports. Even though the exports run in jobs, they were affecting our web layer and producing App Errors; once we set the limits, the App Errors were dramatically reduced.
- We are still searching for other unoptimized functionality.
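To illustrate the kind of limit that helped: splitting an export into bounded batches so no single job runs long enough to monopolize the workers. This is a sketch in plain Ruby, not our actual export code; `EXPORT_ROW_LIMIT`, `BATCH_SIZE`, and `export_batches` are illustrative names and values.

```ruby
# Cap how many rows one export job will process, and split the work
# into fixed-size batches. Rows beyond the cap would be deferred
# (e.g. to a follow-up job) instead of running in one long job.
EXPORT_ROW_LIMIT = 10_000  # illustrative cap per export job
BATCH_SIZE       = 1_000   # illustrative batch size

# Returns the (offset, limit) pairs a job would process for an export
# of `total_rows` rows, never exceeding EXPORT_ROW_LIMIT in total.
def export_batches(total_rows)
  capped = [total_rows, EXPORT_ROW_LIMIT].min
  (0...capped).step(BATCH_SIZE).map do |offset|
    { offset: offset, limit: [BATCH_SIZE, capped - offset].min }
  end
end

export_batches(25_000)
# 10 batches of 1,000 rows; the remaining 15,000 rows are deferred
```

The point of the cap is that a single huge export becomes several short jobs rather than one multi-minute job, so other queued jobs still get worker time in between.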
