
It seems that using Bull 3.21.1 as a work queue with its default configuration causes Redis to retain keys indefinitely even under successful operation, eventually exhausting Redis memory and causing a crash. Here is one example of this experience; I had the same one. Here's another, in which a maintainer explains that while the default behavior of Bull cannot be made memory-leak-free without breaking changes for existing users, it could be better documented that the default behavior retains Redis keys for completed jobs indefinitely, along with the configuration needed to operate without leaking memory in Redis. As of the writing of this question, Bull's documentation still contains no mention of this behavior, the relevant configuration point, or the solution.

Having had a production (or, if luckier, pre-production) crash due to Bull's undocumented default behavior of retaining the Redis keys of completed jobs forever, how do I recover?

Tim Heilman

1 Answer

  1. If possible, increase the memory available to Redis to relieve the immediate memory pressure. This is what we did.
  2. A potentially quick manual fix to remove old jobs' keys is to call Bull.Queue#clean(1000 * 60 * 60 * 24) from an NPM script specified in package.json, run against your prod node instance. (The argument is a grace period in milliseconds during which completed jobs will not be reaped, so that value removes jobs older than 24 hours.) We only did this after the step below, to purge all old jobs, but it could have been employed earlier to deflate the balloon and buy more time.
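Step 2 above could be sketched as a one-off Node script like the following. The queue name 'work' and the REDIS_URL environment variable are assumptions for illustration; substitute your own. It requires a live Redis instance, so it is a sketch rather than something runnable in isolation:

```javascript
// clean-old-jobs.js — one-off cleanup of stale Bull job keys (hypothetical
// queue name 'work'; connection string taken from REDIS_URL).
const Queue = require('bull');

const ONE_DAY_MS = 1000 * 60 * 60 * 24; // grace period: spare jobs newer than 24h

async function main() {
  const queue = new Queue('work', process.env.REDIS_URL);

  // Queue#clean(grace, status) removes jobs in the given state whose age
  // exceeds the grace period; 'completed' is the default state in Bull 3.x.
  const removed = await queue.clean(ONE_DAY_MS, 'completed');
  console.log(`Removed ${removed.length} completed jobs`);

  // Optionally reap old failed jobs as well, since their keys also accumulate.
  await queue.clean(ONE_DAY_MS, 'failed');

  await queue.close();
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

Wiring this up as `"clean-old-jobs": "node clean-old-jobs.js"` under `scripts` in package.json lets you invoke it with `npm run clean-old-jobs` on the prod instance.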
  3. Fix the memory leak by providing the non-default configuration to Bull: defaultJobOptions: { removeOnComplete: true, removeOnFail: true }. This will end the ramp-up of Redis key count and memory consumption from Bull, and provide the least-astonishing behavior of not having a memory leak under default configuration and successful operation.
Tim Heilman
  • 340
  • 2
  • 11