I have a Ruby v2.3.4 application based on Grape v0.19.2.
Recently, after our last deployment, we noticed that the system shut down and our god v0.13.7 process monitor started it back up again. After looking at the crash logs, we're seeing 20-30 crashes a week.
Here are some sample crash reports:
/.rvm/gems/ruby-2.3.4/gems/bson-4.2.1/lib/bson/hash.rb:80: [BUG] rb_gc_mark(): 0x007fa2f4fb33f0 is T_NONE
/.rvm/gems/ruby-2.3.4/gems/mongo-2.4.1/lib/mongo/socket.rb:176: [BUG] rb_gc_mark(): 0x007f990c383360 is T_NONE
/.rvm/gems/ruby-2.3.4/gems/activesupport-5.1.1/lib/active_support/callbacks.rb:102: [BUG] rb_gc_mark(): 0x007ffbeb9e3880 is T_NONE
These crashes seem to happen randomly: they can be 5-7 days apart, or several can happen within an hour. The stack traces in the crash logs aren't very helpful and show basically everything we're running.
Our strategy so far has been to roll back the entire code base and review all the changes that went in, but they are very numerous. The deployment also updated 30-40 gem dependencies. Since the crashes appear to be random, it's very difficult to test whether a change to the code or to a gem has fixed the issue.
This issue appears to be garbage-collection related, so I tried running the GC in debug mode to see whether that could help us create a reproducible case, but the application took orders of magnitude longer to start up and run, so that strategy wasn't viable.
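For context, here is a minimal sketch of a narrower variant of that idea, assuming the standard `GC.stress` switch is used: it turns stress on only around one suspect block instead of the whole process. `with_gc_stress` is a hypothetical helper name, and the block contents are a placeholder rather than code from our application.

```ruby
# Hypothetical helper: enable GC.stress only around a suspect block,
# so the rest of the application keeps running at normal speed.
def with_gc_stress
  previous = GC.stress   # remember the current setting
  GC.stress = true       # trigger GC at every opportunity
  yield
ensure
  GC.stress = previous   # always restore, even if the block raises
end

# Placeholder workload: a small allocation-heavy block standing in
# for whichever code path is under suspicion.
result = with_gc_stress do
  Array.new(20) { |i| { "key#{i}" => i.to_s * 3 } }
end
```

Wrapping one request handler or one driver call at a time this way might keep the slowdown tolerable, though I don't know whether it would be enough to surface the T_NONE crash.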
What would be a good strategy to force a crash so we can narrow down whether the problem came from our code update or a dependent gem?