
I've been struggling to figure out how to debug why my Sidekiq processor is stuck.

I've read the following resources:

I've attached the full output of all the gdb traces here: https://gist.github.com/Overload119/7f75d28d78e683ff3096bd9058164c9c
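
In case it's useful, that gist is the output of attaching gdb to the running Sidekiq process and dumping a backtrace of every thread, roughly along these lines (<sidekiq_pid> being the worker's pid):

gdb -p <sidekiq_pid>
(gdb) thread apply all bt
(gdb) detach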

I pulled out all the lines that involve C extensions:

#0  0x00007fd71db8400c in pthread_cond_wait@@GLIBC_2.3.2 () at /lib/x86_64-linux-gnu/libpthread.so.0
#159 0x00007fd71db7dfa3 in start_thread () at /lib/x86_64-linux-gnu/libpthread.so.0
#160 0x00007fd71d1b54cf in clone () at /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fd7193323bc in std::condition_variable::wait(std::unique_lock<std::mutex>&) () at /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00007fd713ebe5e9 in rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long) () at /usr/lib/librocksdb.so.5.17
#3  0x00007fd713ebe981 in rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*) () at /usr/lib/librocksdb.so.5.17
#4  0x00007fd719337b2f in  () at /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007fd71db7dfa3 in start_thread () at /lib/x86_64-linux-gnu/libpthread.so.0
#6  0x00007fd71d1b54cf in clone () at /lib/x86_64-linux-gnu/libc.so.6
#48 0x00007fd71db7dfa3 in start_thread () at /lib/x86_64-linux-gnu/libpthread.so.0
#49 0x00007fd71d1b54cf in clone () at /lib/x86_64-linux-gnu/libc.so.6
#0  0x00007fd71d1aa916 in ppoll () at /lib/x86_64-linux-gnu/libc.so.6
#18 0x00007fd71db7dfa3 in start_thread () at /lib/x86_64-linux-gnu/libpthread.so.0
#19 0x00007fd71d1b54cf in clone () at /lib/x86_64-linux-gnu/libc.so.6
#80 0x00007fd71db7dfa3 in start_thread () at /lib/x86_64-linux-gnu/libpthread.so.0
#81 0x00007fd71d1b54cf in clone () at /lib/x86_64-linux-gnu/libc.so.6
#47 0x00007fd71db7dfa3 in start_thread () at /lib/x86_64-linux-gnu/libpthread.so.0
#48 0x00007fd71d1b54cf in clone () at /lib/x86_64-linux-gnu/libc.so.6
#35 0x00007fd71db7dfa3 in start_thread () at /lib/x86_64-linux-gnu/libpthread.so.0
#36 0x00007fd71d1b54cf in clone () at /lib/x86_64-linux-gnu/libc.so.6
#198 0x00007fd71db7dfa3 in start_thread () at /lib/x86_64-linux-gnu/libpthread.so.0
#199 0x00007fd71d1b54cf in clone () at /lib/x86_64-linux-gnu/libc.so.6
#0  0x00007fd71d1aff59 in syscall () at /lib/x86_64-linux-gnu/libc.so.6
#7  0x00007fd71db7dfa3 in start_thread () at /lib/x86_64-linux-gnu/libpthread.so.0
#8  0x00007fd71d1b54cf in clone () at /lib/x86_64-linux-gnu/libc.so.6
#0  0x00007fd71d1b57ef in epoll_wait () at /lib/x86_64-linux-gnu/libc.so.6
#9  0x00007fd71db7dfa3 in start_thread () at /lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007fd71d1b54cf in clone () at /lib/x86_64-linux-gnu/libc.so.6
#11 0x00007fd71db7dfa3 in start_thread () at /lib/x86_64-linux-gnu/libpthread.so.0
#12 0x00007fd71d1b54cf in clone () at /lib/x86_64-linux-gnu/libc.so.6
#37 0x00007fd71db7dfa3 in start_thread () at /lib/x86_64-linux-gnu/libpthread.so.0
#38 0x00007fd71d1b54cf in clone () at /lib/x86_64-linux-gnu/libc.so.6
#8  0x00007fd71db7dfa3 in start_thread () at /lib/x86_64-linux-gnu/libpthread.so.0
#9  0x00007fd71d1b54cf in clone () at /lib/x86_64-linux-gnu/libc.so.6
#0  0x00007fd71db843f9 in pthread_cond_timedwait@@GLIBC_2.3.2 () at /lib/x86_64-linux-gnu/libpthread.so.0
#53 0x00007fd71db7dfa3 in start_thread () at /lib/x86_64-linux-gnu/libpthread.so.0
#54 0x00007fd71d1b54cf in clone () at /lib/x86_64-linux-gnu/libc.so.6
#124 0x00007fd71db7dfa3 in start_thread () at /lib/x86_64-linux-gnu/libpthread.so.0
#125 0x00007fd71d1b54cf in clone () at /lib/x86_64-linux-gnu/libc.so.6
#20 0x00007fd71db7dfa3 in start_thread () at /lib/x86_64-linux-gnu/libpthread.so.0
#21 0x00007fd71d1b54cf in clone () at /lib/x86_64-linux-gnu/libc.so.6
#40 0x00007fd71db7dfa3 in start_thread () at /lib/x86_64-linux-gnu/libpthread.so.0
#41 0x00007fd71d1b54cf in clone () at /lib/x86_64-linux-gnu/libc.so.6

I think this part:

#1  0x00007fd7193323bc in std::condition_variable::wait(std::unique_lock<std::mutex>&) () at /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00007fd713ebe5e9 in rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long) () at /usr/lib/librocksdb.so.5.17
#3  0x00007fd713ebe981 in rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*) () at /usr/lib/librocksdb.so.5.17

looks suspicious, but I'm not sure where to go from here. Is there any way I can confirm whether or not this thread is holding a lock and causing the Sidekiq job to get stuck?
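
One idea I had (not sure if it's the right approach) is to dump the Ruby-level backtrace of every thread from inside the stuck process and look for threads parked in a mutex or queue wait, to cross-check against the gdb output. A minimal sketch, assuming I can get a bit of code to run inside the worker process (e.g. behind a custom signal handler):

# Print the status and the top of the backtrace for every Ruby thread.
# Thread#backtrace can return nil for threads that have already finished.
Thread.list.each do |thread|
  puts "Thread #{thread.object_id} (#{thread.status}):"
  puts (thread.backtrace || ['<no backtrace>']).first(15).map { |line| "  #{line}" }
end

As far as I understand, this is roughly what Sidekiq itself prints when it receives TTIN (kill -TTIN <sidekiq_pid>), so sending the signal is probably the simpler route.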

Overload119
  • According to that GDB output, your Ruby process has 600 threads. That's way way way too many. – Mike Perham Jul 14 '22 at 23:31
  • Thanks - I've managed to reduce it and now have a local reproduction. https://gist.github.com/Overload119/a210656a12a09765088d9412869c8e08 Based on the post I read, if Sidekiq responds to -TTIN then it likely isn't an issue with GVL. Instead, it may be a deadlock. My understanding is that a thread with `sleep` as the last line is not deadlocked. I see a lot of `rufus-scheduler-3.8.2/lib/rufus/scheduler/jobs_core.rb:321:in `pop'` which leads me to believe maybe this is the deadlock? – Overload119 Jul 16 '22 at 04:16

1 Answer


You're using Sidekiq in development mode, where Rails only supports one job at a time. Your hung job is here:

2022-07-16T03:53:35.791Z pid=7924 tid=iok WARN:     /Users/amirsharif/.rvm/gems/ruby-2.7.5/gems/concurrent-ruby-1.1.10/lib/concurrent-ruby/concurrent/promises.rb:775:in `sleep'
/Users/amirsharif/.rvm/gems/ruby-2.7.5/gems/concurrent-ruby-1.1.10/lib/concurrent-ruby/concurrent/promises.rb:775:in `wait'
/Users/amirsharif/.rvm/gems/ruby-2.7.5/gems/concurrent-ruby-1.1.10/lib/concurrent-ruby/concurrent/promises.rb:775:in `block in wait_until_resolved'
/Users/amirsharif/.rvm/gems/ruby-2.7.5/gems/concurrent-ruby-1.1.10/lib/concurrent-ruby/concurrent/promises.rb:771:in `synchronize'
/Users/amirsharif/.rvm/gems/ruby-2.7.5/gems/concurrent-ruby-1.1.10/lib/concurrent-ruby/concurrent/promises.rb:771:in `wait_until_resolved'
/Users/amirsharif/.rvm/gems/ruby-2.7.5/gems/concurrent-ruby-1.1.10/lib/concurrent-ruby/concurrent/promises.rb:1256:in `wait_until_resolved!'
/Users/amirsharif/.rvm/gems/ruby-2.7.5/gems/concurrent-ruby-1.1.10/lib/concurrent-ruby/concurrent/promises.rb:1471:in `value!'
/Users/amirsharif/.rvm/gems/ruby-2.7.5/gems/puppeteer-ruby-0.40.7/lib/puppeteer/concurrent_ruby_utils.rb:54:in `await'
/Users/amirsharif/.rvm/gems/ruby-2.7.5/gems/puppeteer-ruby-0.40.7/lib/puppeteer/connection.rb:129:in `send_message'
/Users/amirsharif/.rvm/gems/ruby-2.7.5/gems/puppeteer-ruby-0.40.7/lib/puppeteer/browser.rb:25:in `create'
/Users/amirsharif/.rvm/gems/ruby-2.7.5/gems/puppeteer-ruby-0.40.7/lib/puppeteer/launcher/chrome.rb:201:in `connect'
/Users/amirsharif/.rvm/gems/ruby-2.7.5/gems/puppeteer-ruby-0.40.7/lib/puppeteer/puppeteer.rb:113:in `connect'
/Users/amirsharif/.rvm/gems/ruby-2.7.5/gems/puppeteer-ruby-0.40.7/lib/puppeteer.rb:82:in `public_send'
/Users/amirsharif/.rvm/gems/ruby-2.7.5/gems/puppeteer-ruby-0.40.7/lib/puppeteer.rb:82:in `block (3 levels) in <module:Puppeteer>'

You need to use Sidekiq in production mode if you want to get true concurrency.
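
For example, assuming a standard Rails setup, that means booting the worker with something like RAILS_ENV=production bundle exec sidekiq (or passing -e production to the sidekiq command), so development-mode code reloading and its locking are out of the picture; the exact invocation depends on your deployment.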

Mike Perham