Can somebody please summarize what the different members of `pthread_rwlock_t` mean?

    struct
    {
      int __lock;
      unsigned int __nr_readers;
      unsigned int __readers_wakeup;
      unsigned int __writer_wakeup;
      unsigned int __nr_readers_queued;
      unsigned int __nr_writers_queued;
      int __writer;
      int __shared;
      unsigned long int __pad1;
      unsigned long int __pad2;
      /* FLAGS must stay at this position in the structure to maintain
         binary compatibility.  */
      unsigned int __flags;
    } __data;

I am debugging a deadlock where the lock state looks like this:

    {__data = {
       __lock = 2,
       __nr_readers = 24644,
       __readers_wakeup = 28432136,
       __writer_wakeup = 24644,
       __nr_readers_queued = 0,
       __nr_writers_queued = 0,
       __writer = 0,
       __shared = 0,
       __pad1 = 0, __pad2 = 0,
       __flags = 0},
     __size = "\002\000\000\000D`\000\000\bױ\001D`", '\000' <repeats 41 times>,
     __align = 105845174042626}

The thread is blocked while trying to acquire a read lock on it. Does the lock structure look sane?

The operating system is CentOS 7.6, with glibc-2.17-260.el7_6.3.x86_64.

Florian Weimer
ashish
  • You shouldn't be caring about the members (unless you're using it as an example of how to write your own rwlock). Just use it with the appropriate functions and treat it like a black box. – Shawn Apr 17 '19 at 12:50
  • @Shawn OP says they are trying to debug a deadlock; in that context, this is a legitimate question. – zwol Apr 17 '19 at 12:50
  • The `pthread_rwlock_t` *type* is standardized, but its members seem not to be. There's undoubtedly someone who knows what they mean, but most of us will only be able to guess based on their names, or would need to study the implementation. – John Bollinger Apr 17 '19 at 12:51
  • @zwol OP needs to post a [mcve] that demonstrates a deadlock to get help with it, then. – Shawn Apr 17 '19 at 12:54
  • I was about to suggest an MCVE myself, but primarily from the perspective of the exercise of creating one serving as a debugging methodology. Of course, once an MCVE is in hand, if the issue has not yet become clear to the OP then it will serve as a good basis for this or another SO question. – John Bollinger Apr 17 '19 at 12:56
  • @Shawn What if the members can give me a clue about what is going wrong? I am actually worried that the lock structure may be corrupted. I won't be able to tell whether it is corrupted without some idea of what the members of the structure mean. – ashish Apr 17 '19 at 13:02
  • @ashish John Bollinger has good advice -- in the process of attempting to cut your program down to the smallest possible test program that still reproduces the deadlock, you may realize what the problem is for yourself. I realise this may be anywhere from very difficult to totally impractical, depending on how big a program you're starting with and what it does, which is why I'm taking your question seriously -- but it's one that no one here may be able to answer, and in fact _no one at all_ may be able to answer anymore. – zwol Apr 17 '19 at 13:08
  • Note that there are only two ways to get a deadlock: lock escalation (for example, where you hold a read lock and try to get a write lock on the same object), and mixed locking order. If you **never** escalate a lock and **always** lock objects in the same order, you will **never** get a deadlock. – Andrew Henle Apr 17 '19 at 13:14
  • If I were to speculate based on member names, I would guess the lock data depict a state in which the read lock is held 24644 times (possibly by fewer distinct threads). The fact that the `__writer_wakeup` member has the same value could be taken (even more speculatively) as a sign of internal consistency. Why such a state would cause threads to block on acquiring the *read* lock is opaque to me, but if the read lock really is held that many times, then that in itself is pretty suspicious. Of course, if the read lock is held even once then that will block acquisition of the *write* lock. – John Bollinger Apr 17 '19 at 13:19

1 Answer

Current versions of GNU libc (version 2.25 and later) ship gdb extensions that will decode the members of various pthread structures, including pthread_rwlock_t. However, looking at the code for this extension, it expects the contents of pthread_rwlock_t to be quite different from what you have shown, so manually applying it to your data dump will be of no use. For the same reason, I can't tell you what the fields mean.

If you tell us exactly which Linux distribution you are using, how old it is, and what /lib/libc.so.6 prints when you run it as if it were a program (if that file doesn't exist, look for it in subdirectories of /lib and /lib64), we might be able to be more helpful.

It would also be worth attempting to move your program onto a newer Linux distribution and seeing if you can still reproduce the problem. Then you can use the gdb extensions yourself.

zwol
  • FWIW, the `pthread_rwlock_t` members the OP presents match the structure as defined in glibc 2.17 for x86_64 (and presumably some other versions in that vicinity). v2.17 is the version of glibc provided by the RHEL 7 flavor of distros, but also by various others, I'm sure. – John Bollinger Apr 17 '19 at 13:04
  • @zwol `rpm -q glibc` prints `glibc-2.17-260.el7_6.3.x86_64`, and `cat /etc/redhat-release` prints `CentOS Linux release 7.6.1810 (Core)`. – ashish Apr 17 '19 at 13:07
  • @JohnBollinger By any chance, would you know what `__lock = 2` represents? – ashish Apr 17 '19 at 13:08
  • No, @ashish, I am not among those with knowledge of the details of the glibc pthreads implementation. I just know how to find and read headers. – John Bollinger Apr 17 '19 at 13:13
  • @JohnBollinger Yeah, I did try to read the source: https://code.woboq.org/userspace/glibc/nptl/pthread_rwlock_common.c.html#__pthread_rwlock_wrlock_full. It says "#2 0 0 >0 0 Readers have acquired the lock." But it seems this is stored in `__readers`, not in `__lock`. Anyway, I will continue checking. – ashish Apr 17 '19 at 13:17