
There was a rather large commit to nptl in glibc:

http://sourceware.org/git/?p=glibc.git;a=commit;h=e51deae7f6ba2e490d5faeb8fbf4eeb32ae8f1ee

by Ulrich Drepper and Jakub Jelinek, in 2007.

I am interested in the change to lll_lock/lll_unlock.

In the SMP code, lll_unlock was modified to:

+# define __lll_unlock_asm "cmpl $0, %%gs:%P3\n\t"                            \
+                         "je 0f\n\t"                                         \
+                         "lock\n"                                            \
+                         "0:\tsubl $1,%0\n\t"

where $0 is the immediate constant zero, %0 is the futex, and %P3 is the MULTIPLE_THREADS_OFFSET constant.

So, what is stored at %gs:MULTIPLE_THREADS_OFFSET (that is, %gs:offsetof (tcbhead_t, multiple_threads))? How does this value change over the lifetime of a program?

osgx
  • It is set by allocate_stack in nptl/allocatestack.c: [`/* This is at least the second thread. */ pd->header.multiple_threads = 1;`](http://fxr.watson.org/fxr/source/nptl/allocatestack.c?v=GLIBC27#L374) – osgx Nov 17 '11 at 04:35

1 Answer


This jump is an optimization for the case where thread-safe code runs in a single-threaded process. In a single-threaded process the 'lock' prefix on the subl instruction is not needed, because atomicity is not required, so the prefix can be skipped at run time. Atomic instructions incur run-time overhead at the CPU level.

So, the short answer is that the multiple_threads field is a boolean that tells whether we are actually in a multi-threaded run-time environment.
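
The same idea can be sketched in portable C11. This is only an illustration, not glibc's implementation: the global `multiple_threads` flag, `note_thread_created` and `sketch_unlock` are hypothetical names, and the real flag lives in the thread control block and is read through `%gs` rather than from a global.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the multiple_threads field of tcbhead_t.
   Assumption: a plain global is enough for this illustration. */
static bool multiple_threads = false;

/* Hypothetical hook: models the flag being set once a second thread
   is created (the role allocate_stack plays in the comment above). */
static void note_thread_created(void)
{
    multiple_threads = true;
}

/* Decrement the lock word and report whether it reached zero.
   Single-threaded: separate load and store, no atomic read-modify-write
   (comparable to subl without the lock prefix).
   Multi-threaded: one atomic read-modify-write (comparable to lock subl). */
static bool sketch_unlock(atomic_int *futex)
{
    int old;

    if (!multiple_threads) {
        old = atomic_load_explicit(futex, memory_order_relaxed);
        atomic_store_explicit(futex, old - 1, memory_order_relaxed);
    } else {
        old = atomic_fetch_sub_explicit(futex, 1, memory_order_release);
    }
    return old - 1 == 0;
}
```

Both paths produce the same result on the lock word. Only the cost differs, which is why the flag check can be made at run time without changing behaviour.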

Dan Aloni
  • In this case the CPU overhead is negligible. FWIW, the two insns - the compare and the branch - incur far more CPU overhead. The problem with the `lock` insn is that it asserts the `LOCK#` signal on the memory bus and effectively blocks all other processors from accessing memory. Now, that's an overhead that scales with the machine :) – chill Nov 17 '11 at 10:55
  • chill, is there a LOCK# on Core i7 (or modern Athlon) with an integrated memory controller and QPI (or HT)? Dan, how is this flag set or changed? – osgx Nov 17 '11 at 11:29
  • chill, http://software.intel.com/en-us/blogs/2010/02/22/hardware-support-for-locks/ -> "atomic operations do not use system-wide locks for quite some time. Modern CPUs use technique called cache-locking, which does not have any system-wide impact and 100% scalable." – osgx Nov 17 '11 at 11:46