
I'm experimenting with FASM to improve my understanding of concurrency. I've written a program with two threads, each executing a large number of lock xadd instructions. I run it on 64-bit Windows 7 on an i7 and get an interesting result: the program itself works correctly, but it loads four cores instead of the two I expected.

Task Manager's "Performance" tab shows a clear load on four cores.

Resource Monitor's CPU tab shows that my process has two threads.

Could someone give a hint as to why that happens? Is there a way to tell which core is currently running a given piece of code in my FASM program (just to make sure that the cores are indeed different)?

format PE console
include 'win32ax.inc'
include 'macro32.inc'
entry main

section '.code' code readable executable
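main:
        ; Start the second thread, then run the same loop on the main thread.
        invoke CreateThread, 0, 0, atomic_inc, 0, 0, 0
        mov [threadHandle], eax
        ; atomic_inc is an stdcall proc with one parameter, so a dummy
        ; argument is pushed to keep the stack balanced.
        stdcall atomic_inc, 0
        invoke WaitForSingleObject, dword[threadHandle], 0xFFFFFFFF

        ;cinvoke printf, "B - %u", dword [myInt]

        ; msvcrt functions use the cdecl convention, hence cinvoke.
        cinvoke system, halt
        cinvoke exit, 0

        ; Thread procedure: performs 1,000,000,000 atomic increments of myInt.
        ; ebx is callee-saved under stdcall, hence "uses ebx".
        proc atomic_inc uses ebx, lpParam
          mov ebx, 1000000000
          startloop:
            cmp ebx, 0
            jz endofloop
            push ebx

            ; Loop body
            mov eax, 1
            lock
            xadd dword [myInt], eax

            pop ebx
            dec ebx
            jmp startloop
          endofloop:
          xor eax, eax            ; thread exit code 0
          ret
        endp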


section '.data' data readable writable
        halt db "pause>nul",0
        myInt dd 0
        threadHandle dd 0

section '.idata' import data readable
        library msvcrt, 'msvcrt.dll',\
        kernel, 'KERNEL32.DLL'

        import msvcrt,\
          system, 'system',\
          printf, 'printf',\
          exit, 'exit'

        import kernel,\
          CreateThread, 'CreateThread',\
          WaitForSingleObject, 'WaitForSingleObject',\
          Sleep, 'Sleep'
Juriy
  • The threads are switching between the cores. You can set the affinity of a thread if you want and choose the cores on which it executes. – harold Jun 17 '13 at 10:50
  • Sounds reasonable. But what is the motivation for switching cores? Wouldn't it cause a cache invalidation each time you switch to a different slot? I thought that you normally try to run critical code on the same core to keep caches hot... Or, if I write some actual code that deals with mutable state, will the hardware choose a different strategy? – Juriy Jun 17 '13 at 11:14
  • Honestly I don't know. Windows seems to like to spread the load across the even cores (with HT disabled it likes all cores). I would have used "keep caches hot" as an argument for keeping a thread on the same core as much as possible; I don't know why they don't do it that way. – harold Jun 17 '13 at 11:23
  • One possible reason could be to spread heat more evenly across the cpu die. – Jester Jun 17 '13 at 13:21
  • Also, Task Manager itself and other 'base-load' stuff on most boxes (network activity, AV apps, etc.) that uses 1-2% of the box even when you are not 'doing anything' will require threads. Occasionally (I have no idea how often, but the 5 sec between TM refreshes is a long time in CPU terms), these kernel and user threads will likely use up all the cores briefly and preempt one of your threads. When the spike in base load drops, your preempted thread will likely be shovelled back onto the first core that becomes free. – Martin James Jun 18 '13 at 08:16
  • Oh - just checked - Task Manager itself has six threads and, presumably, they have a very high priority. The TM itself may displace all your threads when it runs. – Martin James Jun 18 '13 at 08:19
  • Thinking about it, if you were asked to 'design a "task manager" to collect statistics', how would you do it? One way that occurs to me to get reasonably accurate stats is to run as many very-high-priority threads as there are cores to quickly take a snapshot of the stats before going back to sleep again. That would probably move around some of the other threads. – Martin James Jun 18 '13 at 08:33
  • What is a good way to trace which core is actually executing a thread at a given time? Just to make sure that those loaded cores are really busy incrementing the counter. – Juriy Jun 18 '13 at 11:48
  • One other reason it might spread the load is to use less power. Running two CPUs full-blast can use more power than using 4 CPUs and sleeping them half the time. This is a common optimization, as laptops and smartphones become more prevalent, but you are correct that it will invalidate your caches. – Dan Jun 18 '13 at 22:36
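
The two ideas raised in the comments (setting thread affinity, and checking which core a thread is on) can be sketched together in FASM. This is only an illustrative sketch, not the original program: it assumes Windows Vista or later (where kernel32 exports GetCurrentProcessorNumber) and a machine with at least three logical CPUs. The names report_cpu and cpuFmt are hypothetical, and the masks 1 and 4 (logical CPUs 0 and 2) are arbitrary choices. Error checking is omitted.

```asm
; Sketch: pin two threads to fixed logical CPUs and let each one
; report the CPU it is running on.
format PE console
include 'win32ax.inc'
entry main

section '.code' code readable executable
main:
        ; Pin the main thread to logical CPU 0 (affinity mask bit 0).
        invoke GetCurrentThread
        invoke SetThreadAffinityMask, eax, 1

        ; Create the worker suspended (flag 4 = CREATE_SUSPENDED) so it
        ; can be pinned to logical CPU 2 (mask bit 2) before it starts.
        invoke CreateThread, 0, 0, report_cpu, 0, 4, 0
        mov [threadHandle], eax
        invoke SetThreadAffinityMask, eax, 4
        invoke ResumeThread, [threadHandle]

        ; Run the same routine on the main thread (dummy argument).
        stdcall report_cpu, 0
        invoke WaitForSingleObject, [threadHandle], 0xFFFFFFFF
        invoke ExitProcess, 0

; Hypothetical helper: prints the logical CPU executing the caller.
proc report_cpu, lpParam
        invoke GetCurrentProcessorNumber
        cinvoke printf, cpuFmt, eax
        xor eax, eax                    ; thread exit code 0
        ret
endp

section '.data' data readable writeable
        cpuFmt db 'running on logical CPU %u', 13, 10, 0
        threadHandle dd 0

section '.idata' import data readable
        library kernel, 'KERNEL32.DLL',\
                msvcrt, 'msvcrt.dll'

        import kernel,\
               CreateThread, 'CreateThread',\
               ResumeThread, 'ResumeThread',\
               GetCurrentThread, 'GetCurrentThread',\
               SetThreadAffinityMask, 'SetThreadAffinityMask',\
               GetCurrentProcessorNumber, 'GetCurrentProcessorNumber',\
               WaitForSingleObject, 'WaitForSingleObject',\
               ExitProcess, 'ExitProcess'

        import msvcrt,\
               printf, 'printf'
```

Without the SetThreadAffinityMask calls, sampling GetCurrentProcessorNumber periodically inside a long-running loop would be one way to observe the scheduler moving a thread between cores. The value is only a snapshot, since the thread can migrate immediately after the call returns.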
