I am using the pthreads library in NASM under Ubuntu 18.04. Thread creation works correctly, but I want to assign each thread to a separate core with pthread_setaffinity_np.

Following is the section of code I use to initialize threads. It assembles and links as written, but at run time I get "undefined symbol: CPU_ZERO."

Using examples from C, I inserted %define _GNU_SOURCE at the top of the program, but I still get the undefined symbol CPU_ZERO error.

section .data align=16

; For thread scheduling:
cpuset: times 4 dq 0

section .text

label_0:

mov rdi,ThreadID            ; ThreadCount
mov rsi,pthread_attr_t  ; Thread Attributes
mov rdx,Test_fn         ; Function Pointer
mov rcx,pthread_arg
call pthread_create wrt ..plt

; Set affinity mask
mov rdi,cpuset
call CPU_ZERO wrt ..plt
call pthread_self wrt ..plt
push rax
mov rdi,rax
mov rsi,cpuset
call CPU_SET wrt ..plt
pop rax
mov rdi,rax
mov rsi,32
mov rdx,cpuset
call pthread_setaffinity_np wrt ..plt
; check the result with pthread_getaffinity_np

mov rax,[tcounter]
add rax,8
mov [tcounter],rax
mov rbx,[Number_Of_Cores]
cmp rax,rbx
jl label_0

My question is: how do I use CPU_ZERO and CPU_SET in NASM (or any other assembly language; I can translate to NASM)?

Thanks for any help.

Peter Cordes
RTC222
  • The upper case name is a hint, and the man page confirms it: `CPU_ZERO` and `CPU_SET` are macros. You cannot call them like functions. You need to look in the headers to see how they are defined and implement them in assembly. – Jester Jan 07 '20 at 23:10
  • I assume the data structure is a bitmap of some size, just zero it with `movups` from a zeroed XMM register. – Peter Cordes Jan 07 '20 at 23:12
  • Since this only has to run once, you can probably just use memory-destination `bts` to implement `CPU_SET`. (It's slow with a register source, mostly fine with an immediate source. Either way it's only 1 instruction.) – Peter Cordes Jan 08 '20 at 00:15
  • Thanks for that idea @Peter Cordes. So far this is working out well by substituting my own code for the C macros, but the bitmask for cpuset is the only difficult part. I will now test it with an xmm register as you suggested, using movups and bts. When I'm finished I will post all of my code because this question has never been asked before (I searched thoroughly) and others may benefit from my experience. – RTC222 Jan 08 '20 at 00:24
  • My approach to this problem is summarized in my answer at https://stackoverflow.com/questions/59776532/reproduce-these-c-types-in-assembly/59870573#59870573. – RTC222 Jan 23 '20 at 01:48

3 Answers

CPU_ZERO and CPU_SET are C macros, not functions that you can call.

You'll have to roll your own function to perform equivalent zeroing / setting.
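As a rough illustration of what "roll your own" means, here is a minimal C sketch of the two operations on a plain array of unsigned long. The names mask_zero and mask_set are made up for this example and this is not glibc's actual code; it only assumes the mask is a bitmap in which bit N stands for CPU N.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */
#include <string.h>   /* memset */

/* Hypothetical stand-ins for CPU_ZERO / CPU_SET, not glibc's own code. */
#define BITS_PER_WORD (CHAR_BIT * sizeof(unsigned long))

/* Clear every bit of a mask that is nbytes bytes long. */
void mask_zero(unsigned long *mask, size_t nbytes)
{
    memset(mask, 0, nbytes);
}

/* Set the bit for cpu. Returns 1 on success, 0 if cpu does not fit in the mask. */
int mask_set(unsigned long *mask, size_t nbytes, size_t cpu)
{
    if (cpu / CHAR_BIT >= nbytes)
        return 0;                       /* index is past the end of the bitmap */
    mask[cpu / BITS_PER_WORD] |= 1UL << (cpu % BITS_PER_WORD);
    return 1;
}
```

In assembly the same two steps come down to storing zeros over the buffer and then OR-ing one bit into the right word.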

Employed Russian

Those are CPP macros, not functions. You can tell from the all-caps names, and from the fact that the man page calls them macros.

As usual, the notes section of the man page has details that are useful for asm:

Since CPU sets are bit masks allocated in units of long words, the actual number of CPUs in a dynamically allocated CPU set will be rounded up to the next multiple of sizeof(unsigned long). An application should consider the contents of these extra bits to be undefined.

Notwithstanding the similarity in the names, note that the constant CPU_SETSIZE indicates the number of CPUs in the cpu_set_t data type (thus, it is effectively a count of the bits in the bit mask), while the setsize argument of the CPU_*_S() macros is a size in bytes.

On my system (Arch Linux, glibc 2.29-4)

/usr/include/bits/cpu-set.h says

...
#define __CPU_SETSIZE   1024
#define __NCPUBITS      (8 * sizeof (__cpu_mask))
...
typedef __CPU_MASK_TYPE __cpu_mask;  // ultimately unsigned long via some other headers
...
typedef struct
{
  __cpu_mask __bits[__CPU_SETSIZE / __NCPUBITS];
} cpu_set_t;

So a cpu_set_t is 1024 bits = 128 bytes, which in NASM is times 16 dq 0 or resq 16, at least on my system with that kernel config.
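The size arithmetic can be sketched in C as below. mask_bytes_for is a hypothetical helper written for this example; it just applies the rounding the man page describes (whole unsigned long units), and it assumes the mask word type is unsigned long, as the header excerpt suggests.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */

/* Hypothetical helper: bytes needed for a mask covering ncpus CPUs,
 * rounded up to a whole number of unsigned long words. */
size_t mask_bytes_for(size_t ncpus)
{
    size_t bits_per_word = CHAR_BIT * sizeof(unsigned long);
    size_t words = (ncpus + bits_per_word - 1) / bits_per_word;  /* round up */
    return words * sizeof(unsigned long);
}
```

With 1024 CPUs this gives 128 bytes, matching the 16 qwords reserved above; a single CPU still takes one full unsigned long.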


CPU_ZERO is free in your case; your statically-allocated cpu_set_t is statically zero-initialized. For some reason you put it in .data instead of .bss, so the executable will have to actually contain those zeros, but same difference.

If you did want to zero one on the stack, for example, rep stosd is one easy way, or xorps xmm0, xmm0 and 8x movups stores would also work.


Since high performance is not essential (CPU affinity-setting code probably only runs once), bts is a very convenient way to set bits in a bitmap (CPU_SET). With a memory destination, it takes a bit-index that can go outside the dword selected by the addressing mode. bts mem, reg is slow and microcoded (like 10 uops on Skylake), but nice for code size. bts mem, imm is only 3 uops, but or byte [mem + i/8], 1<<(i%8) is only 2 uops.

Using or also lets you set more than 1 bit at once. Even simpler, you can just mov-store some bytes that contain the desired pattern of zeros and ones.
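To make the byte-wise indexing concrete, here is a small C sketch of the same idea. byte_set and mask_set_first_n are invented names for this example. Treating the mask as bytes matches the unsigned long layout only on a little-endian machine such as x86-64, which is an assumption of this sketch.

```c
#include <string.h>   /* memset */

/* Hypothetical helper: same effect as or byte [mem + i/8], 1 << (i % 8). */
void byte_set(unsigned char *mask, unsigned i)
{
    mask[i / 8] |= (unsigned char)(1u << (i % 8));
}

/* Hypothetical helper: mark CPUs 0..n-1 by storing whole bytes of ones.
 * The caller must make sure the mask holds at least n bits. */
void mask_set_first_n(unsigned char *mask, unsigned n)
{
    unsigned full = n / 8;
    memset(mask, 0xFF, full);                            /* complete bytes */
    if (n % 8)
        mask[full] |= (unsigned char)((1u << (n % 8)) - 1);  /* leftover low bits */
}
```

For example, calling mask_set_first_n with n = 4 on a zeroed mask leaves 0x0F in the first byte, which you could just as well write into the data section directly.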

But TL;DR: it's just a bitmap. Manipulate it however you like using asm, or even statically initialize it with non-zero values.

Peter Cordes

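My approach to solving this problem is summarized in my answer to "Reproduce these C types in assembly?" at https://stackoverflow.com/questions/59776532/reproduce-these-c-types-in-assembly/59870573#59870573.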

RTC222