1

I have an extremely simple program that does nothing more than call recvfrom() in a loop. According to its manpage, one of the arguments is a pointer to the length of the address. This address is initialized in the .data section to the integer value 16. I noticed some strange behavior when I attach to the already-running process to trace it which is not present when I trace the process directly (when I start it traced). Scroll to the end of the lines:

# strace -x -s 10 -e trace=recvfrom ./test
recvfrom(3, "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"..., 32, 0, {sa_family=AF_INET, sin_port=htons(42134), sin_addr=inet_addr("127.0.0.1")}, [16]) = 32
recvfrom(3, "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"..., 32, 0, {sa_family=AF_INET, sin_port=htons(49442), sin_addr=inet_addr("127.0.0.1")}, [16]) = 32
recvfrom(3, ^Cstrace: Process 18909 detached
 <detached ...>
# ./test &
# strace -x -s 10 -e trace=recvfrom -p $!
strace: Process 18916 attached
recvfrom(3, "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"..., 32, 0, {sa_family=AF_INET, sin_port=htons(50906), sin_addr=inet_addr("127.0.0.1")}, [1999040176->16]) = 32
recvfrom(3, "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"..., 32, 0, {sa_family=AF_INET, sin_port=htons(52956), sin_addr=inet_addr("127.0.0.1")}, [16]) = 32
recvfrom(3, ^Cstrace: Process 18916 detached
 <detached ...>

When I trace it directly, the address length argument shows as [16], which makes sense. After all, the address is a pointer to an int of the value 16. However, when I attach to the process and trace it, the very first call shows that it is not initialized, e.g. [1999040176->16]. This happens for the first syscall every time I attach, but all subsequent calls it shows it correctly as [16]. If I detach from the process and re-attach, the first call will show it as having uninitialized memory.

To be brief:

  • When I run it under strace, the last argument shows [16] for every recvfrom().

  • When I attach to it when it is already running, the last argument shows things like [1999040176->16] in the first call to recvfrom(), and [16] in all subsequent ones.

  • If I detach from it and attach again, the first call to recvfrom() again displays this odd behavior, and all subsequent calls display the expected [16].

The program itself is correct. Here is the program (written in MIPS assembly):

.section .text
        .global __start

__start:
        # socket
        li      $v0,4183
        li      $a0,2
        li      $a1,1
        li      $a2,0
        syscall
        sw      $v0,sockfd

        # bind
        li      $v0,4169
        lw      $a0,sockfd
        la      $a1,sockaddr_b
        li      $a2,16
        syscall

loop:
        # recvfrom
        li      $v0,4176
        lw      $a0,sockfd
        la      $a1,buffer
        li      $a2,32
        li      $a3,0
        la      $t0,sockaddr_a
        sw      $t0,16($sp)
        la      $t0,addrlen
        sw      $t0,20($sp)
        syscall

        j       loop

.section .bss
sockaddr_a:     .space  16
buffer:         .space  32
sockfd:         .space  4

.section .data
addrlen:        .int    16

.section .rodata
sockaddr_b:     .hword  2,1234,0,0
forest
  • 206
  • 4
  • 15
  • I'm not sure what the issue is here, `1999040176` seems to be the pointer value pointing to the value 16. This just seems like a different way to output the value – PeterT May 12 '18 at 17:38
  • Maybe it's just some weird formatting under certain conditions. Does using `-e raw=recvfrom` always print the same value? – PeterT May 12 '18 at 17:52
  • The Linux MIPS syscall ABI passes the 5th and 6th args in memory on the user-space stack? That seems like a poor design vs. using `$t0` and `$t1` or something. I'd guess that makes it harder for `strace` to get the correct value, so maybe there's a glitch reading it when attaching during a `recvfrom` system call? Your process spends almost all of its time in kernel space, so you will fairly consistently attach while it's in `recvfrom`. – Peter Cordes May 12 '18 at 18:59
  • And BTW, you don't need static storage for `sockfd`, just keep it in a register. On most ISAs, Linux system calls don't clobber registers other than the return value, so you can probably hoist the setting of `$a0..3` out of the loop. (And if you wanted to, calculate one address as an offset from another, instead of using independent `la` pseudo-instructions that assemble to 2 instructions each). – Peter Cordes May 12 '18 at 19:01
  • @PeterCordes I know. This is just a stripped down version of a much more complicated program. I removed everything that wasn't contributing to this behavior and was left with this. – forest May 12 '18 at 21:53
  • 1
    Ok, that's fair; using `la` inside the loop does make for a clearer example which is what I hoped was your reason for keeping it in. Anyway, looks like a MIPS `strace` glitch / bug. – Peter Cordes May 12 '18 at 21:56
  • Yes. The real main loop is not nearly as tight. It was a bug that certainly caused my quite a bit of confusion because it made me think I had a bug in my program... Also `a0...3` can be clobbered by a function so they're not useful for storing things. It'd be better to use `s0`, etc. for that. – forest May 12 '18 at 22:02
  • 1
    @PeterCordes Yes, `-e raw=recvfrom` always prints the same value (which is not `1999040176`). The value is, naturally, the location of `addrlen`. – forest May 12 '18 at 22:10
  • Hmm, is it possible that something else gets stored at that memory location temporarily, and that's what `strace` is seeing when it looks part way through the execution of the system call? That might be possible inside the kernel, actually, if it uses that output value as a scratch. But that's unlikely if it's only read / written via wrapper functions like `copy_to_user`, unless that becomes transparent / a no-op on MIPS or other ISAs where the kernel still has user-space memory mapped. – Peter Cordes May 12 '18 at 22:22
  • @PeterCordes The address is given to [`move_addr_to_user()`](https://elixir.bootlin.com/linux/v4.16.8/source/net/socket.c#L210), which does nothing more than get the length with `get_user()` and use it as the size argument for `copy_to_user()` in order to copy the address itself. – forest May 12 '18 at 22:49
  • `After copying the data up to the limit the user specifies, the true length of the data is written over the length limit the user specified. Zero is returned for a success.` is the only thing that looks relevant, but of course it's not going to change its behavior based on when I trace the process... – forest May 12 '18 at 23:12
  • 3
    I briefly looked at the strace source code. It prints %d->%d if the sockaddr length at entry to and exit from the system call differs. When you strace an existing process, it's likely that you're attaching to it while it's in the middle of a system call, so strace misses the entry to that system call, and maybe the variable where strace stores the length-at-entry never gets set. – Mark Plotnick May 22 '18 at 21:18
  • @MarkPlotnick That sounds reasonable. If you post that as an answer, I'll accept it. – forest May 05 '21 at 00:34
  • 1
    @forest OK. First, I'd like to check the current source code to see if this bug is still there. – Mark Plotnick May 18 '21 at 14:06

0 Answers0