
I am working on a signal handler that deals with reap signals. Randomly, the signal I get from the sigwaitinfo function carries an offset: all the signal attributes are right except for info.si_addr, and this offset in info.si_addr causes a segmentation fault.

The offset always seems to be the same. Removing it makes things work, but I need a correct solution before going forward.

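#define _GNU_SOURCE
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <linux/usbdevice_fs.h>   /* struct usbdevfs_urb */

#define SIG_REAP        SIGRTMIN
#define SIG_ISOC_CANCEL (SIGRTMIN + 1)

typedef struct UrbInfo UrbInfo;   /* application-defined type, not shown */

static void *signalHandler(void *vptr_args __attribute__((unused)))
{
    sigset_t signal_set;
    siginfo_t info;

    sigemptyset(&signal_set);
    sigaddset(&signal_set, SIG_REAP);
    sigaddset(&signal_set, SIG_ISOC_CANCEL);
    sigaddset(&signal_set, SIGTERM);
    sigaddset(&signal_set, SIGPIPE);

    while (true) {
        /* sigwaitinfo() returns the signal number, or -1 on error */
        int rc = sigwaitinfo(&signal_set, &info);
        //...
        if (rc > 0) {
            if (info.si_signo == SIG_REAP) {
                // Reap URBs after some simple checks
                if ((info.si_code != SI_ASYNCIO) &&
                    (info.si_code != SI_KERNEL)) {
                    printf("Bad si_code %d in SIG_REAP\n", info.si_code);
                    continue;
                } else {
                    printf("OK si_code %d in SIG_REAP\n", info.si_code);
                }
                struct usbdevfs_urb *ioctl_urb = (struct usbdevfs_urb *)info.si_addr;
                if (!ioctl_urb) {
                    printf("SIG_REAP gave NULL ioctl_urb\n");
                    continue;
                }
                // usercontext was set to our UrbInfo when the URB was submitted
                UrbInfo *urbInfo = ioctl_urb->usercontext;
                if (!urbInfo) {
                    printf("SIG_REAP gave NULL urbInfo\n");
                    continue;
                }
                // ... (rest of the handler not shown in the question)
            }
        }
    }
    return NULL;
}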

scx
  • Please show a code sample. It's hard to divine what's wrong with your code. – red0ct Jun 03 '19 at 10:29
  • @red0ct I have updated the code sample – scx Jun 04 '19 at 12:06
  • What are you doing with the `si_addr` member? [Per POSIX](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html), `si_addr` is only used by `SIGILL`, `SIGFPE`, `SIGSEGV`, and `SIGBUS`, and those signals would be delivered to the faulting thread, not to your `sigwaitinfo()` call. In other words, the value in `si_addr` is *meaningless* for the signals you appear to be catching. – Andrew Henle Jun 04 '19 at 14:37
  • I am using that signal address to reap URBs. I have added the rest of the code to my question. – scx Jun 04 '19 at 18:43
  • I am actually getting a general protection error on the line `UrbInfo *urbInfo = ioctl_urb->usercontext;`, when dereferencing ioctl_urb. The problem happens randomly; other signals are fine. Even in the failing case, all parameters except info.si_addr are correct: after removing the offset (0xFFFE00000000) from info.si_addr, I am able to reap the URB successfully. – scx Jun 04 '19 at 18:50
  • Where do `SIG_REAP` and `SIG_ISOC_CANCEL` come from? A Google search for "Linux SIG_REAP" returns no direct use of any signal called `SIG_REAP` and the **ONLY** result for `SIG_ISOC_CANCEL` is this question itself. Why do you think the address in `si_addr` is useful, given that the signal is not one of those listed by either POSIX or Linux documentation as having a useful `si_addr` value? – Andrew Henle Jun 04 '19 at 22:42
  • Hard to debug this without knowing what info comes with SIG_REAP, but maybe `ioctl_urb = (struct usbdevfs_urb*)info.si_addr` is an address from some process other than the receiver process. – Mark Plotnick Jun 04 '19 at 23:53
  • @AndrewHenle As I said in the working scenario I am getting reap signal with the proper address, and I am able to reap the URB as well. In the problematic scenario the signal is still the reap signal, but its address is offset by 0xFFFE00000000. – scx Jun 05 '19 at 02:45
  • @MarkPlotnick `submit_urb: Async fd:20 URB:102 i=0 ioctl_urb=0x12ab840`, then `After sigwaitinfo info.si_addr=0xfffe012ab840`, `OK si_code -4 in SIG_REAP`, `After ioctl urb =0xfffe012ab840`. The address is correct when submitting, as you can see in the first line, but the address returned from sigwaitinfo has the offset. – scx Jun 05 '19 at 02:49
  • @MarkPlotnick In the working scenario: `submit_urb: Async fd:20 URB:102 i=0 ioctl_urb=0x12ab840`, then `After sigwaitinfo info.si_addr=0x12ab840`, `OK si_code -4 in SIG_REAP`, `After ioctl urb =0x012ab840` – scx Jun 05 '19 at 02:52
  • *As I said in the working scenario I am getting reap signal* **What is "reap signal"?** It's not standard, it's not in any Linux documentation I can find, and it's not of the standard `SIGXXX` form found in POSIX and Linux systems. Without providing specifics of the non-standard `SIG_REAP`, this is not a complete example and doesn't meet the [requirements for a Stack Overflow question](https://stackoverflow.com/help/minimal-reproducible-example). Voting to close. – Andrew Henle Jun 05 '19 at 09:20
  • @AndrewHenle Sorry for the omission. In my case SIG_REAP is SIGRTMIN: `#define SIG_REAP SIGRTMIN` and `#define SIG_ISOC_CANCEL (SIGRTMIN + 1)` – scx Jun 05 '19 at 11:41
  • OK, so the sender is sometimes overwriting the high 32 bits of si_addr with `0xfffe`. siginfo is mostly a big union. `0xfffe` is 65534, which is often the "nobody" uid/gid. Check whether your signal sending code is writing into, for example, the gid member of a struct {uid_t, gid_t} member of the siginfo union. – Mark Plotnick Jun 06 '19 at 21:11
  • @MarkPlotnick Thanks for understanding my problem; the problematic signal's UID is 65534. – scx Jun 07 '19 at 04:26
  • @MarkPlotnick In the working scenario the signal attributes are: Signal code=-4, Error code=0, Process pid=36685936, User ID=0, Signal number=34, Signal band=36685936, Signal status=0 – scx Jun 07 '19 at 05:08
  • @MarkPlotnick In the non-working scenario: Signal code=-4, Error code=0, Process pid=36685936, User ID=65534, Signal number=34, Signal band=281466423462000, Signal status=0. I print these immediately after the sigwaitinfo call, and the User ID and Signal band change. – scx Jun 07 '19 at 05:10
  • Could you please edit your question to show the code that sends the signal? – Mark Plotnick Jun 07 '19 at 09:50
  • @MarkPlotnick I have uploaded the code that sends the signal – scx Jun 07 '19 at 13:00
  • @MarkPlotnick I have added the code that sends the signal to the question. – scx Jun 08 '19 at 14:01
  • @MarkPlotnick Do you need any more information from me? – scx Jun 12 '19 at 03:44
  • @srinicx I think there's enough info. I will have to look at kernel USB code that I am not familiar with. Might take a couple days. – Mark Plotnick Jun 12 '19 at 14:49
  • @MarkPlotnick Not a problem. Today I checked the 32-bit version of the binary and found no offset there; I only see it in the 64-bit build. Thanks for looking into it. – scx Jun 12 '19 at 17:14
  • @srinicx Can you tell me which version of the kernel you're running that has the offset problem? – Mark Plotnick Jun 12 '19 at 17:34
  • @MarkPlotnick The version is 4.15.0-51-generic. I think the issue shows up on every 64-bit Ubuntu machine I have checked. – scx Jun 12 '19 at 18:44
  • @MarkPlotnick Anything else you need from me? – scx Jun 18 '19 at 03:50
  • @MarkPlotnick Three fields of the signal are affected by the 0xFFFE offset: User ID, Signal band and si call addr. Working: Signal code=-4, Error code=0, Process pid=31455232, User ID=0, Signal number=34, Signal band=31455232, Signal status=0, si int=0, si ptr=(nil), si fd=0, si lower=(nil), si upper=(nil), si call addr=0x1dff800, si syscall=0. Non-working: Signal code=-4, Error code=0, Process pid=31455232, User ID=65534, Signal number=34, Signal band=281466418231296, Signal status=0, si int=0, si ptr=(nil), si fd=0, si lower=(nil), si upper=(nil), si call addr=0xfffe01dff800, si syscall=0 – scx Jun 18 '19 at 07:07
  • @MarkPlotnick Thanks – scx Jun 20 '19 at 16:41
  • @MarkPlotnick If I set info.si_uid to 0 immediately after getting the signal, the 0xFFFE offset disappears from all the parameters. – scx Jun 24 '19 at 06:39
  • Are you running this in a new user namespace, such as a container? If so, the culprit may be [userns_fixup_signal_uid](https://elixir.bootlin.com/linux/v4.15/source/kernel/signal.c#L974), called by (a few levels down the call stack) [async_completed](https://elixir.bootlin.com/linux/v4.15/source/drivers/usb/core/devio.c). The fixup function sets [si_uid](https://elixir.bootlin.com/linux/v4.15/source/include/uapi/asm-generic/siginfo.h#L129), overwriting part of the [si_addr](https://elixir.bootlin.com/linux/v4.15/source/include/uapi/asm-generic/siginfo.h#L139) field in the siginfo struct. – Mark Plotnick Jun 26 '19 at 19:14
  • @MarkPlotnick When I have only a single USB device I don't get this offset, but I do get it when there is other USB traffic, for example when listening to audio in the browser with headphones, and sometimes even without headphones. – scx Jun 26 '19 at 19:18
  • @MarkPlotnick How can I check whether it is running in a new user namespace? The subsequent signals have the proper user ID of 0. – scx Jun 27 '19 at 01:42
  • https://stackoverflow.com/questions/20010199/how-to-determine-if-a-process-runs-inside-lxc-docker may cover the most common cases – Mark Plotnick Jun 27 '19 at 01:49
  • When you see `User ID=65534`, how are you running your program? Are you doing `setuid(65534)` in your code or are you running a wrapper program that sets the uid to 65534 before running your program? – Mark Plotnick Jun 27 '19 at 01:55
  • @MarkPlotnick The issue does not happen in 32-bit, so is the problem in the kernel or in my code? – scx Jun 27 '19 at 02:43
  • @MarkPlotnick There is a wrapper program that runs with the uid nobody, but even if I set the user ID to something else, I still see the same offset of 65534. – scx Jun 27 '19 at 02:47
  • Is the wrapper program source code short enough to edit into your question, or can you give a link to it? – Mark Plotnick Jun 27 '19 at 09:52
  • It works in 32-bit because the si_uid element in the big union in the siginfo struct is at bytes 4-7. On a 32-bit process, this won't clobber any part of si_addr. On a 64-bit process, it will clobber the high 4 bytes. – Mark Plotnick Jun 27 '19 at 10:03
  • @MarkPlotnick Thanks for your analysis. Today I checked whether we explicitly set the User ID, but found nothing. – scx Jun 27 '19 at 14:52
  • @MarkPlotnick Can I change the user ID from 65534 to 0? Will this work on all Linux distributions, such as CentOS and Fedora? It works on Ubuntu. – scx Jun 27 '19 at 15:44
  • Changing the uid to 0 will help a little - the kernel is still clobbering the high 32 bits of a 64-bit address - but it won't prevent the underlying problem. A better workaround is to check that the address you pass to the kernel is all zero in the upper 32 bits, to refuse to continue if it is not, and then to zero out the upper 32 bits in the address sent by the signal. That's still just a workaround. – Mark Plotnick Jun 27 '19 at 15:52
  • Have you tried specifying a signal such as SIGTRAP, as in Andrew's answer, instead of a RT signal? – Mark Plotnick Jun 27 '19 at 15:53
  • @MarkPlotnick I think the address is fine when it is submitted to the kernel. I haven't tried SIGTRAP yet. – scx Jun 27 '19 at 16:07
  • @MarkPlotnick Would the offset be the same in other distributions? – scx Jun 27 '19 at 16:39
  • It might be present or not present based on kernel version. I haven't looked at anything except the version you run (4.15). – Mark Plotnick Jun 27 '19 at 17:05
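Mark Plotnick's explanation of the 0xFFFE offset can be checked from user space. The sketch below assumes glibc's `siginfo_t` on 64-bit little-endian Linux (for example x86-64), where `si_addr` occupies bytes 0-7 of the big union and the `si_pid`/`si_uid` pair occupies bytes 0-3 and 4-7, so a write to `si_uid` lands in the upper half of `si_addr`. The function names are hypothetical. The masking helpers are only the stopgap suggested in the comments: they are safe only if every URB you submit has a zero upper half, which heap memory on a 64-bit system does not guarantee.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: store addr in si_addr, then write si_uid the way a
 * uid fixup on the sending side would, and return what si_addr reads back as. */
uintptr_t si_addr_after_uid_write(uintptr_t addr, uid_t uid)
{
    siginfo_t info;
    memset(&info, 0, sizeof info);
    info.si_addr = (void *)addr;  /* _sigfault.si_addr: union bytes 0-7 on LP64 */
    info.si_uid = uid;            /* _kill.si_uid: union bytes 4-7 */
    return (uintptr_t)info.si_addr;
}

/* Submission-side check (hypothetical): refuse URB pointers whose upper
 * 32 bits are set, because those bits cannot be trusted on the way back. */
int urb_addr_is_recoverable(const void *urb)
{
    return ((uint64_t)(uintptr_t)urb >> 32) == 0;
}

/* Receive-side workaround (hypothetical): drop whatever landed in the
 * upper 32 bits of si_addr. On 32-bit builds this changes nothing. */
void *recover_urb_addr(const siginfo_t *info)
{
    uint64_t raw = (uint64_t)(uintptr_t)info->si_addr;
    /* Layout sanity check: any upper half seen here should be the uid. */
    assert((raw >> 32) == 0 || (raw >> 32) == (uint64_t)info->si_uid);
    return (void *)(uintptr_t)(raw & UINT64_C(0xFFFFFFFF));
}
```

With `addr = 0x12ab840` and `uid = 65534`, `si_addr_after_uid_write` should return `0xfffe012ab840` on x86-64, matching the logged value, and `0x12ab840` unchanged on a 32-bit build. The same overlap would account for `Signal band` (a `long` at the start of the union) showing the full clobbered address while `Process pid` (the low 4 bytes) stays the same.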

1 Answer


You are misusing si_addr. It is available for only a limited number of signals, and those do not include any real-time signals.

Per POSIX, si_addr is not applicable for signals other than SIGILL, SIGFPE, SIGSEGV, and SIGBUS. Linux also provides si_addr data for SIGTRAP:

SIGILL, SIGFPE, SIGSEGV, SIGBUS, and SIGTRAP fill in si_addr with the address of the fault.

No other signals provide a value for si_addr.

The source code in linux/kernel/signal.c that fills in si_addr shows that si_addr is not used for any signals other than those listed.

Note that per the Linux signal(7) man page:

Real-time signals are distinguished by the following:

  1. Multiple instances of real-time signals can be queued. By contrast, if multiple instances of a standard signal are delivered while that signal is currently blocked, then only one instance is queued.

  2. If the signal is sent using sigqueue(3), an accompanying value (either an integer or a pointer) can be sent with the signal. If the receiving process establishes a handler for this signal using the SA_SIGINFO flag to sigaction(2), then it can obtain this data via the si_value field of the siginfo_t structure passed as the second argument to the handler. Furthermore, the si_pid and si_uid fields of this structure can be used to obtain the PID and real user ID of the process sending the signal.

...
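The si_value mechanism quoted above can be sketched as follows, assuming a single-threaded process on Linux with glibc; the function name is hypothetical. It only applies when user space sends the signal itself with sigqueue(). For URBs reaped through usbdevfs it is the kernel that fills in siginfo_t, so this illustrates the mechanism the man page describes rather than a drop-in replacement for the question's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: send ptr to this process with sigqueue() on SIGRTMIN
 * and collect it with sigwaitinfo(). Returns the pointer carried in
 * si_value, or NULL on error. */
void *roundtrip_via_sigqueue(void *ptr)
{
    sigset_t set, old;
    siginfo_t info;
    union sigval val;
    void *result = NULL;

    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    /* Block the signal so it stays pending until sigwaitinfo() takes it. */
    if (sigprocmask(SIG_BLOCK, &set, &old) != 0)
        return NULL;

    memset(&val, 0, sizeof val);
    val.sival_ptr = ptr;
    if (sigqueue(getpid(), SIGRTMIN, val) == 0 &&
        sigwaitinfo(&set, &info) == SIGRTMIN) {
        assert(info.si_code == SI_QUEUE);   /* sent from user space via sigqueue() */
        result = info.si_value.sival_ptr;   /* the pointer travels in si_value, not si_addr */
    }

    sigprocmask(SIG_SETMASK, &old, NULL);   /* restore the caller's mask */
    return result;
}
```

In a multi-threaded program such as the question's, the signal should be blocked in every thread (for example with pthread_sigmask() before creating other threads) so that only the thread calling sigwaitinfo() receives it.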

Andrew Henle