
I ran into a problem with a program whose pstack output is shown below.

6600:   ora_d006_LOOKUP
 ffffffff7addbbd0 __systemcall6 (3, ffffffff7d300440, 0, ffffffff7adc1268, d, fff7) + 24
 ffffffff7adcba74 pthread_sigmask (2000, 0, 0, 0, ffffffff7d300200, d) + 1c4
 00000001068ff3bc sslssalck (ffffffff7fffb138, 2, ffffffff7fffb070, 0, 3e8, 10c24d7e0) + 7c
 00000001069358e8 sltmarm (a00029810, 29810, 10c3f3ab0, 3f9, a00000000, 29810) + 88
 00000001069aa734 ltmdvp (8006689e, 3f9, 0, 10c55ba38, 10c3f8160, 10c3f34d0) + 154
 00000001068ff2a4 sslsstehdlr (e, 0, ffffffff7fffb570, 7fffff84, 10c3ed0d8, 10c24d7e0) + 224
 ffffffff7add7498 __sighndlr (e, 0, ffffffff7fffb570, 1068fcba0, 0, d) + c
 ffffffff7adcb02c call_user_handler (ffffffff7d300200, ffffffff7d300200, ffffffff7fffb570, c, 0, 0) + 3e0
 ffffffff7adcb238 sigacthandler (0, 0, ffffffff7fffb570, ffffffff7d300200, 0, ffffffff7af3e000) + 68
 --- called from signal handler with signal 0 (SIGEXIT) ---
 ffffffff7addad48 ioctl (10c3f80c0, bb8, 400, 10c426810, 10c6aae90, 2001420c) + c
 0000000109e47668 nteveque (10c40c940, bb8, ffffffff7fffca98, 1afbfb85a4, 1c, 98) + 28
 0000000109e3f0c0 ntevque (7, bb8, 10c2cbfd0, 10c40c940, ffffffff7fffca98, 10c2cbfd0) + 80
 0000000109d8e738 nsevwait (0, 0, 10c25cc00, 0, 10c25cc04, 10c3f7a60) + 1b8
 000000010092e7b4 ksnwait (10c25cc00, 6, 10c403fb0, 10c25c000, 10c25c, 10c000) + 54
 000000010072060c ksliwat (0, ffffffff7fffd8e8, 1770, 10c25b, 10c000, 0) + 140c
 0000000100704b28 kslwait (1770, ffffffff7fffd8e8, ffffffff7fffd8e8, ffffffff7fffd8e8, 0, 0) + e8
 00000001065707a0 kmdmai (1b1bfffe00, 10c2628e8, 1b02faf258, 10c26c190, 10c25b, 38000d000) + e40
 00000001063b0400 opirip (10a726000, 0, 380002, 380000, 38002a000, 38002a) + a80
 00000001035c59cc opidrv (32, 4, ffffffff7ffff590, 1ebb90, ffffffff7af45050, ffffffff7ffff9a0) + 30c
 000000010474117c sou2o (ffffffff7ffff568, 32, 4, ffffffff7ffff590, 10c000, 10b800) + 5c
 0000000100604f64 opimai_real (3, ffffffff7ffff838, ffffffff7ffffb60, ffffffff7ffffbb5, 0, 0) + 204
 0000000104757380 ssthrdmain (10c000, 3, 44dc00, 100604d60, 10c27c000, 10c27c) + 140
 0000000100604c74 main (3, ffffffff7ffff948, 0, ffffffff7ffff840, ffffffff7ffff950, ffffffff7d300200) + 134
 0000000100604b1c _start (0, 0, 0, 0, 0, 0) + 17c

This process dispatches requests from clients. While the problem was occurring, no new requests could get in, and the process consumed a lot of SYS CPU time.

`man ioctl` gives me the prototype of the ioctl system call, but I don't think that is the same ioctl. The ioctl in the pstack output should be a userland function.

In the pstack:

--- called from signal handler with signal 0 (SIGEXIT) ---
ffffffff7addad48 ioctl (10c3f80c0, bb8, 400, 10c426810, 10c6aae90, 2001420c) + c

I wrote a small dtrace script.

pid$target::ioctl:entry
{
        /* probemod is the module (library) that contains the probed function */
        printf("%s\n", probemod);
}

I get

3  82218                      ioctl:entry libc.so.1

So I think this ioctl comes from libc.so.

But I can't find a manual page for the ioctl in libc.so.

1. Where can I get the manual for ioctl in the Solaris libc?

2. It is said that SIGEXIT is a pseudo-signal. How do I set up a signal handler for it? How do I send SIGEXIT to a process? And would that finally give us the following stack?

  ...  my_handle_signal .... 
  --- called from signal handler with signal 0 (SIGEXIT) ---
  ... xxxx
osgx
zhihuifan
  • Which device is the program using? ioctls are device-specific (driver-specific). The system-wide manual for ioctl is http://docs.oracle.com/cd/E23824_01/html/821-1463/ioctl-2.html – osgx Apr 21 '14 at 09:14
  • Hi osgx. I ran truss on the process. Most of the ioctl calls operate on /devices/pseudo/poll@0:poll. `truss -t ioctl -p xxx` gives the fd, and then `pfiles {pid}` gives the file name (/devices/pseudo/poll@0:poll). – zhihuifan Apr 21 '14 at 09:31
  • If you installed the manual pages, then "man -s 2 ioctl" will display the man page on Solaris. Otherwise, the Solaris man pages are available on http://docs.oracle.com/, for instance [Solaris 11.1 man page for ioctl](http://docs.oracle.com/cd/E26502_01/html/E29032/ioctl-2.html). – alanc Apr 21 '14 at 18:28

1 Answer

Your ioctl on the /devices/pseudo/poll@0:poll device (or /dev/poll) appears to be handled by a kernel function in the common/io/devpoll.c file (online copy: http://fxr.watson.org/fxr/source/common/io/devpoll.c?v=OPENSOLARIS)

More exactly, by the dpioctl function:

dpioctl(dev_t dev, int cmd, intptr_t arg, int mode, cred_t *credp, int *rvalp)

zhihuifan, after checking your stack trace I see that your program executed:

main() -> ... nteveque() -> ioctl()

Then the signal handler was called. I see no signals being sent from dpioctl, so I think the signal was sent by some external function (or program, or user):

--- called from signal handler with signal 0 (SIGEXIT) ---

Then the user-space signal handler was called:

sigacthandler ->     call_user_handler ->     __sighndlr 
-> sslsstehdlr 

sslsstehdlr did many things, and according to my knowledge and the POSIX standard ("2.4 Signal Concepts" from The Open Group Base Specifications Issue 6; IEEE Std 1003.1, 2004 Edition), a signal handler may only call (directly or indirectly) the functions listed in this table:

The following table defines a set of functions that shall be either reentrant or non-interruptible by signals and shall be async-signal-safe. Therefore applications may invoke them, without restriction, from signal-catching functions:

... huge list, but there is no pthread_sigmask in it ...

All functions not in the above table are considered to be unsafe with respect to signals. .... when a signal interrupts an unsafe function and the signal-catching function calls an unsafe function, the behavior is undefined.

osgx
  • Thanks osgx. It seems the code on that website is for the kernel, not for libc. I'm not clear about the relationship. Actually, I think my program's problem has nothing to do with ioctl (I was wrong at the beginning). – zhihuifan Apr 22 '14 at 00:41
  • zhihuifan, yes, because the ioctl code in libc is very simple: it takes all the arguments and passes them to the kernel. So ioctls are passed to the kernel (Solaris, Linux, and other Unixes too), and the kernel finds the exact ioctl handler for the request according to the file name. For your "/devices/pseudo/poll" (or /dev/poll) the handler is `dpioctl`. – osgx Apr 22 '14 at 01:09
  • This answer is clear and easy to understand. Thanks. – zhihuifan Apr 22 '14 at 01:18
  • Hi osgx, the ioctl question is clear now, as you said. My current problem is how a SIGEXIT signal was raised when Solaris has no such signal at all. I'm sure it was not sent by a person. This is the stack of a dispatcher process in Oracle Database 11g R2. The issue occurs once OS uptime exceeds 248 days. Oracle support provided a patch, but it doesn't work (after 248 days it happened again). We have many databases and don't want to reboot the OS again and again, which would have a large business impact. So I want to debug the problem, gather more information, and report it to Oracle support. – zhihuifan Apr 22 '14 at 01:40
  • I think you should work directly with Oracle support. To get more details, you can try to find somebody skilled in both Solaris and Oracle (not me). Signal number 0 can be passed by hand (`kill -0 $pid`), though per POSIX that only checks that the process exists and delivers no signal. – osgx Apr 22 '14 at 01:43