We are encountering a randomly occurring segmentation fault in a C/C++ HP-UX PA-RISC application, a RELEASE build of a demo compiled and linked with the HP-UX PA-RISC compiler aCC. The demo loads an HP-UX PA-RISC RELEASE shared library (.sl, i.e. the HP-UX equivalent of a .so), also compiled and linked with aCC. We do not have access to pmap or HP-UX wdb, so we are using HP's proprietary debugger adb. Here is how we use adb:
$ adb
PA-32 adb ($h help $q quit)
adb>!cp mdMUReadWriteExample a.out
!
adb>:r
a.out: running (process 10947)
segmentation violation
stopped at 1E3C: STW r3,1416(r1)
At this point it appears that the fault is related to the assembly instruction shown above. Our first question is whether the displacement 1416 is in decimal or hexadecimal format.
Our second question is whether the program counter 1E3C is accurate and can be used to gain further information about the offending line of C/C++ source code.
Our third question: supposing 1416 is in decimal format, then, as shown below, register 1 ($r1) contains 0x40015b90. Adding 1416 (decimal, i.e. 0x588) to 0x40015b90 gives 0x40016118. Next, we search the nm output for the shared library to find the address / C++ mangled symbol associated with 0x40016118.
$ grep -n "4001611" /home/marc/acc3_pa_32bit/cameron_nm.txt
27808:40016118 ? static___soa_RSA_cpp_
27823:40016110 ? static___soa_cDateTime_cpp_
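The address arithmetic and the symbol lookup above can be cross-checked with a small script. This is only a sketch under stated assumptions: it assumes the displacement is decimal, a POSIX shell that accepts 0x constants in arithmetic (for example bash on the CentOS machine), and an nm listing whose lines have the form "address type name" with 8-digit lowercase hex addresses, as in the grep output above. The function names effective_addr and nearest_symbol are hypothetical helpers, not part of adb or nm.

```shell
#!/bin/sh
# effective_addr BASE_HEX DISP_DEC
# Prints BASE + DISP as 8 lowercase hex digits.
# Assumption: the displacement in "STW r3,1416(r1)" is decimal.
effective_addr() {
    printf '%08x\n' $(( $1 + $2 ))
}

# nearest_symbol ADDR_HEX NM_FILE
# Prints the listing line with the highest address that is <= ADDR_HEX.
# Assumption: each line is "address type name" and addresses are 8-digit
# lowercase hex, so plain string comparison orders them correctly.
nearest_symbol() {
    awk -v t="$1" '
        BEGIN { best = "" }
        length($1) == 8 && ($1 "") <= (t "") && ($1 "") > best { best = $1 ""; line = $0 }
        END { if (line != "") print line }
    ' "$2"
}

addr=$(effective_addr 0x40015b90 1416)
echo "$addr"    # prints 40016118
```

With our listing, the lookup would be run as: nearest_symbol "$addr" /home/marc/acc3_pa_32bit/cameron_nm.txt. Unlike the grep on a truncated address, this also reports the enclosing symbol when the address falls in the middle of an object rather than exactly at its start.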
Next we modify our makefile to obtain the combined disassembly and C++ source listing. However, when we search all 50 generated *.s files, we mysteriously cannot find static___soa_RSA_cpp_. Have we skipped a crucial step here?
adb>$r
pcoqh 0 1E3F
pcoqt 0 1E43
rp 0 0xC0209793
arg0 0 1 arg1 0 7F7F04FC arg2 0 7F7F0504 arg3 0 7F7F0540
sp 0 7F7F05D0 ret0 0 0 ret1 0 1 dp 0 40016390
r1 0 40015B90 r3 0 7F7F0000 r4 0 40015918 r5 0 3C
r6 0 20 r7 0 3E r8 0 7F7F0910 r9 0 40015918
r10 0 40031918 r11 0 1E800 r12 0 40016118 r13 0 400266A4
r14 0 3F r15 0 3F r16 0 3D r17 0 3D
r18 0 3A r19 0 7B03B764 r20 0 0xA98D400 r21 0 7F7F0550
r22 0 0 r31 0 1E2B sar 0 23 sr0 0 0xA98D400
sr1 0 3848400 sr2 0 0 sr3 0 0 sr4 0 0xA98D400
In summary, we are trying to determine whether it is possible to find the offending C/C++ source lines that cause this random segmentation fault. Running the code on CentOS Linux under valgrind --tool=memcheck, we cannot find any buffer overruns. Thank you.