
The issue: problems migrating from RH5 (gcc 4.1.2 and glibc 2.5) to RH6 (gcc 4.4.7 and glibc 2.12)

Details: I am migrating a big project from RH5 (gcc 4.1.2) to RH6 (gcc 4.4.7) and everything compiles and links without any hiccups.

The code compiled on RH5 works well on both host machines (RH5 and RH6).

However, the code compiled on RH6 DOES NOT work on either.

There is no run-time error; one of the many regression tests that I run simply fails.

Here is the complicating factor: my code creates a simulated machine (a microprocessor). The regression tests are run by software/firmware that executes on this simulated machine, and I am not debugging that firmware.

The bug is introduced when I compile my code on the RH6 host machine. I was able to pinpoint the issue to two object files (.o) that are linked into a shared object file (.so) which is part of my simulated machine.

If I compile the code on the RH6 machine (gcc 4.4.7, where the executable fails my regression tests), then substitute the two offending .o files compiled on RH5 (gcc 4.1.2) and relink the .so file using them, everything works and my regression tests pass, on both the RH5 and RH6 machines.

I am using "nm" and "objdump" to try to figure out which functions or libraries are responsible.

My questions: 1- How can I use these tools (nm and/or objdump) to nail down the culprit? (If I run them on the .so files generated on RH5 and RH6 and compare the outputs, I see big differences, but I cannot tell which differences are expected from each compiler and its environment and which could indicate a problem.)

2- Is there any other tool out there that could really help me with investigating and solving this?

3- If I compare the nm/objdump outputs of the RH6 .so file and of the RH6 .so file relinked with the two RH5 .o files, they look alike and I cannot find a loose end to start untangling from. How should I read these outputs to track down the issue?
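One way to cut the noise in such a comparison is to diff only the undefined (imported) symbols of each offending .o file, since those show which library functions each compiler actually emitted references to. Below is a minimal Python sketch, assuming the output of `nm -u` has been captured as text for each build. The function names `undefined_symbols` and `diff_imports` are hypothetical, chosen only for illustration.

```python
def undefined_symbols(nm_output):
    """Collect symbol names from `nm -u` style output (lines like '    U strtoul')."""
    symbols = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Undefined entries carry no address: a type letter ('U', or 'w'/'v'
        # for weak undefined) followed by the symbol name.
        if len(parts) >= 2 and parts[-2] in ("U", "w", "v"):
            # Drop any '@VERSION' suffix so versioned and plain names compare equal.
            symbols.add(parts[-1].split("@")[0])
    return symbols


def diff_imports(old_nm_output, new_nm_output):
    """Return (names only in the old build, names only in the new build)."""
    old = undefined_symbols(old_nm_output)
    new = undefined_symbols(new_nm_output)
    return sorted(old - new), sorted(new - old)
```

To use it, run `nm -u` on the same object file from each build, read both outputs into strings, and pass them to `diff_imports`. Symbols that appear on only one side are the first candidates to inspect.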

I appreciate any comments, suggestions and contributions.

  • Not sure why you can't debug this normally. Work backwards from the detected problem. – Jester May 14 '15 at 21:43
  • Why don't you compile with -O0 -g3 to retain debugging symbols and disable optimizations, and then use gdb or some other debugger to debug the code at runtime? – lxe May 14 '15 at 21:51
  • Thanks for the feedback, Jester and lxe. Stepping through the code in a debugger is an option I would prefer to keep as a last resort, as this is such a monster code base. – Wlamir Mello May 14 '15 at 23:55
  • Additionally, by disassembling the .so file, I found that the .so file with the two modules (.o) compiled on RH5 has a call to a differently named function, "__strtoul_internal@plt", whereas the .so file that fails has a call to "strtoull@plt". – Wlamir Mello May 15 '15 at 00:07
  • The functions look like this:
      0000000000007928 <__strtoul_internal@plt>:
          7928: ff 25 ca 46 30 00    jmpq *0x3046ca(%rip)   # 30bff8 <_GLOBAL_OFFSET_TABLE_+0x2a0>
          792e: 68 51 00 00 00       pushq $0x51
          7933: e9 d0 fa ff ff       jmpq 7408 <_init+0x18>
      0000000000008078 :
          8078: ff 25 72 43 30 00    jmpq *0x304372(%rip)   # 30c3f0 <_GLOBAL_OFFSET_TABLE_+0x698>
          807e: 68 d0 00 00 00       pushq $0xd0
          8083: e9 e0 f2 ff ff       jmpq 7368 <_init+0x18>
    – Wlamir Mello May 15 '15 at 00:08
  • There is only a single call to "__strtoul_internal@plt" in the working code, at the place where the failing code calls "strtoul@plt". – Wlamir Mello May 15 '15 at 00:09
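
The manual comparison described in these comments (looking for calls that go to differently named PLT entries) could be automated roughly as follows. This is a sketch only, assuming `objdump -d` output for each .so has been saved as text and that call lines have the usual `call`/`callq ADDRESS <name@plt>` shape. The names `plt_call_counts` and `changed_plt_calls` are hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "callq  7928 <__strtoul_internal@plt>" and captures the symbol name.
PLT_CALL = re.compile(r"\bcallq?\s+[0-9a-f]+\s+<([^>]+)@plt>")


def plt_call_counts(disassembly):
    """Count how many call instructions target each PLT entry."""
    return Counter(PLT_CALL.findall(disassembly))


def changed_plt_calls(old_disassembly, new_disassembly):
    """Map each PLT symbol whose call count differs to (old count, new count)."""
    old = plt_call_counts(old_disassembly)
    new = plt_call_counts(new_disassembly)
    return {
        name: (old[name], new[name])
        for name in sorted(set(old) | set(new))
        if old[name] != new[name]
    }
```

Comparing call counts per symbol rather than raw text sidesteps the address and register-allocation differences that make a plain diff of two disassemblies unreadable. It only highlights which library entry points changed, not why the behavior differs.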
