I have a very large, very old, very byzantine, very undocumented set of Fortran code that I am trying to troubleshoot. It is giving me divide-by-zero problems at run time due to a section that's roughly like this:

      subroutine badsub(ainput)
      implicit double precision (a-h,o-z)
      include 'commonincludes2.h'
      x=dsqrt((r(6)-r(8))**2+(z(6)-z(8))**2)
      y=ainput
      w=y+x
      v=2./dlog(dsqrt(w/y))

This code hits a divide-by-zero on the last line: x is zero, so w equals y, and thus dlog(dsqrt(1)) is zero.
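To make the failure mode concrete, here is a minimal standalone sketch of the same arithmetic. The array contents and the value of y are made up for illustration (only the fact that points 6 and 8 coincide matches what I see in gdb), and the `denom` guard is just a debugging aid, not something in the real code:

```fortran
      program divzero
      implicit double precision (a-h,o-z)
      dimension r(12), z(12)
c     Hypothetical data: any values will do as long as
c     points 6 and 8 coincide.
      do i = 1, 12
         r(i) = 1.0d0*i
         z(i) = 2.0d0*i
      end do
      r(8) = r(6)
      z(8) = z(6)
c     Same arithmetic as in badsub.
      x = dsqrt((r(6)-r(8))**2 + (z(6)-z(8))**2)
      y = 1875.0d0
      w = y + x
      denom = dlog(dsqrt(w/y))
c     Guard added only to make the problem visible.
      if (denom .eq. 0.0d0) then
         print *, 'points 6 and 8 coincide, x =', x
      else
         v = 2.d0/denom
         print *, 'v =', v
      end if
      end
```

With coincident points, x is exactly zero, w/y is exactly one, and the guard branch is taken instead of the division.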

The include file looks something like this:

common /cblk/ r(12),z(12),otherstuff

There are actually three include headers that declare /cblk/, which I found by running grep -in "/cblk/" *.h *.f *.F: "commonincludes.h", "commonincludes2.h", and "commonincludes3.h". As an added bonus, the memory corresponding to r and z is named x and y in "commonincludes.h", i.e. "commonincludes.h" looks like:

common /cblk/ x(12),y(12),otherstuff
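As a small illustration of why this renaming makes grep so unhelpful, the sketch below declares the same block under both sets of names. The program and the `setter` routine are hypothetical and stripped down to the two arrays; the real block also contains "otherstuff":

```fortran
      program shared
      implicit double precision (a-h,o-z)
c     Declaration as in commonincludes2.h (names r and z).
      common /cblk/ r(12), z(12)
      call setter
c     r(8) and z(8) now hold the values stored through x and y.
      print *, 'r(8) =', r(8), ' z(8) =', z(8)
      end

      subroutine setter
      implicit double precision (a-h,o-z)
c     Same block, different names, as in commonincludes.h.
      common /cblk/ x(12), y(12)
      x(8) = 3.5d0
      y(8) = 4.5d0
      end
```

The write goes through x and y, so searching the source for assignments to r or z would never find it, even though it is the same storage.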

My problem is, I have NO IDEA where r and z are set. I've used grep to find every place where each of the headers is included, and I don't see any place where the variables are written to.

If I inspect the actual values in r and z in gdb where the error occurs, the values look reasonable: they're non-zero, not-garbage-looking vectors of real numbers. It's just that r(6) equals r(8) and z(6) equals z(8), which is what causes the issue.

I need to find where z and r get written, but I can't find any instructions in the gdb documentation for attaching a watchpoint to a COMMON block. How can I find where these are written to?

Frank
  • Rather than putting a watch on the common block, can you put a watch on the entities in the common block/the local variables that you care about? – francescalus May 05 '22 at 17:13
  • If I try, it doesn't work, I think because of the way common blocks work. As I understand it, they're literally a block of memory: I can declare `common /cblk/ x(12),y(12)` in one subroutine and `common /cblk/ z(24)` in another place, and Fortran won't flinch as long as the block definitions are the same size. Like in my example, the same variable is named `r` or `x` depending on which header is used. How do I tell gdb to watch that? – Frank May 05 '22 at 20:00
  • Inside any one scope you can have no more than one version of that common block definition in play, so watch the corresponding local variable in that scope. That is, if you watch `r` in `badsub` you'll watch for changes to `r` through storage association, whatever the name of the other entity associated with that `r`. – francescalus May 05 '22 at 20:11
  • @francescalus a key part of my problem is I don't know what scope actually holds the code I need to monitor. Short of setting about 30 breakpoints for every subroutine that might be important and setting watch whenever they're entered (which isn't really practical with this spaghetti code) I don't know how to do it. Can you specify a scope when you set a watch? – Frank May 05 '22 at 22:58

1 Answer

I think I have figured out how to do what I'm trying to do. Because COMMON variables are allocated statically, their addresses shouldn't change from run to run. Therefore, when my program stops on the divide-by-zero error, I can look up the memory address of (in this example) r(8), which is global in scope. I can then re-run the code with a watchpoint on that address, and gdb will stop whenever the value changes, anywhere in the code.

In my example, the gdb session looks like this, with process names and directories filed off to protect the guilty:

Reading symbols from myprogram...
(gdb) r
Starting program: ************

Program received signal SIGFPE, Arithmetic exception.
0x00000000004df96d in badsub (ainput=1875.0000521766287) at badsub.f:109
109               v=2./dlog(dsqrt(w/y))
(gdb) p &r(8)
$1 = (PTR TO -> ( real(kind=8) )) 0xcbf7618 <cblk_+56>
(gdb) watch *(double precision *) 0x0cbf7618
Hardware watchpoint 1: *(double precision *) 0x0cbf7618
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: *************

Hardware watchpoint 1: *(double precision *) 0x0cbf7618

Old value = 0
New value = 6.123233995736766e-17
0x00007ffff6f2be2d in __memmove_avx_unaligned_erms () from /lib64/libc.so.6

I have confirmed from running a backtrace that this is indeed a place (presumably the first place) where my common block variable is being set.

Frank