I have a simple piece of code that extracts a float from a FORTRAN-generated REAL array, and then inserts it into a stream for logging. Although this works for the first 30 cases, on the 31st it crashes with a "Floating-point invalid operation".

The code is:

int FunctionDeclaration(float* mrSwap)
{
...
float swap_float;
stringstream message_stream;
...
swap_float = *(mrSwap+30-1);
...
message_stream.clear();
message_stream <<  30 << "\t" << swap_float << "\tblah blah blah \t";

When debugging, the value of swap_float the instant before the crash (on the last line above) is 1711696.3. Other than being much larger than most of the values up to this point, there is nothing particularly special about it.

I have also tried replacing message_stream with cerr, and got the same problem. I had hitherto believed cerr to be pretty much indestructible - how can a simple float destroy it?

Edit:

Thanks for the comments: I've added the declaration of mrSwap. mrSwap is approximately 200 elements long, so I'm a long way off the end. It is populated outside of my control, and individual entries may not be populated - but to the best of my understanding, this would just mean that swap_float would be set to a random float?

Rainald62
Mike Sadler
  • Can you make this a small compilable example? What is `mrSwap` and how is it populated? – hmjd May 09 '12 at 11:23
  • Sounds like you've overrun the end. – Flexo May 09 '12 at 11:24
  • If that's a pointer operation, I'd wonder if you just ran past the end of the array. There's no issue with cerr, I'm sure. – duffymo May 09 '12 at 11:24
  • You refer to `mrSwap` but don't show what it is. – Sebastian Mach May 09 '12 at 11:29
  • Have you tried using `printf()` instead of `cout`? – zvrba May 09 '12 at 11:38
  • I think that's my next step - after lunch, at least. At least printf/sprintf declare the types they are expecting, so might give more meaningful error messages... – Mike Sadler May 09 '12 at 11:44
  • @MikeSadler Trying to read a float value which hasn't been initialized is undefined behavior. Your "random float" may in fact be a trapping representation. (According to the standard, this is true for any non-character type. In practice, however, I only know of one machine where it might be a problem with `int`, whereas it is a problem with `float` on some of the more common platforms: Intel, Sparc...) – James Kanze May 09 '12 at 12:29
  • According to my colleague, this particular value *shouldn't* be undefined, but as this particular piece of code is to double-check the contents of the array, I would like it to cope with (i.e. preferably recognise) undefined cells. – Mike Sadler May 09 '12 at 13:44

2 Answers

individual entries may not be populated - but to the best of my understanding, this would just mean that swap_float would be set to a random float?

Emphatically not. Certain bit patterns in an IEEE floating-point number indicate an invalid number -- for instance, the result of an overflowing arithmetic operation, or an invalid one (such as 0.0/0.0). The puzzling thing here is that the debugger apparently accepts the number as valid, while cout doesn't.

Try getting the bit layout of swap_float. On a system where int and float are both 32 bits wide:

int i = *(int*)&swap_float;

Then print i in hexadecimal, and let us know what you see.
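
The pointer cast above does the job in practice, but it formally breaks C++ aliasing rules. A minimal sketch of the same inspection using std::memcpy follows; it assumes a 32-bit IEEE float, and the helper names float_bits and float_bits_hex are made up for illustration. The helpers take a pointer so that the value is read as plain bytes and is not loaded as a float along the way.

```cpp
#include <cstdint>
#include <cstring>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: copy the raw bytes of the float at `source`
// into a 32-bit unsigned integer without treating them as a float.
std::uint32_t float_bits(const float* source)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits = 0;
    std::memcpy(&bits, source, sizeof bits);
    return bits;
}

// Hypothetical helper: format the bit pattern as eight hex digits.
std::string float_bits_hex(const float* source)
{
    std::ostringstream out;
    out << std::hex << std::setw(8) << std::setfill('0')
        << float_bits(source);
    return out.str();
}
```

With the question's variables, float_bits_hex(mrSwap+30-1) or float_bits_hex(&swap_float) would give the string to log.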

Updated to add: From Mike's comment, i=1238430338, which is 49D0F282 in hex. This is a valid floating-point number, equal to exactly 1711696.25. So I don't know what's going on, I'm afraid. The only thing I can suggest is that maybe the compiler is loading the invalid floating-point number directly from the mrSwap array into the floating-point register bank, without going through swap_float. So the true value of swap_float is simply not available to the debugger. To check this, try

int j = *(int*)(mrSwap+30-1);

and tell us what you see.

Updated again to add: Another possibility is a delayed floating-point trap. The floating-point co-processor (built into the CPU these days) generates a floating-point interrupt because of some illegal operation, but the interrupt doesn't get noticed until the next floating-point operation is attempted. So this crash might be a result of the previous floating-point operation, which could be anywhere. Good luck with that...
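
One way to narrow down which earlier operation is responsible is to test the sticky floating-point status flags straight after each suspect computation, instead of waiting for a trap to surface later. The sketch below uses the standard <cfenv> functions; the helper name division_raises_invalid is made up for illustration, and it assumes the compiler does not optimise away floating-point environment accesses (some compilers want #pragma STDC FENV_ACCESS ON or a strict floating-point option for that). The volatile variables are only there to stop the division being folded at compile time.

```cpp
#include <cfenv>

// Hypothetical helper: perform one division and report whether it set
// the IEEE "invalid operation" status flag.
bool division_raises_invalid(double numerator, double denominator)
{
    volatile double n = numerator;
    volatile double d = denominator;

    std::feclearexcept(FE_ALL_EXCEPT);   // start from a clean set of flags
    volatile double quotient = n / d;    // the operation under suspicion
    (void)quotient;

    return std::fetestexcept(FE_INVALID) != 0;
}
```

The same clear, compute, test pattern can be wrapped around any expression, such as the comparison that runs just before the logging call.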

TonyK
  • Thanks Tony - worth knowing about the floats. When I do as you suggest, i=1238430338 in the debugger and when sent to cerr. I'll have a go printing it in hexadecimal... – Mike Sadler May 09 '12 at 13:36
  • OK, in hexadecimal, i=49d0f282. I also tried using printf("%f", swap_float), and that bombs out with exactly the same error. – Mike Sadler May 09 '12 at 13:42
  • Following the first edit, j=49d0f282 - so the same as before. On the second edit: this is sounding unpleasantly plausible, as the debugger seems to end up in a 'confused' state - the stack's second entry is: "[Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll]". Unfortunately, I am not that experienced using debuggers, so I may not be getting all the clues... – Mike Sadler May 09 '12 at 14:11
  • I do, however, know what the previous floating point operation was. The last set of '...' in my snippet is actually a call to this function: bool IsSame(float swap_value, double function_value, double tolerance) { return (abs(swap_value - (float)function_value) < tolerance); } The compiler warns me of the loss of precision, but it seems to run happily and return the correct result, so I hadn't bothered mentioning it. – Mike Sadler May 09 '12 at 14:15
  • GOT IT! You were absolutely correct, TonyK - in my comparison using IsSame, the other value was NaN (this is a valid value in this context), and although it happily subtracted it from swap_float, it put a flag in saying to report the *next* operation as an error. I have to say that I was completely unaware that that was possible - I thought that if it worked, it worked. – Mike Sadler May 09 '12 at 14:22

I'm just adding this answer to highlight the correct solution within TonyK's answer above - because we did a few loops, the answer has been edited, and because several salient points are within the comments, the actual answer may not be immediately apparent. All credit should go to TonyK for the solution.

"Another possibility is a delayed floating-point trap. The floating-point co-processor (built into the CPU these days) generates a floating-point interrupt because of some illegal operation, but the interrupt doesn't get noticed until the next floating-point operation is attempted. So this crash might be a result of the previous floating-point operation, which could be anywhere." - TonyK

This was indeed the problem: in my comparison using IsSame, the other value was NaN (this is a valid value in this context), and although it happily subtracted it from swap_float, it put a flag in saying to report the next operation as an error. I have to say that I was completely unaware that that was possible - I thought that if it worked, it worked.
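
For anyone wanting to guard against this, here is a rough sketch of a comparison that checks for NaN or infinity before doing any arithmetic. The names is_non_finite_cell and IsSameChecked are made up for illustration, and it assumes a 32-bit IEEE float. Treating a non-finite value as "not the same" is also an assumption; where NaN is a legitimate value, the caller may want a different rule. Unlike the original IsSame, it compares in double rather than narrowing function_value to float.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if the 32 bits at `cell` encode a NaN or an
// infinity (all exponent bits set). Only integer operations are used, so
// the cell is never loaded as a float.
bool is_non_finite_cell(const float* cell)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit IEEE float");
    std::uint32_t bits = 0;
    std::memcpy(&bits, cell, sizeof bits);
    return (bits & 0x7F800000u) == 0x7F800000u;
}

// Hypothetical variant of IsSame: refuses to subtract when either side
// is NaN or infinite (assumed policy: such pairs are "not the same").
bool IsSameChecked(const float* swap_cell, double function_value,
                   double tolerance)
{
    if (is_non_finite_cell(swap_cell) || !std::isfinite(function_value)) {
        return false;
    }
    return std::fabs(static_cast<double>(*swap_cell) - function_value)
           < tolerance;
}
```

In the code from the question, the call would look something like IsSameChecked(mrSwap+30-1, function_value, tolerance), with is_non_finite_cell also usable on its own to flag unpopulated cells in the log.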

Mike Sadler