I am looking for a way in Delphi to get the smallest Single and Double floating point value that I can add to or subtract from my number so that the result compares as different in floating point comparisons. Alternatively, I would like to get the next floating point numbers that are smaller and larger than my number. From a floating point standpoint I would like to convert this:

if (A >= B) or (C <= D) then

To

if (A > newnumber1) or (C < newnumber2) then

Where they produce the same results in floating point. newnumber1 and newnumber2 would obviously be different for single and doubles. I either need some value that I can subtract from my A and add to my C values to get the newnumber1 and newnumber2 or I need a way of getting to these numbers from B and D.

In C++11 there is a function, std::nextafter, referenced in the following question, that looks like it would be sufficient:

Finding the closest floating point value less than a specific integer value in C++?

Context

I am doing vector operations and I need to do the equivalent of a greater than or equal to. The easiest way to accomplish this is to take a slightly smaller number and use that with a greater than operation. I would prefer not to thumb suck a value that seems to work, if at all possible.

The vector operation that I am using is ippsThreshold_LTValGTVal_32s from:

https://software.intel.com/en-us/node/502143

The library obviously doesn't support a >= operation, since that is not practical in a floating point sense. To create an equivalent function I need to increase and decrease my comparison values to compensate, and then use a greater than operation and a less than operation.

For Example

If I have an array of 5 values [99.4, 20, 19.9, 99, 80], the ippsThreshold_LTValGTVal_32s vector operation will let me replace specific values in the vector with my own replacement values. In this example, I would like to replace all values that are >= 99 or <= 20 with 0. To do this I have to replace the 99 with something marginally smaller and the 20 with something marginally bigger.

The function signature looks like this:

ippsThreshold_LTValGTVal_32s(..., ..., ..., levelLT, valueLT, levelGT, valueGT);

My call would be something like this:

ippsThreshold_LTValGTVal_32s(..., ..., ..., 20.00000001, 0, 98.99999, 0);

This would then include the 20 for the less than operation and the 99 for the greater than operation and give me a vector that looks like [0, 0, 0, 0, 80].

I need to find out what to use in place of the placeholder values 20.00000001 and 98.99999. I would like the difference between these values and the original values to be as small as possible while still being significant enough to include 20 and 99 in the < and > operations.
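To make the strict comparisons concrete, here is a minimal scalar sketch of the behaviour described above. `ScalarThreshold` is a hypothetical stand-in written for illustration only; it is not the IPP function and only mirrors the < and > semantics as I understand them from the description.

```delphi
program ThresholdSketch;

{$APPTYPE CONSOLE}

// Hypothetical scalar stand-in for the vector call (not the IPP function).
// Values strictly below LevelLT become ValueLT; values strictly above
// LevelGT become ValueGT; everything else is left alone.
procedure ScalarThreshold(var Data: array of Single;
  LevelLT, ValueLT, LevelGT, ValueGT: Single);
var
  K: Integer;
begin
  for K := Low(Data) to High(Data) do
    if Data[K] < LevelLT then
      Data[K] := ValueLT
    else if Data[K] > LevelGT then
      Data[K] := ValueGT;
end;

var
  V: array[0..4] of Single = (99.4, 20, 19.9, 99, 80);
  K: Integer;
begin
  // Passing the levels 20 and 99 unchanged: the comparisons are strict,
  // so the elements equal to 20 and 99 are NOT replaced.
  ScalarThreshold(V, 20, 0, 99, 0);
  for K := Low(V) to High(V) do
    Writeln(V[K]:0:1);
end.
```

With the unadjusted levels the vector becomes [0, 20, 0, 99, 80] rather than the [0, 0, 0, 0, 80] I am after, which is why the levels have to be nudged to the neighbouring floating point values.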

Graymatter
  • I don't think you've analysed this correctly. Why do you want to allow the next closest value, but disallow the one after that? What's the numerical reasoning that led you to that conclusion? – David Heffernan Mar 13 '15 at 18:24
  • @DavidHeffernan It's the other way around. If I have a value of 5, I need this value included along with all values greater than 5. The vector operations don't support a ">= 5" so I need to use "> X" where X is the largest value that will include 5 in a greater than operation. That means X needs to be marginally smaller than 5. Here is the operation that I am using (https://software.intel.com/en-us/node/502143) – Graymatter Mar 13 '15 at 18:35
  • I think that information would be useful in the question. And then you could ask what you really want to know which is how to implement greater than or equal to on top of this library. That said I cannot understand what your operation is. So far as I can tell > and >= are the same in this context. – David Heffernan Mar 13 '15 at 18:42
  • @DavidHeffernan Thanks, I have updated the question with some more details. – Graymatter Mar 13 '15 at 18:53
  • Oops. My apologies. I misunderstood and thought the question was about finding the value of the machine epsilon. My post does not make sense, so I will delete it. – Marcelo Cuadrado Mar 13 '15 at 18:59
  • Ok, so it's not clipping then. I see. – David Heffernan Mar 13 '15 at 19:24
  • @DavidHeffernan It wasn't very clear initially. I hope the edits have helped to clear up the question. – Graymatter Mar 13 '15 at 19:36
  • Yes, your edits are great. I understand now. Thanks. – David Heffernan Mar 13 '15 at 19:45
  • Wow, someone gave a down vote on the question. That's rather strange. Care to give a reason for the down vote? – Graymatter Mar 14 '15 at 18:20

1 Answer

By design, for IEEE754 data types, you can simply reinterpret the bits of the value as an integer of the same size and increment that integer to get the next larger value, or decrement it if the value is negative.

function NextDoubleGreater(const D: Double): Double;
var
  SpecialType: TFloatSpecial;
  I: Int64;
begin
  SpecialType := D.SpecialType;
  case SpecialType of
  fsZero,fsNZero:
    // special handling needed around 0 and -0
    I := 1;
  fsInf, fsNInf, fsNaN:
    I := PInt64(@D)^; // return the original value
  fsDenormal, fsNDenormal, fsPositive, fsNegative:
    begin
      I := PInt64(@D)^;
      if I >= 0 then begin
        inc(I);
      end else begin
        dec(I);
      end;
    end;
  end;
  Result := PDouble(@I)^;
end;

And similarly in the opposite direction:

function NextDoubleLess(const D: Double): Double;
var
  SpecialType: TFloatSpecial;
  I: Int64;
begin
  SpecialType := D.SpecialType;
  case SpecialType of
  fsZero,fsNZero:
    // special handling needed around 0 and -0
    I := $8000000000000001;
  fsInf, fsNInf, fsNaN:
    I := PInt64(@D)^; // return the original value
  fsDenormal, fsNDenormal, fsPositive, fsNegative:
    begin
      I := PInt64(@D)^;
      if I >= 0 then begin
        dec(I);
      end else begin
        inc(I);
      end;
    end;
  end;
  Result := PDouble(@I)^;
end;

It's no coincidence that the format is this way. Implementation of floating point comparison operators is trivial because of this design.
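The question also asks about Single. A minimal sketch of the same idea for Single follows, assuming a Delphi version where `Single.SpecialType` is available through the helper in `System.SysUtils`, and using a 32-bit `Integer` in place of `Int64`. The names `NextSingleGreater` and `NextSingleLess` are introduced here for illustration only.

```delphi
program NextSingleSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical Single counterpart of NextDoubleGreater.
function NextSingleGreater(const S: Single): Single;
var
  I: Integer; // 32 bits, same size as Single
begin
  I := PInteger(@S)^; // reinterpret the bits; Inf and NaN are returned as-is
  case S.SpecialType of
    fsZero, fsNZero:
      I := 1; // smallest positive denormal
    fsDenormal, fsNDenormal, fsPositive, fsNegative:
      if I >= 0 then
        Inc(I)  // positive: larger bit pattern means larger value
      else
        Dec(I); // negative: smaller magnitude means larger value
  end;
  Result := PSingle(@I)^;
end;

// Hypothetical Single counterpart of NextDoubleLess.
function NextSingleLess(const S: Single): Single;
var
  I: Integer;
begin
  I := PInteger(@S)^;
  case S.SpecialType of
    fsZero, fsNZero:
      I := Low(Integer) + 1; // bit pattern $80000001, smallest negative denormal
    fsDenormal, fsNDenormal, fsPositive, fsNegative:
      if I >= 0 then
        Dec(I)
      else
        Inc(I);
  end;
  Result := PSingle(@I)^;
end;

var
  LevelLT, LevelGT: Single;
begin
  // Levels for the example in the question: "<= 20" becomes "< LevelLT"
  // and ">= 99" becomes "> LevelGT".
  LevelLT := NextSingleGreater(20);
  LevelGT := NextSingleLess(99);
  Writeln(20 < LevelLT); // TRUE, so 20 passes the strict less-than test
  Writeln(99 > LevelGT); // TRUE, so 99 passes the strict greater-than test
end.
```

Because the adjusted levels are the immediate neighbours of 20 and 99, no other Single value changes sides in the comparison.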

Reference: How to alter a float by its smallest increment (or close to it)?

David Heffernan
  • Or with [TDoubleRec.BuildUp](http://docwiki.embarcadero.com/Libraries/en/System.TDoubleRec) or `TDoubleHelper.BuildUp`: `var LValA, LValB: Double; LValB.BuildUp( LValA.Sign, LValA.Mantissa+1,LValA.Exponent );` – Sir Rufo Mar 13 '15 at 20:22
  • @Sir Then you have to deal with mantissa overflow. Easy enough, but less efficient. – David Heffernan Mar 13 '15 at 20:31
  • That's perfect. Tested and working correctly. NaN and INF are not a problem in my case but even if they were, it would make sense in that case to check for NaN/INF and just return the original value because there is no previous and next value for them. – Graymatter Mar 13 '15 at 20:34
  • Yes, I didn't want to cover every edge case, because that's relatively easy, if mundane. The -0 case is the tricky one. FWIW, nobody at the linked question handles that. I spotted that one myself. Thanks for the excellent question. I've learnt something tonight. – David Heffernan Mar 13 '15 at 20:35
  • Just a hint: `var I: Int64 absolute Result;` and then `Result := D;` lets you simplify the code (both Int64 and Double are 8 bytes in size). – Abelisto Mar 14 '15 at 00:09
  • @Abelisto I considered that. I'm not the biggest fan of absolute. Personal choice. – David Heffernan Mar 14 '15 at 07:48
  • @DavidHeffernan **Offtopic** It was very useful in the DOS era for direct access to, for example, Hercules/CGA/EGA/VGA video memory :o) – Abelisto Mar 14 '15 at 10:00
  • Just a note which I just picked up. The code above (we had a different variant of it) doesn't handle the case in NextDoubleGreater when you switch from the largest negative number ($8000000000000001) back to 0. – Graymatter Mar 10 '17 at 21:05
  • If you use NextDoubleLess on 0 then the result is $8000000000000001 (in hex). Using NextDoubleGreater on $8000000000000001 results in -0 instead of 0. – Graymatter Mar 10 '17 at 21:11
  • What's wrong with that? Seems like a reasonable design choice. If you want to make a different choice do so. – David Heffernan Mar 10 '17 at 21:12
  • Shouldn't NextDoubleGreater(NextDoubleLess(x)) always return the same value? I suppose, the result will be wrong with -0 then :( – Graymatter Mar 10 '17 at 21:13
  • You can't achieve that because -0=0. My design choice is symmetric. Yours favours positive zero. Anyway it's up to you what you write in your program. – David Heffernan Mar 10 '17 at 21:16
  • To elaborate: you cannot achieve that property because you also need NDG(d) > d to hold. – David Heffernan Mar 10 '17 at 21:20
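For readers who prefer the alternative discussed in these last comments, here is a minimal sketch of a variant that lands on +0 instead of -0 when stepping up from the negative denormal closest to zero. `NextDoubleGreaterPosZero` is a hypothetical name, not code from the answer, and as noted above it still cannot make NextDoubleGreater(NextDoubleLess(x)) return the same bits for both zeros.

```delphi
program PosZeroSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical variant of NextDoubleGreater that favours positive zero.
function NextDoubleGreaterPosZero(const D: Double): Double;
var
  I: Int64;
begin
  I := PInt64(@D)^; // Inf and NaN are returned unchanged
  case D.SpecialType of
    fsZero, fsNZero:
      I := 1;
    fsDenormal, fsNDenormal, fsPositive, fsNegative:
      if I >= 0 then
        Inc(I)
      else
        Dec(I);
  end;
  // Low(Int64) is the bit pattern of -0; map it to +0 (all bits clear).
  if I = Low(Int64) then
    I := 0;
  Result := PDouble(@I)^;
end;

var
  Bits: Int64;
  Z, R: Double;
begin
  Bits := Low(Int64) + 1;  // $8000000000000001, the value just below zero
  Z := PDouble(@Bits)^;
  R := NextDoubleGreaterPosZero(Z);
  Writeln(PInt64(@R)^ = 0); // TRUE: the result is +0, not -0
end.
```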