Delphi: How do I translate this C code, which performs low-level access to IEEE floating-point number?

Question

The following C function is from fastapprox project.

static inline float 
fasterlog2 (float x)
{
  union { float f; uint32_t i; } vx = { x };
  float y = vx.i;
  y *= 1.1920928955078125e-7f;
  return y - 126.94269504f;
}

I know that C union can be translated to Delphi variant record, but I still experienced difficulty in translating such low-level C code to Delphi. I hope Delphi experts here are willing to help.

More Information

I add this section later, which is not a part of the question. This section gives information to readers that expect better accuracy.

In fastapprox, fasterlog2() was deliberately designed to be simpler, faster but less accurate Log2 function. Anyone who expect better accuracy can use the more-accurate function they provide, namely fastlog2().
They included a Mathematica notebook with an explanation of their algorithms as well as the origin of some mysterious values, e.g.126.94269504. Mathematica website provides a free viewer for the .nb files.
See also: Why the IEEE-754 exponent bias used in this C code is 126.94269504 instead of 127?

score 8 · Accepted Answer · edited Jun 26 '15 at 15:27

8

I think I'd code it by using pointer casts to effect a reinterpret cast:

function fasterlog2(x: single): single;
const
  c1: Single = 1.1920928955078125e-7;
  c2: Single = 126.94269504;
var
  y: single;
begin
  y := PCardinal(@x)^;
  Result := y * c1 - c2;
end;

Note that I used typed constants of type single to ensure an exact match with the C code.

I don't really see any need for a variant record in a Delphi implementation.

Or you could use a pure asm approach. The x86 version looks like this:

function fasterlog2asm(x: single): single;
const
  c1: Single = 1.1920928955078125e-7;
  c2: Single = 126.94269504;
asm
  FILD    DWORD PTR [ESP+$08]
  FMUL    c1
  FSUB    c2
  FWAIT
end;

For x64 the SSE implementation would be

function fasterlog2asm64(x: single): single;
const
  c1: double = 1.1920928955078125e-7;
  c2: double = 126.94269504;
asm
  CVTDQ2PD  xmm0, xmm0
  MULSD     xmm0, c1
  SUBSD     xmm0, c2
  CVTSD2SS  xmm0, xmm0
end;

In x64 the assembly version is only about twice as performant as the pure pascal function. The x86 assembly version is over five times as performant - this is entirely due to the higher cost of type conversion (integer/single/double) in SSE vs x87.

The reason that this approach can be used is that floating point numbers are represent as

significand * base^exponent

and a value of 2 is used as the base.

edited Jun 26 '15 at 15:27

J...

30,968
6
66
143

answered Jun 26 '15 at 07:55

David Heffernan

601,492
42
1,072
1,490

@David_Heffernan, actually your pure Pascal approach is already more than enough. Thank's so much for providing the Assembly approach. – Astaroth Jun 26 '15 at 08:58
@Astaroth The code emitted by the compiler for all of the non-asm versions is pretty laboured. The code for the asm version is very clean. The Pascal versions all involve copying the parameter to a local. The asm version loads it into the CPU directly with no copying. – David Heffernan Jun 26 '15 at 09:12
I've added an SSE implementation for x64 with notes. I've declared the two constants as doubles simply to reduce the number of conversions. The intermediate calculation is done with DP so it made sense to do it this way. – J... Jun 26 '15 at 15:11
@J... Thank you very much! – David Heffernan Jun 26 '15 at 15:12
1

Some more timing information : x86/PurePascal `1441ms`, x86/asm `170ms`, x64/PurePascal `436ms`, x64/asm `250ms`. Assembly is faster in both cases, but the stalwart x87 is the clear winner here. The x64 compiler seems to generate better assembly than the x86 compiler for the pure pascal versions but under optimization the extra overhead of SSE type conversions limits the ultimately achievable performance. – J... Jun 26 '15 at 17:38
the test...for interest `var i : integer; a, d : single; s : TStopwatch; sa : array[1..99999999] of single; begin for i := 1 to 99999999 do begin sa[i] := i * 0.0001; end; s := TStopwatch.StartNew; for i := 1 to 99999999 do begin a := fasterlog2(sa[i]); end; s.Stop; WriteLn(s.ElapsedMilliseconds); s.Reset; s.Start; for i := 1 to 99999999 do begin d := fasterlog2asm64(sa[i]); end; s.Stop; WriteLn(s.ElapsedMilliseconds); ReadLn; end.` – J... Jun 26 '15 at 17:39
@J..., some compiler versions can optimize away stepping into the function (in your timing loop). It can be avoided by making use of the function result afterwards. – LU RD Jun 26 '15 at 19:41
@J... I've not look at your sse version in detail but isn't it converting to double and doing double ops? All arithmetic should be single and the constants should be single. I really appreciate your edits but the sse version should produce identical output. – David Heffernan Jun 26 '15 at 19:43
@DavidHeffernan Yes, it's doing double ops. It has to since the conversion from 32-bit Uint is lossy if you do the intermediates in single precision. It's no different with x87 where intermediates are done in extended precision. `FILD` will produce more precision in `st0` than if you popped it out to a `single` variable and back in again. Converting to `single` in SSE will actually force that conversion directly in the FPU. The best solution here would be to tweak the constants to actually represent their `single` precision values at compile time. – J... Jun 26 '15 at 19:51
@DavidHeffernan Also, what might not be obvious is that SSE must perform operations on like-types. You cannot multiply an `integer` by a `single` or a `single` by a `double` - types must be converted explicitly. Since x87 does all of this in extended, the next best thing that can be done with SSE is to perform all ops in the highest precision available (ie: `double`). If the consts were declared as single they would need a conversion operation to convert them to `double` - it's not free with the load operation like it is with x87. – J... Jun 26 '15 at 20:27
@LURD I'm pretty sure that's not the case here (ie: that the functions are optimized away), maybe because they are pure ASM? It's pretty easy to modify the function implementation in the above test and see the expected changes in execution time, even compiling with optimizations. – J... Jun 26 '15 at 20:31
@J..., Using XE7 here with optimizations on. In 32 bit mode all inlined function calls are removed. (Tested all solutions in the answers here). Does not happen on the 64 bit compiler. If I add WriteLn(a,' ',s.ElapsedMilliseconds); after the loop, the functions are actually called. – LU RD Jun 26 '15 at 20:37
@LURD The methods in the answer were not inlined - if you don't use the `inline` directive then everything goes as expected. (at least it does for me... XE7, with optimizations). – J... Jun 26 '15 at 20:45
@J..., I know that ;-) I was just comparing all functions here and wanted to make a remark on this subject. It may not be obvious and in another (more optimized) compiler version this could happen on non-inlined calls as well. Happened a few times for me when doing some benchmarking. – LU RD Jun 26 '15 at 20:53

Ondrej Kelle · Answer 2 · 2015-06-26T09:45:35.403

5

How about this:

function fasterlog2(x: Single): Single; inline;
const
  f1: Single = 1.1920928955078125e-7;
  f2: Single = -126.94269504;
var
  i: Cardinal absolute x;
begin
  Result := i * f1 + f2;
end;

edited Jun 26 '15 at 09:45

answered Jun 26 '15 at 07:53

Ondrej Kelle

36,941
2
65
128

As I recall, it was written somewhere that using `absolute` keyword will prevent compiler optimization on it. However, +1 because my question was not about optimization. – Astaroth Jun 26 '15 at 08:12
This gives a different answer from the C code because you use double precision (or perhaps extended precision) constants – David Heffernan Jun 26 '15 at 08:30
@DavidHeffernan Good catch! Thanks. – Ondrej Kelle Jun 26 '15 at 09:43

Rudy Velthuis · Answer 3 · 2015-06-26T14:52:59.847

4

A possible translation is:

function fasterlog2(x: Single): Single;
type
  TVx = record 
    case Byte of
      0: (f: Single);
      1: (i: UInt32); // Or Cardinal, depending on version
  end;
const
  C1: Single = 1.1920928955078125e-7;
  C2: Single = 126.94269504;
var
  y: Single;
  vx: TVx;
begin
  vx.f := x;
  y := vx.i;
  y := y * C1;
  Result := y - C2;
end;

I assume this somehow uses knowledge of the bit patterns of the Single. I am not sure if it really gives you a faster log, but that is what the C routine is doing.

edited Jun 26 '15 at 14:52

answered Jun 26 '15 at 07:37

Rudy Velthuis

28,387
5
46
94

`Invalid typecast` for both Single() expressions. Otherwise its fine. Comparing to log2(100.0) there is a diff in the 3rd digit. – LU RD Jun 26 '15 at 07:42
You are on the right lines by trying to force the constants into 32 bit values. But the syntax is wrong. Use a typed constant. – David Heffernan Jun 26 '15 at 08:29
I'd also put TVx declaration right into the function, to clutter less. – ZzZombo Jun 26 '15 at 08:37
@LURD: Sure, I had forgotten that Delphi forbids any typecast to Single, Double or Extended. I'll remove the typecasts. – Rudy Velthuis Jun 26 '15 at 14:38
@Rudy_Velthuis - +1. Thanks for the literal translation so that I can learn more about Delphi from C perspective. – Astaroth Jun 26 '15 at 19:07

LU RD · Answer 4 · 2015-06-26T08:45:17.313

Try this (literal translation):

function fasterlog2(x : Single): Single; inline;
type
  TX = record
    case boolean of
      false: (f : Single);
      true:  (i : Cardinal);
  end;
const
  f1 : Single = 1.1920928955078125e-7;
  f2 : Single = -126.94269504;
var
  vx: TX absolute x;
  y: Single;
begin
  y := vx.i;
  y := y * f1;
  Result := y + f2;
end;

WriteLn(fasterlog2( 1024.0));
WriteLn(Math.Log2( 1024.0));

Outputs:

 1.00573043823242E+0001
 1.00000000000000E+0001

Or a oneliner (similar to Davids example):

function fasterlog2(x : Single): Single; inline;
const
  f1 : Single = 1.1920928955078125e-7;
  f2 : Single = -126.94269504;
begin
  Result := PCardinal(@x)^ * f1 + f2;
end;

This gives a different answer from the C code because you use double precision (or perhaps extended precision) constants — David Heffernan, Jun 26 '15 at 08:31

Delphi: How do I translate this C code, which performs low-level access to IEEE floating-point number?

More Information

4 Answers4