
let's say I have two IEEE 32-bit floating-point values a and b, plus one IEEE 32-bit floating-point interpolation factor x (somewhere between 0 and 1)

so I get the interpolated result:

float a = 123.456f;
float b = 654.321f;
float result = a * (1.0f-x) + b * x;

now let's say I want to use this single interpolation to interpolate two pairs of values at once, at lower precision.

something like:

float a1 = 123.456f;
float b1 = 654.321f;
float a2 = -234.567f;
float b2 = 12.34f;
float a = pack(a1, a2);
float b = pack(b1, b2);
float result = a * (1.0f-x) + b * x;
float aprox_result1;
float aprox_result2;
unpack(result, &aprox_result1, &aprox_result2);

such that aprox_result1 is approximately the interpolation of a1 and b1, and aprox_result2 is approximately the interpolation of a2 and b2.

is this possible? if so, how would float pack(float v1, float v2) and void unpack(float packed, float* v1, float* v2) be implemented to achieve this?

matthias_buehlmann
  • not sure how that would ever work. if the intermediate/packed value was some other datatype then fine, but it sounds impossible to me! would love to be shown wrong, as this would presumably mean all the work on binary16 floats is unnecessary! I'm also unsure where your "interpolation" example comes from; is this because you only care about this working for addition and multiplication? i.e. not division, comparisons, etc. – Sam Mason May 19 '21 at 14:23
  • @SamMason I actually care only about interpolation (not even extrapolation), because I'd like to push twice as many values through some hardware that does interpolation on IEEE fp values – matthias_buehlmann May 19 '21 at 14:39
  • if this was x86 hardware I'd suggest looking into SSE/AVX, but presumably it's somewhat more exotic? including details of the hardware would likely help. note that memory bandwidth also quickly becomes an issue as you approach maxing out the FPUs; 16-bit floats often win just because they halve the memory pressure of the resulting algorithm – Sam Mason May 19 '21 at 14:49
  • It's actually GPU shader interpolation. I have 8 full-precision vec4 values (32 floats) that get interpolated by the GPU and fed into the fragment shader, and I'd like to interpolate 64 values – matthias_buehlmann May 19 '21 at 14:54
  • You are essentially looking for an "approximated" bijection between R² and R. But these can never be continuous (I forgot the name of the corresponding theorem), and thus are also unlikely to be preserved by the interpolation. If this is mostly a memory-throughput issue, you may want to look into half-precision floats (sometimes called `float16`) – chtz May 19 '21 at 16:13
