
I want to raise x to the power of y in PTX.

NVIDIA provides an ex2 instruction, which calculates 2^x, and lg2, which calculates log2(x), but there's no instruction for calculating x^y.

Is there a cleverer and simpler solution than multiplying the value in a loop? How is code in a .cu file translated to .ptx when it contains pow(x, y)?

Maybe there's a clever solution using ex2 and lg2 to calculate x^y?

Solution:

As @talonmies mentioned:

If z = x^y, then log2(z) = y * log2(x), so x^y = 2^(y * log2(x)). This holds for x > 0, since log2 is not defined for negative x.
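As a quick host-side check of the identity, here is a minimal C++ sketch. It assumes a C++11 `<cmath>`, where `std::log2` and `std::exp2` stand in for the PTX `lg2` and `ex2` instructions; `pow_via_exp2` is a hypothetical name, not a CUDA or PTX API. It is not device code, and the library functions are more accurate than the `.approx` PTX instructions, so results on the GPU can differ slightly.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: x^y built from a base-2 log and a base-2 exponential,
// mirroring the lg2 -> mul -> ex2 chain. This sketch only accepts x > 0,
// because log2 of a negative number is NaN.
inline float pow_via_exp2(float x, float y) {
    assert(x > 0.0f && "identity requires a positive base");
    float t = y * std::log2(x);  // log2(z) = y * log2(x)
    return std::exp2(t);         // z = 2^(y * log2(x))
}
```

For example, `pow_via_exp2(2.0f, 10.0f)` returns 1024, because log2(2) is exactly 1 and 2^10 is exactly representable.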

karlosos


Here's how nvcc does it.

// Kernel that stores x^y, computed with powf, into *z.
__global__
void exp(float x, float y, float* z) {
    *z = powf(x,y);
}

nvcc --ptx --use_fast_math exp.cu

exp.ptx

.visible .entry _Z3expffPf(
    .param .f32 _Z3expffPf_param_0,
    .param .f32 _Z3expffPf_param_1,
    .param .u64 _Z3expffPf_param_2
)
{
    .reg .f32   %f<6>;
    .reg .b64   %rd<3>;


    ld.param.f32    %f1, [_Z3expffPf_param_0];
    ld.param.f32    %f2, [_Z3expffPf_param_1];
    ld.param.u64    %rd1, [_Z3expffPf_param_2];
    cvta.to.global.u64  %rd2, %rd1;
    lg2.approx.ftz.f32  %f3, %f1;
    mul.ftz.f32     %f4, %f3, %f2;
    ex2.approx.ftz.f32  %f5, %f4;
    st.global.f32   [%rd2], %f5;
    ret;
}

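It's worth comparing this PTX to what you get without --use_fast_math: powf then compiles to a much longer instruction sequence that is more accurate and handles special cases such as negative bases, zeros and infinities.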
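One reason the full powf sequence is longer is that the bare identity breaks on some inputs. The following C++ sketch contrasts it with `std::pow` on the host, assuming IEEE 754 floats and a C++11 `<cmath>`; `pow_identity_only` and `show_divergences` are hypothetical names. It only illustrates the identity's limits: the exact behavior of the device instructions `lg2.approx` and `ex2.approx` on these inputs is defined by the PTX ISA, not by this code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: the bare identity 2^(y * log2(x)) with no
// special-case handling, similar in spirit to the fast-math PTX above.
inline float pow_identity_only(float x, float y) {
    return std::exp2(y * std::log2(x));
}

// Documents where the bare identity and std::pow disagree.
// Build without -ffast-math, which may assume NaN never occurs.
inline void show_divergences() {
    // Negative base with an integer exponent is well defined for pow...
    assert(std::pow(-2.0f, 3.0f) == -8.0f);
    // ...but log2(-2) is NaN, so the identity yields NaN.
    assert(std::isnan(pow_identity_only(-2.0f, 3.0f)));
    // pow(x, 0) is defined as 1, including for x = 0...
    assert(std::pow(0.0f, 0.0f) == 1.0f);
    // ...but log2(0) is -inf and 0 * -inf is NaN.
    assert(std::isnan(pow_identity_only(0.0f, 0.0f)));
}
```

For ordinary positive bases the two agree closely; the divergences are confined to special inputs like these.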

andars