Use of half2 in CUDA

Question

I am trying to use half2, but I run into an error, namely,

error: class "__half2" has no member "y"

The section of code where the error occurs is as follows:

uint8_t V_ [128];       // some elements (uint8), to save space
float   V_C[128];       // storing the diff to use later
half2 *C_ = C.elements; // D halfs stored as half2, to be read
Cvalue = 0.0;
for (d = 0; d < D; d+=2)
{
  V_C [d  ] = V_[d]   - __half2float(C_[d/2].x)    ;
  V_C [d+1] = V_[d+1] - __half2float(C_[d/2].y)    ;
  Cvalue   += V_C [d]   * V_C [d]  ;
  Cvalue   += V_C [d+1] * V_C [d+1];
}

Any help please?

Update: Thank you for your help! I finally used the following...

uint8_t V_ [128] ;
float   V_C[128] ;
const half2 *C_ = C.elements;
Cvalue = 0.0;
float2 temp_;
for (d = 0; d < D; d+=2)
  {
    temp_     = __half22float2(C_[d/2]);
    V_C [d  ] = V_[d]   - temp_.x      ;
    V_C [d+1] = V_[d+1] - temp_.y      ;
    Cvalue   += V_C [d]   * V_C [d]  ;
    Cvalue   += V_C [d+1] * V_C [d+1];
  }

I got a slight speedup in my particular application, as loads from global memory were the bottleneck...

score 7 · Accepted Answer · answered May 10 '16 at 09:09

You cannot access the parts of a half2 with the dot operator; use the intrinsic functions for that.

From the documentation:

__CUDA_FP16_DECL__ float __high2float ( const __half2 a )
    Converts high 16 bits of half2 to float and returns the result. 
__CUDA_FP16_DECL__ __half __high2half ( const __half2 a )
    Returns high 16 bits of half2 input. 
__CUDA_FP16_DECL__ __half2 __high2half2 ( const __half2 a )
    Extracts high 16 bits from half2 input. 
__CUDA_FP16_DECL__ __half2 __highs2half2 ( const __half2 a, const __half2 b )
    Extracts high 16 bits from each of the two half2 inputs and combines into one half2 number. 
__CUDA_FP16_DECL__ float __low2float ( const __half2 a )
    Converts low 16 bits of half2 to float and returns the result. 
__CUDA_FP16_DECL__ __half __low2half ( const __half2 a )
    Returns low 16 bits of half2 input. 
__CUDA_FP16_DECL__ __half2 __low2half2 ( const __half2 a )
    Extracts low 16 bits from half2 input. 
__CUDA_FP16_DECL__ __half2 __lowhigh2highlow ( const __half2 a )
    Swaps both halves of the half2 input. 
__CUDA_FP16_DECL__ __half2 __lows2half2 ( const __half2 a, const __half2 b )
    Extracts low 16 bits from each of the two half2 inputs and combines into one half2 number.
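
Applied to the loop in the question, these accessors could be used roughly as follows. This is only a minimal sketch: the kernel and helper names (fillPairs, squaredDistance, runSquaredDistance) and the test data are made up for illustration, D is assumed to be even and small enough for one block, and a single thread does the summation to keep the example short.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cstdint>

// Illustrative setup kernel: fills c with the pair (low = 0.5, high = 1.5).
__global__ void fillPairs(half2* c, int nPairs)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nPairs) c[i] = __floats2half2_rn(0.5f, 1.5f);
}

// Sums (v[d] - c[d])^2 over d, reading c as packed half2 values.
// Assumes D is even; one thread does all the work (sketch only).
__global__ void squaredDistance(const uint8_t* v, const half2* c,
                                float* out, int D)
{
    float sum = 0.0f;
    for (int d = 0; d < D; d += 2) {
        const half2 pair = c[d / 2];
        const float d0 = v[d]     - __low2float(pair);   // element d
        const float d1 = v[d + 1] - __high2float(pair);  // element d + 1
        sum += d0 * d0 + d1 * d1;
    }
    *out = sum;
}

// Hypothetical host helper: runs the kernel on v = {0, 1, ..., D-1}.
float runSquaredDistance(int D)
{
    uint8_t* v; half2* c; float* out;
    cudaMallocManaged(&v, D * sizeof(uint8_t));
    cudaMallocManaged(&c, (D / 2) * sizeof(half2));
    cudaMallocManaged(&out, sizeof(float));
    for (int d = 0; d < D; ++d) v[d] = static_cast<uint8_t>(d);
    fillPairs<<<1, D / 2>>>(c, D / 2);
    squaredDistance<<<1, 1>>>(v, c, out, D);
    cudaDeviceSynchronize();
    const float result = *out;
    cudaFree(v); cudaFree(c); cudaFree(out);
    return result;
}
```

Here __low2float yields the first value of each pair and __high2float the second, matching the order in which __floats2half2_rn packed them.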

More than that, depending on what type C.elements is, this line

half2 *C_ = C.elements; // D halfs stored as half2, to be read

might be wrong if C.elements is a half* (the comment is unclear on this). half2 is not a pair of halfs. Indeed, in the current implementation (as of this answer), half2 is just an unsigned int wrapped in a struct:

// cuda_fp16.h

typedef struct __align__(2) {
   unsigned short x;
} __half;

typedef struct __align__(4) {
   unsigned int x;
} __half2;

#ifndef CUDA_NO_HALF
typedef __half half;
typedef __half2 half2;
#endif /*CUDA_NO_HALF*/

Nothing guarantees that an array of halfs can be accessed as an array of half2s.
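
One way to sidestep the question of whether a half* may be reinterpreted as a half2* is to build each half2 explicitly with the __halves2half2 intrinsic. The sketch below is illustrative only: the kernel and helper names (fillHalves, packHalves, unpackPairs, packRoundTrip) are hypothetical, and n is assumed to be even and small enough to fit in one block.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Illustrative test data: in[i] = i, stored as plain halfs.
__global__ void fillHalves(half* in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) in[i] = __float2half(static_cast<float>(i));
}

// Packs consecutive halfs into half2 values without any pointer cast:
// in[2*i] becomes the low half, in[2*i + 1] the high half.
__global__ void packHalves(const half* in, half2* out, int nPairs)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nPairs) out[i] = __halves2half2(in[2 * i], in[2 * i + 1]);
}

// Unpacks to floats so the host can inspect the result.
__global__ void unpackPairs(const half2* in, float* out, int nPairs)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nPairs) {
        out[2 * i]     = __low2float(in[i]);
        out[2 * i + 1] = __high2float(in[i]);
    }
}

// Hypothetical host helper: true if packing and unpacking keeps the order.
bool packRoundTrip(int n)
{
    half* in; half2* packed; float* out;
    cudaMallocManaged(&in, n * sizeof(half));
    cudaMallocManaged(&packed, (n / 2) * sizeof(half2));
    cudaMallocManaged(&out, n * sizeof(float));
    fillHalves<<<1, n>>>(in, n);
    packHalves<<<1, n / 2>>>(in, packed, n / 2);
    unpackPairs<<<1, n / 2>>>(packed, out, n / 2);
    cudaDeviceSynchronize();
    bool ok = true;
    for (int i = 0; i < n; ++i) ok = ok && (out[i] == static_cast<float>(i));
    cudaFree(in); cudaFree(packed); cudaFree(out);
    return ok;
}
```

This costs an extra pass over the data, so it mainly makes sense when the packed array is reused many times; it relies only on the documented behaviour of the intrinsics rather than on the memory layout of half2.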

Alignment considerations apart, the documentation says that you can extract the "low 16 bits" or "high 16 bits" to get the corresponding half of the half2. Hence, even though it is not explicitly stated, the documentation is fairly clear that half2 is a pair of halfs with 32-bit alignment. I will even risk the parallel with __m128d being a pair of FP64 (with alignment sometimes made optional with aliasing). — Florent DUGUET, May 10 '16 at 13:33
@FlorentDUGUET We may guess, experiment and even play around with bits, yes. I like it as much as anyone else. But in no event should you rely on these assumptions in production code. Feel free to post the bit-hacking results; I would be very excited to see whether your hypothesis is confirmed. — Ivan Aksamentov - Drop, May 10 '16 at 17:38
In my humble opinion, the __low2half documentation is rather explicit about the bit layout of half2. There is indeed no explicit contract on it, but the documentation looks explicit enough to me. Your point stands, as this is indeed more of an opinion. — Florent DUGUET, May 10 '16 at 18:26
They actually cast from ```half *``` to ```half2 *``` themselves in https://developer.nvidia.com/blog/mixed-precision-programming-cuda-8/ — paleonix, Nov 16 '20 at 11:53
