static __inline__ uint64_t mulhilo64(uint64_t a, uint64_t b, uint64_t* hip) {
        __uint128_t product = ((__uint128_t)a)*((__uint128_t)b);
        *hip = product>>64;
        return (uint64_t)product;
}

I am trying to rewrite the function above using the MULX intrinsic on AVX2 hardware (more specifically, BMI2). However, the two versions do not give the same results.

static __inline__ uint64_t mulhilo64(uint64_t  a, uint64_t b, uint64_t *c){
     return _mulx_u64(a, b, &c);
}
Paul R
Yigit Demirag
  • Can you provide a simple test case and the result for each implementation? – Paul R Jul 01 '15 at 09:51
  • The first function is part of a much longer program. I just replaced it with the second function and changed nothing else. Do you think they should give the same results, too? – Yigit Demirag Jul 01 '15 at 09:59
  • I see one probable mistake in the second function (see answer below) - I'm just trying to throw together a simple test case to see if I can replicate the problem. – Paul R Jul 01 '15 at 10:03

1 Answer


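It looks like this function could be wrong. The parameter c is already a uint64_t *, so &c passes the address of the local pointer variable itself, and the high half of the product would be written there rather than to the location the caller supplied: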

static __inline__ uint64_t mulhilo64(uint64_t  a, uint64_t b, uint64_t *c){
     return _mulx_u64(a, b, &c);
}

It should probably be:

static __inline__ uint64_t mulhilo64(uint64_t  a, uint64_t b, uint64_t *c){
     return _mulx_u64(a, b, c);
}                        // ^

Note that compiling with warnings enabled (e.g. gcc -Wall ...) helps to catch simple mistakes like this.
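
To check that the two versions agree, a small side-by-side comparison can help. The following is a minimal sketch, not a definitive implementation: the names mulhilo64_ref, mulhilo64_mulx, mulhilo64_agree and mulhilo64_selfcheck are hypothetical, and it assumes GCC or Clang on a 64-bit target. The intrinsic path is only used when the file is compiled with BMI2 enabled (e.g. -mbmi2); otherwise it falls back to the 128-bit reference so the file still builds. The high half goes through a local unsigned long long because that is the pointer type _mulx_u64 expects, and uint64_t may be a different type (unsigned long) on some platforms.

```c
#include <assert.h>
#include <stdint.h>
#ifdef __BMI2__
#include <immintrin.h>   /* _mulx_u64 */
#endif

/* Reference version from the question: full product via __uint128_t. */
static inline uint64_t mulhilo64_ref(uint64_t a, uint64_t b, uint64_t *hip) {
    __uint128_t product = (__uint128_t)a * (__uint128_t)b;
    *hip = (uint64_t)(product >> 64);   /* high 64 bits */
    return (uint64_t)product;           /* low 64 bits */
}

/* MULX version: pass a pointer to the high-half storage, not &pointer. */
static inline uint64_t mulhilo64_mulx(uint64_t a, uint64_t b, uint64_t *hip) {
#ifdef __BMI2__
    unsigned long long hi;              /* type expected by the intrinsic */
    unsigned long long lo = _mulx_u64(a, b, &hi);
    *hip = (uint64_t)hi;
    return (uint64_t)lo;
#else
    /* Assumption: without BMI2 enabled, fall back to the reference. */
    return mulhilo64_ref(a, b, hip);
#endif
}

/* Returns 1 when both versions produce the same low and high halves. */
static int mulhilo64_agree(uint64_t a, uint64_t b) {
    uint64_t hi_ref, hi_mulx;
    uint64_t lo_ref  = mulhilo64_ref(a, b, &hi_ref);
    uint64_t lo_mulx = mulhilo64_mulx(a, b, &hi_mulx);
    return lo_ref == lo_mulx && hi_ref == hi_mulx;
}

/* Checks a few operand pairs, including one that overflows 64 bits. */
static void mulhilo64_selfcheck(void) {
    assert(mulhilo64_agree(0, 0));
    assert(mulhilo64_agree(1, UINT64_MAX));
    assert(mulhilo64_agree(UINT64_MAX, UINT64_MAX));
    assert(mulhilo64_agree(0x123456789abcdef0ULL, 0xfedcba9876543210ULL));
}
```

Calling mulhilo64_selfcheck() from main, once built with -mbmi2 and once without, would show whether the intrinsic path matches the reference on the machine in question.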

Paul R