
I have a mining program written in C++ for the CPU; it is an adaptation of the popular Monero v7 algorithm. I'm trying to make it work on a GPU. The trouble is that the Monero GPU code is written in a different style, so I can't copy and paste the code directly. Just this last snippet is causing me problems. Here is the source I want to replicate.

CPU CODE https://gist.github.com/monkins1010/42eae4db87667b16a2aeee3677ee20cd

The section of code that confuses me is at line 62 in the link above:

    al0 += hi;
    ah0 += lo;

    uint64_t tmp_al0 = al0;
    VARIANT1_2(al0, 0);
    ((uint64_t*)&l0[idx0 & MASK])[0] = al0;
    ((uint64_t*)&l0[idx0 & MASK])[1] = ah0;
    al0 = tmp_al0;

    ah0 ^= ch;
    al0 ^= cl;
    idx0 = al0;
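
Read as data flow, the snippet writes a tweaked copy of al0 to l0 and then carries on with the untweaked value. A minimal C++ sketch of that ordering, assuming the tweak is a pure function of the 64-bit word; `StepResult`, `mul_step` and the `tweak` parameter are hypothetical names that do not appear in either gist:

```cpp
#include <cstdint>

// Hypothetical container for what one iteration writes and what it keeps.
struct StepResult {
    uint64_t stored_lo;  // written to l0[idx0 & MASK] as the low 64 bits
    uint64_t stored_hi;  // written to l0[idx0 & MASK] as the high 64 bits
    uint64_t next_al;    // al0 after the snippet (also the new idx0)
    uint64_t next_ah;    // ah0 after the snippet
};

// Mirrors the order of operations in the CPU snippet. `tweak` stands in for
// VARIANT1_2 and only affects the value that is stored, never next_al.
template <typename Tweak>
StepResult mul_step(uint64_t al0, uint64_t ah0, uint64_t hi, uint64_t lo,
                    uint64_t cl, uint64_t ch, Tweak tweak) {
    al0 += hi;
    ah0 += lo;

    StepResult r{};
    r.stored_lo = tweak(al0);  // tweaked copy goes to memory
    r.stored_hi = ah0;
    r.next_al = al0 ^ cl;      // the untweaked sum continues
    r.next_ah = ah0 ^ ch;
    return r;
}
```

The point of the sketch is only that the tweak never feeds back into al0, so a GPU port would need the tweaked value at the store and the untweaked one everywhere else.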

VARIANT1_2(al0,0) performs this:

    #define VARIANT1_2(p, part) \
        if (VARIANT > 0) { \
            const uint8_t tmp2 = ((uint8_t*)(&p))[1]; \
            static const uint32_t table2 = 0x75310; \
            const uint8_t index2 = (((tmp2 >> 3) & 6) | (tmp2 & 1)) << 1; \
            ((uint8_t*)(&p))[1] = tmp2 ^ ((table2 >> index2) & 0x33); \
        }

The code above casts the variable to a byte pointer and modifies one of its bytes. It is new code: a copy of VARIANT1_1, which is used a few lines earlier in the CPU code, but operating on a different variable. This code doesn't exist in the GPU code, so I am trying to splice it into this:

My GPU CODE COPY ATTEMPT https://gist.github.com/monkins1010/ada8792804e5dd58191d9debc811a6ca

In the code above, on line 77, the (VARIANT > 0) section corresponds to VARIANT1_1(&l0[idx0 & MASK]); in the CPU CODE:

    if (VARIANT > 0)
    {
        const uint32_t table = 0x86420U;
        const uint32_t index = ((z >> 26) & 12) | ((z >> 23) & 2);
        const uint32_t fork_7 = z ^ ((table >> index) & 0x30U) << 24;
        storeGlobal32(long_state + j, sub == 2 ? fork_7 : z);
    }
    else
        storeGlobal32(long_state + j, z);

The code above, from my GPU attempt, is OK; it is the first VARIANT1_1 in the CPU code.
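
The GPU snippet computes its table index straight from the 32-bit word z, while the CPU macros compute it from a single byte. The two forms can be compared with a short check. This is a minimal sketch, assuming the CPU VARIANT1_1 uses the same index formula as VARIANT1_2 (the question describes one as a copy of the other) and that the byte in question is the top byte of z; the function names are hypothetical, and the `constexpr` loop needs C++14 or later:

```cpp
#include <cstdint>

// Index as computed by the byte-oriented CPU macro, given the selected byte.
constexpr uint32_t index_from_byte(uint8_t tmp) {
    return (((tmp >> 3) & 6u) | (tmp & 1u)) << 1;
}

// Index as computed by the GPU snippet directly from the 32-bit word z.
constexpr uint32_t index_from_word(uint32_t z) {
    return ((z >> 26) & 12u) | ((z >> 23) & 2u);
}

// True if both forms agree for every possible value of the top byte of z.
constexpr bool top_byte_forms_agree() {
    for (uint32_t b = 0; b < 256; ++b) {
        const uint32_t z = (b << 24) | 0x00ABCDEFu;  // low 24 bits are arbitrary
        if (index_from_byte(static_cast<uint8_t>(z >> 24)) != index_from_word(z)) {
            return false;
        }
    }
    return true;
}

static_assert(top_byte_forms_agree(), "byte and word index forms differ");
```

So `(z >> 26) & 12` and `(z >> 23) & 2` are the byte formula with the shift by 24 folded in, and the final `<< 24` puts the XOR mask back onto that same byte.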

The second one, VARIANT1_2 (CPU CODE line 63), and some of the surrounding code are new and totally different from the variant in the GPU code around line 101.

I'm trying to figure out what al0 and ah0 correspond to in the GPU CODE, so I can perform VARIANT1_2 in the new GPU code I'm adapting.
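
For reference, the byte tweak in VARIANT1_2 can be written without pointer casts, which is closer to how the GPU code treats 32-bit words. The following is a minimal sketch, assuming a little-endian host so that byte `[1]` of the variable is bits 8..15. The function names are hypothetical, and the sketch does not settle which `sub` lane owns the low word of al0 in the GPU kernel:

```cpp
#include <cstdint>
#include <cstring>

// Reference: the macro body from the question, applied through a byte view.
// memcpy replaces the pointer cast; byte [1] is bits 8..15 only on a
// little-endian host, which is assumed here.
inline uint64_t variant1_2_bytes(uint64_t p) {
    uint8_t bytes[8];
    std::memcpy(bytes, &p, sizeof p);
    const uint8_t tmp2 = bytes[1];
    const uint32_t table2 = 0x75310;
    const uint32_t index2 = (((tmp2 >> 3) & 6u) | (tmp2 & 1u)) << 1;
    bytes[1] = static_cast<uint8_t>(tmp2 ^ ((table2 >> index2) & 0x33u));
    std::memcpy(&p, bytes, sizeof p);
    return p;
}

// The same tweak using shifts and masks on the 64-bit value.
constexpr uint64_t variant1_2_u64(uint64_t p) {
    const uint32_t tmp2 = static_cast<uint32_t>(p >> 8) & 0xFFu;
    const uint32_t index2 = (((tmp2 >> 3) & 6u) | (tmp2 & 1u)) << 1;
    return p ^ (static_cast<uint64_t>((0x75310u >> index2) & 0x33u) << 8);
}

// The same tweak on the low 32-bit word only, in the style of the GPU snippet.
constexpr uint32_t variant1_2_u32(uint32_t z) {
    const uint32_t index2 = ((z >> 10) & 12u) | ((z >> 7) & 2u);
    return z ^ (((0x75310u >> index2) & 0x33u) << 8);
}
```

Since only bits 8..15 change, the upper 32 bits of al0 and all of ah0 pass through untouched in this form.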

My code attempt so far is:

    *((uint64_t *)t2) += sub2 ? (*((uint64_t *)t1) * *((uint64_t*)zz)) : __umul64hi(*((uint64_t *)t1), *((uint64_t*)zz));

    res = *((uint64_t *)t2) >> (sub & 1 ? 32 : 0);

    ///*****************VARIANT 2nd attempt******************

    if (VARIANT > 0)    /// first attempt
    {
        uint32_t vl = res;

        const uint8_t tmp2 = ((uint8_t*)(&vl))[1];
        const uint32_t table2 = 0x75310U;
        const uint8_t index2 = (((tmp2 >> 3) & 6) | (tmp2 & 1)) << 1;
        ((uint8_t*)(&vl))[1] = tmp2 ^ ((table2 >> index2) & 0x33U);

        storeGlobal32(long_state + j, sub == 3 ? vl : res);
    }
    else
        storeGlobal32(long_state + j, res); //*addr = val;
    ///*****************VARIANT 2nd attempt - end******************

I'm sorry this is quite a long question, but hopefully there are some kind people out there who can help. Many thanks, Chris

  • This may also provide some clues, as it shows how Monero was converted: https://github.com/webchain-network/webchain-miner/commit/2bafbc98cf01c331b15a10530fc661650d85f877#diff-1e7f29d6c8353cdec37437023b98b693 Sections 33 and 20 are the parts I'm referencing. – Monkins1010 Jul 21 '18 at 09:31
