
I'm compiling RHMiner for a 32-bit ARM device (armv7-a) using SSE2NEON, specifically an Android (API 21) device. I am new to C/C++, so this may be a simple question, but I cannot find any resources online that match my situation. The app compiles and runs fine on 64-bit Intel processors. The segfault happens on the 32-bit platform only, and of course only after removing the static_assert.

static_assert(sizeof(U64) == sizeof(void*), "Incorrect Pointer Size");

This fails, and I can't figure out why: the size of U64 is 8 and the size of void* is 4.
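The assertion fails precisely because the two sizes differ: on armv7 a pointer is 4 bytes, so `sizeof(U64) == sizeof(void*)` evaluates `8 == 4`, which is false, while on x86-64 it is `8 == 8`. Below is a minimal C++ sketch of the same check, plus the usual portable type for holding a pointer in an integer, `std::uintptr_t`. It assumes `U64` is a 64-bit unsigned typedef (not confirmed against the rhminer source); the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: rhminer's U64 is a 64-bit unsigned integer; this typedef is illustrative.
typedef std::uint64_t U64;

// The same condition the original static_assert checks at compile time:
// true on x86-64 (8 == 8), false on armv7 (8 != 4).
constexpr bool pointers_are_64_bit() { return sizeof(U64) == sizeof(void*); }

// Portable pointer <-> integer conversion: std::uintptr_t is specified to be
// able to hold a void* and convert back to an equal pointer, whatever the
// pointer size of the target.
inline std::uintptr_t ptr_to_int(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

inline void* int_to_ptr(std::uintptr_t v) {
    return reinterpret_cast<void*>(v);
}

// Debug helper: checks that a pointer survives the round trip unchanged.
inline void check_round_trip(void* p) {
    assert(int_to_ptr(ptr_to_int(p)) == p);
}
```

Switching to `std::uintptr_t` where a variable really holds an address is only one step; it does not by itself fix code that also assumes 8-byte pointer slots inside buffers or structs.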

With the static_assert removed, the code builds but ends with a segmentation fault. Under valgrind:

Process terminating with default action of signal 11 (SIGSEGV)
==3055==  Access not within mapped region at address 0xE5F3460
==3055==    at 0x48E46DE: __memcpy_base (in /system/lib/libc.so)
==3055==    by 0x14811B: RandomHash_blake2s(unsigned char*, unsigned char*) (in /data/data/com.termux/files/home/rhminer/rhminer/rhminer)
==3055==  If you believe this happened as a result of a stack
==3055==  overflow in your program's main thread (unlikely but
==3055==  possible), you can try to increase the size of the
==3055==  main thread stack using the --main-stacksize= flag.
==3055==  The main thread stack size used in this run was 8388608.

...

2 errors in context 14 of 24:
==3055== Thread 6:
==3055== Invalid write of size 8
==3055==    at 0x48E46DE: __memcpy_base (in /system/lib/libc.so)
==3055==    by 0x14811B: RandomHash_blake2s(unsigned char*, unsigned char*) (in /data/data/com.termux/files/home/rhminer/rhminer/rhminer)
==3055==  Address 0xe5f3460 is not stack'd, malloc'd or (recently) free'd
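Valgrind says an 8-byte write inside `memcpy`, called from `RandomHash_blake2s`, landed at an address that is not on the stack or heap. That suggests a wrong destination pointer or length rather than a broken `memcpy`; since the function shown does not call `memcpy` directly, the copy is presumably reached through one of the helpers inlined into it. One way to locate it is to temporarily route suspect copies through a bounds-checked wrapper. The `checked_memcpy` helper below is a hypothetical debugging aid, not part of rhminer or libc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical debugging helper: a memcpy that is told the capacity of its
// destination and stops the program, naming the call site, before it would
// write out of bounds.
inline void* checked_memcpy(void* dst, std::size_t dst_capacity,
                            const void* src, std::size_t n,
                            const char* where) {
    assert(dst != nullptr && src != nullptr);
    if (n > dst_capacity) {
        std::fprintf(stderr, "%s: copying %zu bytes into a %zu-byte buffer\n",
                     where, n, dst_capacity);
        std::abort();
    }
    return std::memcpy(dst, src, n);
}
```

While debugging, a call such as `memcpy(out, src, n)` could become `checked_memcpy(out, outCapacity, src, n, __func__)`, where `outCapacity` (hypothetical) is the size you believe the destination has.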

How do I get past the assert and fix the segmentation fault?

I think the offending code is here, although a lot more code will probably fail under the same circumstances:

void CUDA_SYM_DECL(RandomHash_blake2s)(RH_StridePtr roundInput, RH_StridePtr output)
{
    uint32_t *in = (uint32_t*)RH_STRIDE_GET_DATA(roundInput);

    RH_ALIGN(64) blake2s_state S;
    RH_ALIGN(64) blake2s_param P[1];
    const int outlen = BLAKE2S_OUTBYTES;
    /* Move interval verification here? */

    P->digest_length = outlen;
    P->key_length    = 0;
    P->fanout        = 1;
    P->depth         = 1;
    store32_SSE2( &P->leaf_length, 0 ); 
    P->node_offset[0] = 0;
    P->node_offset[1] = 0;
    P->node_offset[2] = 0;
    P->node_offset[3] = 0;
    P->node_offset[4] = 0;
    P->node_offset[5] = 0;
    P->node_depth    = 0;
    P->inner_length  = 0;

#if defined(_WIN32_WINNT) || defined(__CUDA_ARCH__)
    RH_memzero_8(P->salt, sizeof( P->salt ));
    RH_memzero_8(P->personal, sizeof( P->personal ) );
#else
    memset(P->salt, 0, sizeof( P->salt ));
    memset(P->personal, 0, sizeof( P->personal ) );
#endif

    RH_memzero_of16(&S, sizeof( blake2s_state ) );    

    for( int i = 0; i < 8; ++i ) S.h[i] = blake2s_IV[i];

    uint32_t *p = ( uint32_t * )( P );

    /* IV XOR ParamBlock */
    for( size_t i = 0; i < 8; ++i )
        S.h[i] ^= load32_SSE2( &p[i] );

    _CM(blake2s_update_SSE2)( &S, ( uint8_t * )in, RH_STRIDE_GET_SIZE(roundInput) );
    _CM(blake2s_final_SSE2)( &S, RH_STRIDE_GET_DATA(output), BLAKE2S_OUTBYTES );
    RH_STRIDE_SET_SIZE(output, BLAKE2S_OUTBYTES);
}
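The loop under `/* IV XOR ParamBlock */` XORs the 32-byte BLAKE2s parameter block into the IV as eight 32-bit words. Below is a portable sketch of that step. It assumes `load32_SSE2` is meant to behave like the BLAKE2 reference implementation's little-endian `load32` (not checked against rhminer), and it works on a byte array instead of casting the struct to `uint32_t*`, so it makes no pointer-size or alignment assumptions. `load32_le`, `kBlake2sIV` and `blake2s_init_h` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Little-endian 32-bit load assembled byte by byte, in the style of the BLAKE2
// reference code. Independent of pointer size, alignment and host byte order.
inline std::uint32_t load32_le(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// BLAKE2s IV (the same constants as the SHA-256 initial hash values).
static const std::uint32_t kBlake2sIV[8] = {
    0x6A09E667u, 0xBB67AE85u, 0x3C6EF372u, 0xA54FF53Au,
    0x510E527Fu, 0x9B05688Cu, 0x1F83D9ABu, 0x5BE0CD19u
};

// h[i] = IV[i] ^ (little-endian word i of the 32-byte parameter block).
// param[0] is digest_length, which BLAKE2s limits to 1..32.
inline void blake2s_init_h(std::uint32_t h[8], const unsigned char param[32]) {
    assert(param[0] >= 1 && param[0] <= 32);
    for (std::size_t i = 0; i < 8; ++i)
        h[i] = kBlake2sIV[i] ^ load32_le(param + 4 * i);
}
```

With digest_length = 32, key_length = 0, fanout = 1, depth = 1 and every other byte zero, `h[0]` becomes `0x6B08E647` and `h[1]` through `h[7]` stay equal to the IV.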
Kris B.
  • It just means whoever wrote that code expects that `void*` be 8 bytes but that's not true for your platform. The code probably relies on the assumption that pointers take up 8 bytes. So you shouldn't expect the code to work, since that assumption doesn't hold for the platform you are trying to compile for. – François Andrieux Feb 12 '19 at 19:24
  • A `static_assert` should stop your code from compiling. Is it actually present in the code? – NathanOliver Feb 12 '19 at 19:25
  • Thanks for your fast reply! The code was originally designed for 64bit. Is this the issue? Any idea what sort of changes would be needed to make this compile and run on a 32bit system? – Kris B. Feb 12 '19 at 19:26
  • @KristopherBaylog *" Any idea what sort of changes would be needed to make"* Likely a lot. If it was easy to make portable, I would assume the original implementer wouldn't have bothered with limiting support to a single bitness. – François Andrieux Feb 12 '19 at 19:27
  • @NathanOliver yes, the static_assert is in the code. The entire codebase is here: https://github.com/polyminer1/rhminer . We've been working to convert this code to work on ARM32 with SSE2NEON. – Kris B. Feb 12 '19 at 19:27
  • @FrançoisAndrieux I'm not sure they would say it *runs fine* on that platform if they are getting a segfault. – NathanOliver Feb 12 '19 at 19:27
  • @KristopherBaylog How did you even get the code to compile then? That `static_assert` should stop you from building the binary. – NathanOliver Feb 12 '19 at 19:28
  • @FrançoisAndrieux The segmentation fault only happens on the ARM compile. It runs fine on Intel 64 bit. – Kris B. Feb 12 '19 at 19:29
  • @FrançoisAndrieux Yeah, I don't think that is happening here – NathanOliver Feb 12 '19 at 19:29
  • @KristopherBaylog I don't know what you ran to get a segfault, but it wasn't the compiled code with the failing `static_assert`. Edit : If you removed the `static_assert` to get it to compile, then it's no surprise that it doesn't work. Those `static_assert`s are there to make sure it doesn't compile if it won't work on the target platform. – François Andrieux Feb 12 '19 at 19:29
  • @FrançoisAndrieux Thank you for all of your help. The segfault happens on the 32bit platform only - and that was of course removing the static_assert. It is correct that the code was designed to run on 64bit. I guess my question should have been more related to "How do I convert this 64bit code to work on a 32bit platform"? – Kris B. Feb 12 '19 at 19:32
  • @KristopherBaylog That seems like what you are actually trying to find out. But it might be hard to answer, it all depends on what algorithm is being implemented. Edit : It sounds like you're working with a cryptocurrency miner. My impression is that those would try to exploit any possible optimization at any cost (including any portability concern). These kinds of optimizations are probably platform or architecture specific. So it might not even be practical to do this conversion. I would expect to have to rewrite portions of the code for 32 bit support. – François Andrieux Feb 12 '19 at 19:34
  • @FrançoisAndrieux I believe you're absolutely correct, the code is certainly heavily optimized for SSE. We were hoping that SSE2NEON would translate this to make it work properly, but it seems that is not the case. Thanks again for your dialog! We'll take it back to the drawing board. – Kris B. Feb 12 '19 at 19:37
  • If the creator of the program put in a test to ensure an expected pointer size there is no telling how much code is based on the assumption of an 8 byte pointer. I'm afraid to say you likely have a long haul ahead of you. – user4581301 Feb 12 '19 at 19:37
  • @user4581301 Thanks for your comment. Yes, they are assuming an 8 byte pointer in the code. Unfortunately, I am not sure how to handle fixing any of it. My limited knowledge of C isn't helping at all, lol. – Kris B. Feb 12 '19 at 20:45
  • " The segfault happens on the 32bit platform only - and that was of course removing the static_assert. " - you should mention this in the question – M.M Feb 12 '19 at 21:53
  • @M.M - Updated, thank you. – Kris B. Feb 13 '19 at 06:06
  • "How do I convert this 64bit code to work on a 32bit platform"? You reverse-engineer the code, understand what it does, find the places where the assumption is relied upon, and rearchitect the code so it doesn't make that assumption any more. There's no magic wand that does this conversion for you. (Imagine you have the blueprint to a building. You remove a wall. The building collapses. How do you convert the blueprint so the building still stands without that wall? You need to study the blueprint, understand what that wall was for, and find a way to do whatever that wall did.) – Raymond Chen Feb 13 '19 at 06:36
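As Raymond Chen's comment describes, the work is finding each place that depends on 8-byte pointers. One common pattern in low-level code is a byte offset or size that was hard-coded from a 64-bit build. The sketch below is a hypothetical illustration (the `Record` struct and `read_len` are invented, not taken from rhminer) of replacing such a constant with `offsetof`, so the compiler computes the layout for whichever target it is building.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical record: a pointer followed by a 32-bit length.
struct Record {
    void*         next;
    std::uint32_t len;
};

// Fragile version, shown only as a comment: it hard-codes the x86-64 layout,
// where len starts at byte 8. On armv7 the pointer is 4 bytes, len starts at
// byte 4, and byte 8 is already past the end of the 8-byte struct:
//   std::memcpy(&len, raw + 8, sizeof len);

// Portable version: the offset is computed for the current target.
inline std::uint32_t read_len(const unsigned char* raw) {
    assert(raw != nullptr);
    std::uint32_t len;
    std::memcpy(&len, raw + offsetof(Record, len), sizeof len);
    return len;
}
```

Searching a codebase for literal `8`s next to pointer casts, `memcpy` sizes and buffer offsets is a rough way to find candidates for this kind of change.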

0 Answers