I'm compiling RHMiner for a 32-bit ARM device (armv7-a) using SSE2NEON, specifically an Android (API 21) device. I am new to C/C++, so this may be a simple question, but I cannot find any resources online matching my situation. The app compiles and runs fine on 64-bit Intel processors. The segfault happens only on the 32-bit platform, and only after I removed this static_assert:
static_assert(sizeof(U64) == sizeof(void*), "Incorrect Pointer Size");
The assert fails on the 32-bit build, where the size of U64 is 8 and the size of void* is 4, and I can't figure out what to do about it.
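To make the mismatch concrete, here is a minimal sketch of what I understand the assert to be checking, and of the pointer-sized integer type I could use instead. It assumes U64 is a typedef for a 64-bit unsigned integer; the helper names (ptr_to_u64, u64_to_ptr, check_round_trip) are hypothetical and do not come from RHMiner.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: U64 is a typedef for a 64-bit unsigned integer.
using U64 = std::uint64_t;

// True only where a pointer is 8 bytes wide, i.e. where the original assert passes.
constexpr bool u64_matches_pointer() { return sizeof(U64) == sizeof(void*); }

// uintptr_t is intended to hold any void*, on 32-bit and 64-bit targets alike.
static_assert(sizeof(std::uintptr_t) >= sizeof(void*),
              "uintptr_t too small for a pointer");

// Hypothetical helpers: go through uintptr_t instead of casting to U64 directly.
inline U64 ptr_to_u64(const void* p) {
    // Widening a pointer-sized integer to 64 bits loses nothing.
    return static_cast<U64>(reinterpret_cast<std::uintptr_t>(p));
}

inline void* u64_to_ptr(U64 v) {
    // Narrowing is safe only because the value originally came from a pointer.
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(v));
}

// Debug check that a pointer survives the round trip through U64.
inline void check_round_trip(const void* p) {
    assert(u64_to_ptr(ptr_to_u64(p)) == p);
}
```

On armv7-a, u64_matches_pointer() evaluates to false, which is exactly the condition the static_assert rejects. Whether RHMiner only stores pointers in U64 values or also relies on the two having the same layout is something I have not verified.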
With the assert removed, the program builds but ends with a segmentation fault. Valgrind reports:
Process terminating with default action of signal 11 (SIGSEGV)
==3055== Access not within mapped region at address 0xE5F3460
==3055== at 0x48E46DE: __memcpy_base (in /system/lib/libc.so)
==3055== by 0x14811B: RandomHash_blake2s(unsigned char*, unsigned char*) (in /data/data/com.termux/files/home/rhminer/rhminer/rhminer)
==3055== If you believe this happened as a result of a stack
==3055== overflow in your program's main thread (unlikely but
==3055== possible), you can try to increase the size of the
==3055== main thread stack using the --main-stacksize= flag.
==3055== The main thread stack size used in this run was 8388608.
...
2 errors in context 14 of 24:
==3055== Thread 6:
==3055== Invalid write of size 8
==3055== at 0x48E46DE: __memcpy_base (in /system/lib/libc.so)
==3055== by 0x14811B: RandomHash_blake2s(unsigned char*, unsigned char*) (in /data/data/com.termux/files/home/rhminer/rhminer/rhminer)
==3055== Address 0xe5f3460 is not stack'd, malloc'd or (recently) free'd
How do I get past the assert and fix the segmentation fault?
I think the offending code is below, though there is probably a lot more code that will fail under the same circumstances:
void CUDA_SYM_DECL(RandomHash_blake2s)(RH_StridePtr roundInput, RH_StridePtr output)
{
    uint32_t *in = (uint32_t*)RH_STRIDE_GET_DATA(roundInput);
    RH_ALIGN(64) blake2s_state S;
    RH_ALIGN(64) blake2s_param P[1];
    const int outlen = BLAKE2S_OUTBYTES;
    /* Move interval verification here? */
    P->digest_length = outlen;
    P->key_length = 0;
    P->fanout = 1;
    P->depth = 1;
    store32_SSE2( &P->leaf_length, 0 );
    P->node_offset[0] = 0;
    P->node_offset[1] = 0;
    P->node_offset[2] = 0;
    P->node_offset[3] = 0;
    P->node_offset[4] = 0;
    P->node_offset[5] = 0;
    P->node_depth = 0;
    P->inner_length = 0;
#if defined(_WIN32_WINNT) || defined(__CUDA_ARCH__)
    RH_memzero_8(P->salt, sizeof( P->salt ))
    RH_memzero_8(P->personal, sizeof( P->personal ) );
#else
    memset(P->salt, 0, sizeof( P->salt ));
    memset(P->personal, 0, sizeof( P->personal ) );
#endif
    RH_memzero_of16(&S, sizeof( blake2s_state ) );
    for( int i = 0; i < 8; ++i ) S.h[i] = blake2s_IV[i];
    uint32_t *p = ( uint32_t * )( P );
    /* IV XOR ParamBlock */
    for( size_t i = 0; i < 8; ++i )
        S.h[i] ^= load32_SSE2( &p[i] );
    _CM(blake2s_update_SSE2)( &S, ( uint8_t * )in, RH_STRIDE_GET_SIZE(roundInput) );
    _CM(blake2s_final_SSE2)( &S, RH_STRIDE_GET_DATA(output), BLAKE2S_OUTBYTES );
    RH_STRIDE_SET_SIZE(output, BLAKE2S_OUTBYTES)
}
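Since valgrind reports the invalid write inside a memcpy called from RandomHash_blake2s, one way I could narrow it down is to route the copies through a bounds-checked wrapper and see which one trips first. This is only a sketch of that debugging idea: checked_copy and copy_or_abort are hypothetical names, not RHMiner functions, and I am assuming the destination capacity is known at the call site.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies n bytes into dst only when they fit in dst_capacity.
// Returns false instead of writing out of bounds.
inline bool checked_copy(void* dst, std::size_t dst_capacity,
                         const void* src, std::size_t n) {
    if (dst == nullptr || src == nullptr) return false;
    if (n > dst_capacity) return false;
    std::memcpy(dst, src, n);
    return true;
}

// Hypothetical debug wrapper: stops at the first bad copy when asserts are
// enabled; with NDEBUG defined it silently skips the copy instead.
inline void copy_or_abort(void* dst, std::size_t dst_capacity,
                          const void* src, std::size_t n) {
    const bool ok = checked_copy(dst, dst_capacity, src, n);
    assert(ok && "copy would write past the destination buffer");
    (void)ok; // avoids an unused-variable warning when NDEBUG is defined
}
```

This would not catch a destination pointer that is already wrong (for example one that was computed with the wrong pointer size), but it would at least tell me whether the length passed to the copy is sane.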