My program processes a large number of 4-byte strings, such as "aaaa", "bbbb", "cccc", and so on. I need to collect only the strings that pass a CRC check.
Because very few strings are expected to pass the CRC check, I don't want to allocate a very large buffer to hold all the results. I would prefer the results to be concatenated one after another, just like the input. For example, if the input is "aaaabbbbcccc" and "bbbb" fails the CRC check, the output string should be "aaaacccc" and output_count should be 2.
The code looks like this:
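__device__
bool is_crc_correct(char* str, int len) {
    return true; // for simplicity, just return 'true'
}
// arguments:
//   input: a sequence of 4-byte strings, e.g. aaaabbbbccccdddd....
//   output: the strings that pass the check, packed back to back
//   output_count: the number of strings written to output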
__global__
void func(char* input, char* output, int* output_count) {
    unsigned int index = blockDim.x*blockIdx.x + threadIdx.x;
    if (is_crc_correct(input + 4*index, 4)) {
        // copy the string (not thread safe: several threads can read the same *output_count)
        memcpy(output + (*output_count)*4,
               input + 4*index,
               4);
        // increase the counter (not thread safe either)
        (*output_count)++;
    }
}
Obviously the memory copy is not thread safe. I know the atomicAdd function can be used for the ++ operation, but how can I make the writes to both output and output_count thread safe?
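One common way to approach this, sketched here under some assumptions, is to use the value that atomicAdd returns: it is the counter's value before the increment, so each passing thread gets a slot index that no other thread receives, and the copy into that slot no longer needs further protection. In the sketch below, `filter_records`, `filter_on_gpu`, `num_records`, and `capacity` are hypothetical names introduced for illustration, and `is_crc_correct` is a stand-in that rejects strings starting with 'b' rather than a real CRC check.

```cuda
#include <algorithm>
#include <cassert>
#include <string>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real CRC test: it rejects records that start
// with 'b' so the example has something to filter out.
__device__ bool is_crc_correct(const char* str, int len) {
    return len > 0 && str[0] != 'b';
}

// One thread per 4-byte record. atomicAdd returns the counter's value from
// before the increment, so every passing thread gets a distinct slot.
__global__ void filter_records(const char* input, int num_records,
                               char* output, int capacity, int* output_count) {
    int index = blockDim.x * blockIdx.x + threadIdx.x;
    if (index >= num_records) return;          // guard the partial last block
    if (is_crc_correct(input + 4 * index, 4)) {
        int slot = atomicAdd(output_count, 1); // reserve a slot atomically
        if (slot < capacity) {                 // drop records that do not fit
            memcpy(output + 4 * slot, input + 4 * index, 4);
        }
    }
}

// Hypothetical host helper. `capacity` is the number of records the output
// buffer can hold. Error checking of the CUDA calls is omitted for brevity.
std::string filter_on_gpu(const std::string& input, int capacity, int* count_out) {
    int num_records = static_cast<int>(input.size() / 4);
    *count_out = 0;
    if (num_records == 0 || capacity <= 0) return std::string();

    char* d_in = nullptr;
    char* d_out = nullptr;
    int* d_count = nullptr;
    cudaMalloc(&d_in, 4 * num_records);
    cudaMalloc(&d_out, 4 * capacity);
    cudaMalloc(&d_count, sizeof(int));
    cudaMemcpy(d_in, input.data(), 4 * num_records, cudaMemcpyHostToDevice);
    cudaMemset(d_count, 0, sizeof(int));       // the counter must start at zero

    int threads = 256;
    int blocks = (num_records + threads - 1) / threads;
    filter_records<<<blocks, threads>>>(d_in, num_records, d_out, capacity, d_count);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(count_out, d_count, sizeof(int), cudaMemcpyDeviceToHost);
    int stored = std::min(*count_out, capacity);
    std::string result(4 * stored, '\0');
    if (stored > 0) {
        cudaMemcpy(&result[0], d_out, 4 * stored, cudaMemcpyDeviceToHost);
    }

    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_count);
    return result;
}
```

With the stand-in check, calling `filter_on_gpu("aaaabbbbcccc", 8, &count)` sets count to 2, but the returned string may be either "aaaacccc" or "ccccaaaa": slots are handed out in whatever order the threads reach atomicAdd, so the input order is not preserved. The counter reports every passing record even when some were dropped because the buffer was full, which lets the caller detect an undersized buffer. If the output must keep the input order, a stream compaction based on a prefix sum (for example thrust::copy_if) is probably a better fit than this approach.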