void write_solution(uchar our_index[16], global uchar *solution) {
    uchar8 solution_data = 0;
    solution_data.s0 = (our_index[0] & 0xF) + ((our_index[1] & 0xF) << 4);
    solution_data.s1 = (our_index[2] & 0xF) + ((our_index[3] & 0xF) << 4);
    solution_data.s2 = (our_index[4] & 0xF) + ((our_index[5] & 0xF) << 4);
    solution_data.s3 = (our_index[6] & 0xF) + ((our_index[7] & 0xF) << 4);
    solution_data.s4 = (our_index[8] & 0xF) + ((our_index[9] & 0xF) << 4);
    solution_data.s5 = (our_index[10] & 0xF) + ((our_index[11] & 0xF) << 4);
    solution_data.s6 = (our_index[12] & 0xF) + ((our_index[13] & 0xF) << 4);
    solution_data.s7 = (our_index[14] & 0xF) + ((our_index[15] & 0xF) << 4);
    vstore8(solution_data, 0, solution);
}

As can be seen in the code, it would be really lovely if I could just write it like this instead:

void write_solution(uchar our_index[16], global uchar *solution) {
    uchar8 solution_data = 0;
    for(int i = 0; i < 8; i++) {
        solution_data[i] = (our_index[i * 2] & 0xF) + ((our_index[i * 2 + 1] & 0xF) << 4);
    }
    vstore8(solution_data, 0, solution);
}

But of course, OpenCL C doesn't allow vector types to be indexed with `[]` the way the code above does.

Is there anything I can do to solve this issue?

Xirema
  • Is there any way that `our_index` can be passed as a `uchar16` instead of an array? If so, this can be written very concisely using the `.even` and `.odd` suffixes. – jprice Jul 13 '16 at 17:30
  • @jprice I'm looking into it. I think it's possible, but it's not a trivial change. – Xirema Jul 13 '16 at 17:35
  • You could also just use a `uchar[8]` array for `solution_data` instead of a vector, and then use a loop to write it to `solution` and hope that the compiler will unroll the loop and perform the same optimisations that it might with a `vstore8`. – jprice Jul 13 '16 at 17:38
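
The array-based variant suggested in the last comment might look roughly like the sketch below. It assumes `our_index` stays a private `uchar[16]` array; the temporary name `packed` is made up for illustration. Whether the compiler unrolls the loops and merges the stores is implementation-dependent, so it is worth checking the generated code or timing it.

```c
void write_solution(uchar our_index[16], global uchar *solution) {
    // Plain private array instead of a uchar8 vector, so [] indexing is allowed.
    uchar packed[8];
    for (int i = 0; i < 8; i++) {
        // Low nibble from the even element, high nibble from the odd element.
        packed[i] = (uchar)((our_index[2 * i] & 0xF) | ((our_index[2 * i + 1] & 0xF) << 4));
    }
    // Copy the eight packed bytes out to global memory one at a time.
    for (int i = 0; i < 8; i++) {
        solution[i] = packed[i];
    }
}
```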

1 Answer


Vector operations are component-wise, and you can take advantage of the .even and .odd vector addressing modes. Does this work for you?

void write_solution(uchar16 our_index, global uchar *solution) {
    uchar8 solution_data = 0;
    solution_data = (our_index.even & 0xF) + ((our_index.odd & 0xF) << 4);
    vstore8(solution_data, 0, solution);
}
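
If `our_index` has to remain a `uchar[16]` array, one option is to load it into a vector first with `vload16`, which accepts a pointer to private memory. The following is a minimal, untested sketch under that assumption; the local names `v`, `lo` and `hi` are only illustrative, and the `(uchar)` casts are there to keep the scalar operands the same type as the vector elements.

```c
void write_solution(uchar our_index[16], global uchar *solution) {
    // Read 16 consecutive uchars from the private array into one vector.
    uchar16 v = vload16(0, our_index);
    // Even elements supply the low nibbles, odd elements the high nibbles.
    uchar8 lo = v.even & (uchar)0xF;
    uchar8 hi = (v.odd & (uchar)0xF) << (uchar)4;
    // Combine component-wise and write all 8 bytes with a single vector store.
    vstore8(lo | hi, 0, solution);
}
```

This keeps the original function signature, so callers do not need to change; whether it is faster than the fully written-out version depends on the device and compiler.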
mfa
  • Doesn't `our_index` need to be converted to a `uchar16` for this to work? – Xirema Jul 13 '16 at 17:36
  • Ah, good point, I missed that. It's probably worth trying the conversion or passing in a `uchar16`. The vector operations take advantage of the SIMD units that most modern hardware has. (In fact, I believe SIMD is a requirement for a device to be OpenCL compliant.) – mfa Jul 13 '16 at 17:51