I'm trying to work with an array of strings (words) in CUDA.
I tried flattening it into a single string, but then, to index it, I'd have to scan through part of it every time a kernel runs. With 9000 words of 6 characters each, I'd have to examine 53994 characters in the worst case on each kernel call, so I'm looking for other ways to do it.
Update: Forgot to mention, the strings are of different lengths, so I'd have to find the end of each one.
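For reference, the flattened layout is usually paired with a table of start offsets, so that finding word i is a single lookup instead of a scan. Below is a minimal host-side sketch of that idea in plain Python. The names `flat`, `offsets` and `word_at` are hypothetical, the words are made up to have different lengths, and ASCII-only text is assumed.

```python
from itertools import accumulate

# Example words of different lengths (made up for illustration).
wordList = ['asd', 'bs', 'csdef']

# Encode once so that lengths are byte lengths (ASCII assumed).
encoded = [w.encode("ascii") for w in wordList]

# One contiguous byte buffer holding every word back to back.
flat = b"".join(encoded)

# offsets[i] is where word i starts; offsets[i + 1] is where it ends.
offsets = [0] + list(accumulate(len(e) for e in encoded))

def word_at(i):
    """Host-side stand-in for what one GPU thread would do with its index i."""
    return flat[offsets[i]:offsets[i + 1]]
```

Under these assumptions, both `flat` and `offsets` would be converted to NumPy arrays and uploaded, and each thread would read `offsets[i]` and `offsets[i + 1]` to find the bounds of its word without examining any other characters.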
The next thing I tried was copying each word to a different memory location, collecting the addresses, and passing them to the GPU as an array, with the following code:
# np = numpy
wordList = ['asd','bsd','csd']
d_words = []
for word in wordList:
    d_words.append(gpuarray.to_gpu(np.array(word, dtype=str)))
d_wordList = gpuarray.to_gpu(np.array([word.ptr for word in d_words], dtype=np.int32))
ker_test(d_wordList, block=(1,1,1), grid=(1,1,1))
and in the kernel:
__global__ void test(char** d_wordList) {
    printf("First character of the first word is: %c \n", d_wordList[0][0]);
}
The kernel should get an int32 array of pointers that point to the beginning of each word, effectively being a char** (or int**), but it doesn't work as I expect.
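One thing that may be worth checking here, assuming a 64-bit platform: a `char**` in the kernel reads 8 bytes per element, while an int32 array stores only 4 bytes per entry. The sketch below uses only the standard library to illustrate the mismatch. The addresses are hypothetical values invented for the example (real ones would come from each array's `.ptr`), and little-endian byte order is assumed.

```python
import struct

# Hypothetical 64-bit device addresses, made up for illustration.
addresses = [0x7F3A10200000, 0x7F3A10200200, 0x7F3A10200400]

# Stored as 32-bit integers, only the low 4 bytes of each address survive.
as_int32 = [a & 0xFFFFFFFF for a in addresses]
packed32 = struct.pack("<3I", *as_int32)  # 12 bytes in total

# Reading 8 bytes per element, as a char** would on a 64-bit platform,
# builds element 0 out of the first TWO 32-bit entries.
(first_as_seen_by_kernel,) = struct.unpack_from("<Q", packed32, 0)

# Packing with a pointer-sized (64-bit) type preserves every address.
packed64 = struct.pack("<3Q", *addresses)
round_trip = list(struct.unpack("<3Q", packed64))
```

If this is what happens, building the pointer array with a pointer-sized dtype such as `np.uintp` or `np.uint64` instead of `np.int32` might be enough to make the element width match what the kernel expects.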
What is wrong with this approach?
Also, what are the "standard" ways to work with strings in PyCUDA (or in CUDA in general)?
Thanks in advance.