For the last few hours I have been banging my head against the wall and cannot figure out what is going wrong here.
I have a text file containing word phrases, none longer than 128 characters. What I am trying to do is memory-map this file and read data of type wchar_t into a large buffer. Basically the file is a textual lookup index: given a position and a length, I want to return the string at that position.
Here, for demonstration, is what I did (or tried to accomplish):
#include &lt;fcntl.h&gt;
#include &lt;locale.h&gt;
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;sys/mman.h&gt;
#include &lt;sys/stat.h&gt;
#include &lt;unistd.h&gt;
#include &lt;wchar.h&gt;

int main(int argc, char **argv)
{
    int fd = 0;
    struct stat statbuf;
    wchar_t aux[128] = {0};
    const wchar_t *px = NULL;

    setlocale(LC_CTYPE, "");
    setlocale(LC_COLLATE, "");

    fd = open("./test2_termlist.txt", O_RDONLY);
    if (fd < 0)
        return 1;
    fstat(fd, &statbuf);

    void *p = mmap(NULL, statbuf.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return 1;
    /* Could have cast p to wchar_t * directly ... */
    px = (wchar_t *)p;

    /* Copy the string of 45 characters starting at character position 92 */
    memcpy(aux, (const wchar_t *)px + 92, 45);
    aux[45] = L'\0';
    printf("string = %ls\n", aux);

    return 0;
}
The above is working demo code. I have tried various alternatives, such as wmemcpy or wcsncpy, to get the string. The result is always scrambled characters.
If I use char instead of wchar_t, things seem to work, but the indices I have to use are based on wide strings, so they do not line up if the text file is interpreted as char.
I need fast access to a large text file, which is why I am trying to use mmap here.
What is my (maybe stupid) mistake here?
NOTE: valgrind does not show any errors either.