I have been banging my head against the wall for the last few hours and do not understand what is going wrong here.

I have a text file containing word phrases no longer than 128 characters. I am trying to memory-map this file and read it as wchar_t into a large buffer. The file is essentially a textual lookup: given a position and a length, it should return the corresponding string from this text index.

For demonstration, here is what I did (or am trying to accomplish).

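#include <fcntl.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <wchar.h>

int main(int argc, char **argv)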
{
    int fd = 0;
    struct stat statbuf;
    wchar_t aux[128] = {0};
    const wchar_t *px = NULL;

    setlocale(LC_CTYPE, "");
    setlocale(LC_COLLATE, "");

    fd = open("./test2_termlist.txt", O_RDONLY); 

    fstat(fd, &statbuf); 

    void *p = mmap(NULL, statbuf.st_size, PROT_READ, MAP_SHARED, fd, 0);

    /* Could have casted p to wchar_t already ... */
    px = (wchar_t *)p;

    /* Copy string with 45 characters from char position 92 */
    memcpy(aux, (const wchar_t *)px + 92, 45);
    aux[45] = L'\0';

    printf("string = %ls\n", aux); 

    return 1;
}

The above is working demo code. I've tried various things, such as using wmemcpy or wcsncpy, to get the string. The result is always scrambled characters. If I use char instead of wchar_t, things seem to work, but the indices that will be used are based on wide strings and therefore do not work if the text file is interpreted as char.

I need fast access to a large text file, which is why I am trying to use mmap here.
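A minimal sketch of two things that may contribute to the scrambled output, assuming the file is stored in a byte-oriented encoding such as UTF-8 (as suggested in the comments): `memcpy` counts bytes rather than `wchar_t` elements, and casting the mapping to `wchar_t *` packs several file bytes into each element. The helper names `wide_elements_filled` and `first_element_as_wide` are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Number of whole wchar_t elements that a memcpy of `nbytes` bytes fills.
 * memcpy(aux, px + 92, 45) copies 45 bytes, not 45 wide characters. */
size_t wide_elements_filled(size_t nbytes)
{
    return nbytes / sizeof(wchar_t);
}

/* Reinterpret the first sizeof(wchar_t) bytes of a byte buffer as a single
 * wchar_t, which is what reading the mapping through a wchar_t pointer
 * effectively does. `bytes` must hold at least sizeof(wchar_t) bytes. */
wchar_t first_element_as_wide(const char *bytes)
{
    wchar_t wc;
    memcpy(&wc, bytes, sizeof wc);
    return wc;
}
```

On a platform where `wchar_t` is wider than one byte, `first_element_as_wide("abcdefgh")` does not yield `L'a'`, because several text bytes end up in one element. Likewise, `px + 92` advances by `92 * sizeof(wchar_t)` bytes, not by 92 characters of the file.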

What is my (maybe stupid) mistake here?

NOTE: valgrind does not show any error either.

Andreas W. Wylach
  • You should use [wprintf](http://man7.org/linux/man-pages/man3/wprintf.3.html) – LPs Jun 06 '16 at 10:17
  • If you're using `mmap`, you'll need to be aware of the encoding of the file and handle that properly. Probably, the file is in UTF-8, so you need to access it using a `char *`. – MicroVirus Jun 06 '16 at 10:42
  • @MicroVirus: Yeah, you just confirmed what I was thinking. As a quick fix, I now read the mmap as type `char *` and convert it with `mbstowcs` afterwards. This helps for now, but it seems I have to change that lookup section to a better solution. – Andreas W. Wylach Jun 07 '16 at 01:17
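The quick fix described in the last comment could be sketched roughly as follows. This is not the asker's actual code: `extract_wide` is a hypothetical helper, and it uses `mbrtowc` rather than `mbstowcs` because a memory-mapped region is not guaranteed to be null-terminated. It assumes the file's encoding matches the current `LC_CTYPE` locale (presumably UTF-8).

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: copy up to `count` characters, starting at character
 * index `start`, from the multibyte buffer `src` (`src_len` bytes) into
 * `dst`. Decoding follows the current LC_CTYPE locale.
 * Returns the number of wide characters written, or (size_t)-1 on a
 * decoding error. `dst` must have room for count + 1 elements.
 */
size_t extract_wide(const char *src, size_t src_len,
                    size_t start, size_t count, wchar_t *dst)
{
    mbstate_t state;
    size_t pos = 0;      /* byte offset into src */
    size_t index = 0;    /* character index of the next decoded character */
    size_t written = 0;  /* wide characters stored in dst so far */

    memset(&state, 0, sizeof state);

    while (pos < src_len && written < count) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, src + pos, src_len - pos, &state);

        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;   /* invalid or truncated sequence */
        if (n == 0)
            n = 1;               /* an embedded NUL still occupies one byte */

        if (index >= start)
            dst[written++] = wc;

        pos += n;
        index++;
    }

    dst[written] = L'\0';
    return written;
}
```

In the program from the question, `src` would be the pointer returned by `mmap` treated as `const char *`, and `src_len` would be `statbuf.st_size`, with `setlocale(LC_CTYPE, "")` called beforehand. Note that this scans from the start of the buffer on every lookup, so it is linear in `start`; storing byte offsets in the index instead of character positions would likely avoid that cost.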
