I have strings like "− · · · −" (Morse code) in an array, and I want to tokenize each string to get each individual dot (·) and dash (−). Part of my code is given below:

char *code, *token;
    char x;
    char ch[4096];
    code = &ch[0];

   ..

while((x = tolower(fgetc(fp))) != EOF){
            printf("%c \n", x);
            switch(x){
                case 'a':
                    strcpy(code, "· −");
                    break;
                case 'b':
                    strcpy(code, "− · · ·");
                    break;
                case 'c':
                    strcpy(code, "− · − · ");
                    break;
                case 'd':
                    strcpy(code, "− · ·");
                    break;
                case 'e':
                    strcpy(code, "· ");
                    break;
                case 'f':
                    strcpy(code, "· · − ·" );
                    break;
                case 'g':
                    strcpy(code, "− − · ");
                    break;
                case 'h':
            }
            if(x!= 10){
                printf("Value read : %s \n", code);
                token = strtok(code, " ");
                while(token != NULL){
                    printf("CHARACTER: %s\n", token);
                    token = strtok(NULL, " ");
                }
            }

So, when the code array has "− − ·", I want the output to have:

CHARACTER: −
CHARACTER: −
CHARACTER: ·

However, the output is instead a single line, CHARACTER: − − ·. I am new to string tokenizing and might have made a mistake somewhere. Perhaps my delimiter is wrong; I am not sure. I hope I have provided enough information. Any help on this would be greatly appreciated.

Thanks in advance

NPE
user3033194

2 Answers

The issue is that the (Unicode) whitespace character in the string literals (e.g. "· · − ·") is different to the whitespace character in the strtok() calls.

Run your source code through xxd and see for yourself.

As far as I can see, the spaces in the strcpy() calls are U+200A whereas the spaces in the strtok() calls are U+0020.

NPE
  • thats weird since this code runs fine http://www.cplusplus.com/reference/cstring/strtok/ what would you suggest as an alternative? – Logan Murphy Sep 29 '14 at 15:58
  • @LoganMurphy: I copied and pasted the code from the question into an editor, and it doesn't work until I replace the U+200A spaces with the normal U+0020 spaces. – NPE Sep 29 '14 at 16:00
  • is it compiler dependent tho? maybe the op can set up a compiler option – Logan Murphy Sep 29 '14 at 16:01
  • @NPE: throw away the editor. It should never turn a U+0020 into a U+200A. Word processors might do that, but not a proper code editor. – Rudy Velthuis Sep 29 '14 at 16:06
strtok() is not needed here (and you don't need those spaces either). If you want the individual characters from the string, you could use a simple loop with a pointer over the original string:

char *current = code;

Then make sure you loop until the end of string (null) character:

while (*current != '\0') {
  if (*current != ' ') {
      printf("CHARACTER: %c \n", *current);
  }
  current++;  /* advance on every iteration, including spaces */
}

What this does: it loops over the characters in code, using current as a pointer and checking for the null terminator. It uses an if to check for a space and, if the character is not a space, prints it with a format string, dereferencing the pointer to get the char. Finally, it increments the pointer.

Big warning: If your string is not zero terminated (a standard C string will be), this will start printing silly stuff.

Danny Staple