String tokening in C

Question

I have strings like "− · · · −" (Morse code) in an array, and want to tokenize each string to get each individual dot (.) and dash (−). Part of my code is given below:

char *code, *token;
    char x;
    char ch[4096];
    code = &ch[0];

   ..

while((x = tolower(fgetc(fp))) != EOF){
            printf("%c \n", x);
            switch(x){
                case 'a':
                    strcpy(code, "· −");
                    break;
                case 'b':
                    strcpy(code, "− · · ·");
                    break;
                case 'c':
                    strcpy(code, "− · − · ");
                    break;
                case 'd':
                    strcpy(code, "− · ·");
                    break;
                case 'e':
                    strcpy(code, "· ");
                    break;
                case 'f':
                    strcpy(code, "· · − ·" );
                    break;
                case 'g':
                    strcpy(code, "− − · ");
                    break;
                case 'h':
            }
            if(x!= 10){
                printf("Value read : %s \n", code);
                token = strtok(code, " ");
                while(token != NULL){
                    printf("CHARACTER: %s\n", token);
                    token = strtok(NULL, " ");
                }
            }

So, when the code array has "− − ·", I want the output to have:

CHARACTER: −
CHARACTER: −
CHARACTER: ·

However, the output is instead CHARACTER: − − ·. I am new to string tokenizing and might have made a mistake somewhere. Perhaps my delimiter is wrong; I am not sure. I hope I have provided enough information. Any help on this would be greatly appreciated.

Thanks in advance

Note: Use `int x;` to distinguish `EOF` from all other `char`. — chux - Reinstate Monica, Sep 29 '14 at 15:47
BTW: Are you declaring the breaks between letters insignificant? — Deduplicator, Sep 29 '14 at 15:50
You are all right. I had copy-pasted the Morse codes (dot-dash sequences) from an online source. There, the space appears to be different from the one produced by the space bar. That's why the delimiter was not matching. Thank you all!! — user3033194, Sep 29 '14 at 16:20

NPE · Accepted Answer · 2014-09-29T16:01:22.103

The issue is that the (Unicode) whitespace character in the string literals (e.g. "· · − ·") is different to the whitespace character in the strtok() calls.

Run your source code through xxd and see for yourself.

As far as I can see, the spaces in the strcpy() calls are U+200A whereas the spaces in the strtok() calls are U+0020.

That's weird, since this code runs fine: http://www.cplusplus.com/reference/cstring/strtok/ What would you suggest as an alternative? – Logan Murphy Sep 29 '14 at 15:58
@LoganMurphy: I copied and pasted the code from the question into an editor, and it doesn't work until I replace the U+200A spaces with the normal U+0020 spaces. – NPE Sep 29 '14 at 16:00
Is it compiler-dependent, though? Maybe the OP can set a compiler option. – Logan Murphy Sep 29 '14 at 16:01
@NPE: throw away the editor. It should never turn a U+0020 into a U+200A. Word processors might do that, but not a proper code editor. – Rudy Velthuis Sep 29 '14 at 16:06

Danny Staple · Answer 2 · 2014-09-29T16:17:19.080

strtok() is not needed here (and you don't need those spaces either). If you want the individual characters from the string, you could use a simple loop with a pointer over the original string:

char *current = code;  /* code is already a char *, so no & is needed */

Then make sure you loop until the end of string (null) character:

while (*current != '\0') {
  if (*current != ' ') {
      printf("CHARACTER: %c \n", *current);
  }
  current++;  /* advance on every iteration, including spaces */
}

What this does: it loops over the characters in code, using current as a pointer and checking for the null terminator. An if checks for a space; if the character is not a space, printf prints it, dereferencing the pointer to get the char. Finally, it increments the pointer.

Big warning: If your string is not zero-terminated (a standard C string will be), this will read past the end of the buffer and start printing silly stuff.

Need to skip the spaces somehow. – Logan Murphy Sep 29 '14 at 15:56
He could add an if perhaps? – Danny Staple Sep 29 '14 at 16:03
