1

I'm trying to tokenize a phone number and split it into two arrays. It starts out in a string in the form of "(515) 555-5555". I'm looking to tokenize the area code, the first 3 digits, and the last 4 digits. The area code I would store in one array, and the other 7 digits in another one. Both arrays are to hold just the numbers themselves.

My code seems to work... sort of. The issue is when I print the two storage arrays, I find some quirks;

  1. My array aCode; it stores the first 3 digits as I ask it to, but then it also prints some garbage values notched at the end. I walked through it in the debugger, and the array only stores what I'm asking it to store- the 515. So how come it's printing those garbage values? What gives?

  2. My array aNum; I can append the tokens I need to the end of it, the only problem is I end up with an extra space at the front (which makes sense; I'm adding on to an empty array, ie adding on to empty space). I modify the code to only hold 7 variables just to mess around, I step into the debugger, and it tells me that the array holds and empty space and 6 of the digits I need- there's no room for the last one. Yet when I print it, the space AND all 7 digits are printed. How does that happen?

And how could I set up my strtok function so that it first copies the 3 digits before the "-", then appends to that the last 4 I need? All examples of tokenization I've seen utilize a while loop, which would mean I'd have to choose either strcat or strcpy to complete my task. I can set up an "if" statement to check for the size of the current token each time, but that seems too crude to me and I feel like there's a simpler method to this. Thanks all!

int main() {

    char phoneNum[]= "(515) 555-5555";
    char aCode[3];
    char aNum[7];

    char *numPtr;

    numPtr = strtok(phoneNum, " ");

    strncpy(aCode, &numPtr[1], 3);
    printf("%s\n", aCode);  

    numPtr = strtok(&phoneNum[6], "-");

    while (numPtr != NULL) {
        strcat(aNum, numPtr);
        numPtr = strtok(NULL, "-");
    }
    printf("%s", aNum);  
}
Yu Hao
  • 119,891
  • 44
  • 235
  • 294
David Tamrazov
  • 567
  • 1
  • 5
  • 16

4 Answers4

4

I can primarily see two errors,

  1. Being an array of 3 chars, aCode is not null-terminated here. Using it as an argument to %s format specifier in printf() invokes undefined behaviour. Same thing in a differrent way for aNum, too.

  2. strcat() expects a null-terminated array for both the arguments. aNum is not null-terminated, when used for the first time, will result in UB, too. Always initialize your local variables.

Also, see other answers for a complete bug-free code.

Sourav Ghosh
  • 133,132
  • 16
  • 183
  • 261
  • I'd originally allocated 4 variables to aCode because like you said, it needs to be null terminated, yet I had the same issue as I do now, so I'd figured allocating that extra variable is what forced printf to display the garbage values. – David Tamrazov Jun 24 '15 at 15:16
  • @DavidTamrazov and, 8 for `aNum`? NULL check? – Sourav Ghosh Jun 24 '15 at 15:21
  • Mhm, and 8 for aNum. Which is why I find it so peculiar that its printing the full number and the extra space when it only has room for 7 characters. – David Tamrazov Jun 24 '15 at 15:24
  • When undefined behavior occurs, the behavior is any behavior, it's certainly deterministic, but it cannot be determined by the code in your program, instead there are other factors which determine the behavior, some of which completely external to your program's code. – Iharob Al Asimi Jun 24 '15 at 15:27
  • Gotcha. So all of my issues could be stemming from undefined behaviour caused by the fact that my arrays have improper sizes, and my lack of a NULL check? – David Tamrazov Jun 24 '15 at 15:31
  • @DavidTamrazov That's it. :-) – Sourav Ghosh Jun 24 '15 at 15:32
  • Brilliant, I'll get on that then :) Thanks guys! – David Tamrazov Jun 24 '15 at 15:34
  • @DavidTamrazov You're welcome . :-) BTW, you can consider [accepting](http://meta.stackexchange.com/q/5234) an answer that helped you. – Sourav Ghosh Jun 24 '15 at 16:05
4

The biggest problem in your code is undefined behavior: since you are reading a three-character constant into a three-character array, you have left no space for null terminator.

Since you are tokenizing a value in a very specific format of fixed length, you could get away with a very concise implementation that employs sscanf:

char *phoneNum = "(515) 555-5555";
char aCode[3+1];
char aNum[7+1];
sscanf(phoneNum, "(%3[0-9]) %3[0-9]-%4[0-9]", aCode, aNum, &aNum[3]);
printf("%s %s", aCode, aNum);

This solution passes the format (###) ###-#### directly to sscanf, and tells the function where each value needs to be placed. The only "trick" used above is passing &aNum[3] for the last argument, instructing sscanf to place data for the third segment into the same storage as the second segment, but starting at position 3.

Demo.

Sergey Kalinichenko
  • 714,442
  • 84
  • 1,110
  • 1,523
  • that's a really interesting solution- I've never seen sscanf used this way before (beginner at C)! But unfortunately the exercise explicitly asks me to work with tokens. – David Tamrazov Jun 24 '15 at 15:27
2

Your code has multiple issues

  1. You allocate the wrong size for aCode, you should add 1 for the nul terminator byte and initialize the whole array to '\0' to ensure end of lines.

    char aCode[4] = {'\0'};
    
  2. You don't check if strtok() returns NULL.

    numPtr = strtok(phoneNum, " ");
    strncpy(aCode, &numPtr[1], 3);
    
  3. Point 1, applies to aNum in strcat(aNum, numPtr) which will also fail because aNum is not yet initialized at the first call.

  4. Subsequent calls to strtok() must have NULL as the first parameter, hence

    numPtr = strtok(&phoneNum[6], "-");
    

    is wrong, it should be

    numPtr = strtok(NULL, "-");
    
Samuel
  • 135
  • 1
  • 13
Iharob Al Asimi
  • 52,653
  • 6
  • 59
  • 97
  • @iharob I don't check for NULL in my first strtok because I only need the one token- the (515). When I create several tokens of the last 7 digits, then I check for it. As for the aNum and aCode, I'd allocated the extra room for the null terminator previously but it was still printing garbage values. I'd figured it was caused by my explicit allocation of extra space, so I adjusted it to what you see now. But the same behaviour is shown. – David Tamrazov Jun 24 '15 at 15:18
  • @DavidTamrazov If you don't check for `NULL` the next line is likely to invoke undefined behavior, also, **YOU MUST ALWAYS CHECK FOR ANY POSSIBLE ERROR** or you will write buggy code all the time. [This is something I found yesterday](http://spinroot.com/p10/), the 7th rule applies to this situation. – Iharob Al Asimi Jun 24 '15 at 15:20
  • 1
    @iharob is really number 4 an issue? Programatically it looks fine, as he is working with fixed pattern data. – Sourav Ghosh Jun 24 '15 at 15:21
  • 1
    @SouravGhosh I am not **really** sure, but I think that using the same pointer will cause undefined behavior, remember that `strtok()` is not reentrant because it uses an internal pointer to store current token position, using `strtok()` is almost always a bad idea. – Iharob Al Asimi Jun 24 '15 at 15:23
  • @SouravGhosh #4 is not an issue for now iff the phone number has been pre-validated somehow to conform to the exact format specified (then it simply starts a new `strtok` run instead of continuing the old one), but if the specification changes or the number is not in the exact format given (e.g., area code has only two digits) then it fails. Never-the-less, it is an ugly hard-coded magic number (6). – Arkku Jun 24 '15 at 15:24
0

Other answers have already mentioned the major issue, which is insufficient space in aCode and aNum for the terminating NUL character. The sscanf answer is also the cleanest for solving the problem, but given the restriction of using strtok, here's one possible solution to consider:

char phone_number[]= "(515) 555-1234";
char area[3+1] = "";
char digits[7+1] = "";
const char *separators = " (-)";
char *p = strtok(phone_number, separators);
if (p) {
    int len = 0;
    (void) snprintf(area, sizeof(area), "%s", p);
    while (len < sizeof(digits) && (p = strtok(NULL, separators))) {
        len += snprintf(digits + len, sizeof(digits) - len, "%s", p);
    }
}
(void) printf("(%s) %s\n", area, digits);
Arkku
  • 41,011
  • 10
  • 62
  • 84