
Below is a section of a Tokenizer I built. The user types a string they wish to tokenize, that string is stored into a char array, and a null character ('\0') is placed as soon as the string ends. That section of the code seems to work fine after having tested it a few times.

The problem I'm getting occurs later on in the code when I make an array (tokenArray) of arrays (newToken). I use functions to get number of tokens and token length.

I entered the string "testing pencil calculator." I then store each token into an array. The problem is when I go to print the contents of the array, the loop that I have printing stops before it should.

Here's a sample input/output. My comments (not part of the program's output) are marked with //.

$testing pencil calculator //string entered

complete index: 0        //index of the entire array, not the tokenized array
token length: 7          //length of 1st token "testing" 
pointer: 0xbf953860 
tokenIndex: 0            //index of the token array (array of arrays)
while loop iterations: 4 //number of times the while loop where i print is iterated. should be 7
test                     //the results of printing the first token 

complete index: 8                                                
token length: 6          //next token is "pencil"                                                                              
tokenIndex: 1                                                    
while loop iterations: 5 //should be 6                                       
penci                    //stops printing at penci     

complete index: 15                                                  
token length: 10         //final token is "calculator"                                           
pointer: 0xbf953862                                                   
tokenIndex: 2                                                         
while loop iterations: 5 //should be 10                                              
calcu                    //stops printing at calcu

For the life of me, I cannot figure out why the while loop exits before it is supposed to. I doubt this is the only problem with my approach, but until I figure this out, I can't address the other bugs.

Below is a section of my code that is responsible for this:

from main:

  completeString[inputsize] = '\0';   

  char tokenArray[numTokens+1];
  tokenArray[numTokens] = '\0';    
  putTokensInArray(tokenArray, completeString);  

method where I'm getting errors:

char ** putTokensInArray(char tokenArray[], char * completeString){
  int completeIndex = 0;
  int tokenIndex = 0;

  while(tokenArray[tokenIndex] != '\0'){
    int tokenLength = tokenSize(completeString, completeIndex);
    char newToken [tokenLength+1];
    newToken[tokenLength] = '\0';
    tokenArray[tokenIndex] = *newToken;

    printf("\ncomplete index: %d", completeIndex);
    printf("\ntoken length: %d", tokenLength);
    printf("\ntokenIndex: %d\n", tokenIndex);

    int i = 0;
    while(newToken[i] != '\0'){
      newToken[i] = completeString[i + completeIndex];
      i++;
    }
    completeIndex += (tokenLength+1);

    printf("while loop iterations: %d\n", i);

    for(int j = 0; newToken[j] != '\0'; j++){
      printf("%c", newToken[j]);
    }

    tokenIndex++;
    tokenLength = 0;

  }//big while loop 
}//putTokensInArray Method

I have tried several things but just cannot get a grasp of it. I'm new to C, so it's entirely possible I'm making pointer mistakes or accessing memory I shouldn't be. On that note, how would I use malloc() and free() here? I've been reading about them and they seem to be what I need, but I haven't been able to get them working in my code.

Omar Khalik
  • `*newToken` causes undefined behaviour (you read the first character out of `newToken` but you never initialized the contents) – M.M Feb 23 '17 at 04:11
  • Also your function is declared to return `char **` but you don't have a `return` statement – M.M Feb 23 '17 at 04:12
  • I haven't implemented the return statement yet. So what you're saying is that I need to initialize the contents of an array when I make it? So I would just loop over the array and put a random value in every index? – Omar Khalik Feb 23 '17 at 04:18
  • You can't read out of an array before you put something in it. Your code has a lot of problems though. I suspect that you intended for that line to store a pointer to the array, not read the first character of it. But the rest of your code is not set up to store that. – M.M Feb 23 '17 at 04:39

1 Answer

You are passing an uninitialized character array to your function putTokensInArray. The while loop condition in that function then checks each element, starting from index 0, for \0. Since the array is uninitialized, its elements could hold ANY values, so a \0 may appear before the (numTokens+1)th element and end the loop early.

To fix this problem, pass the number of tokens, i.e. numTokens, to putTokensInArray as an additional argument. Then use the following condition in your while loop instead:

while(tokenIndex < numTokens){
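
The same idea applies to the inner copy loop in the question: it tests newToken[i] != '\0' while newToken is still uninitialized, so it stops at whatever zero byte happens to be in that memory, which would explain the short output. Below is a minimal sketch of a count-based version that also uses malloc() and free(), since the question asks about them. It assumes tokens are separated by single spaces; the names token_size, put_tokens_in_array, and free_tokens are hypothetical and not taken from the original program.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: number of characters in the token that starts
   at index `start`, i.e. up to the next space or the end of the string. */
static size_t token_size(const char *s, size_t start) {
    size_t len = 0;
    while (s[start + len] != '\0' && s[start + len] != ' ') {
        len++;
    }
    return len;
}

/* Copies up to numTokens space-separated tokens from `input` into newly
   allocated strings and stores the pointers in tokens[].
   Returns how many tokens were actually stored. */
size_t put_tokens_in_array(char **tokens, size_t numTokens, const char *input) {
    size_t pos = 0;
    size_t count = 0;

    while (count < numTokens && input[pos] != '\0') {
        size_t len = token_size(input, pos);
        if (len == 0) {          /* current character is a separator */
            pos++;
            continue;
        }
        char *tok = malloc(len + 1);   /* +1 for the terminating '\0' */
        if (tok == NULL) {
            break;                     /* out of memory: stop early */
        }
        memcpy(tok, input + pos, len); /* copy exactly len characters */
        tok[len] = '\0';
        tokens[count++] = tok;
        pos += len;
    }
    return count;
}

/* Releases every string stored by put_tokens_in_array. */
void free_tokens(char **tokens, size_t count) {
    for (size_t i = 0; i < count; i++) {
        free(tokens[i]);
    }
}
```

A caller would declare something like char *tokens[numTokens];, call put_tokens_in_array(tokens, numTokens, completeString), print each tokens[i] with %s, and finally call free_tokens with the returned count. Because the tokens live on the heap, they stay valid after the function returns, unlike a local array such as newToken.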

VHS
  • Thank you. This may sound dumb, but is there a correct or a standard way to initialize contents of an array in C? Do we place -1 into every index or something like that? Or is this where malloc and free come into play? I'm a beginner so I'm still trying to understand all these things. I appreciate your help! – Omar Khalik Feb 23 '17 at 04:31
  • @OmarKhalik, initializing a char array is real easy. `char tokenArray[numTokens+1] = "";` This will initialize every element to 0. – VHS Feb 23 '17 at 04:59
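
One caveat on the initializer above: because numTokens is only known at run time, tokenArray is a variable-length array, and before C23 the C standard does not allow an initializer on a variable-length array, so many compilers reject that line. A fixed-size array such as char buf[16] = ""; is fine. For the runtime-sized case, a minimal sketch using memset follows; the function name demo_zeroed_array is hypothetical and exists only to make the example checkable.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical demo: creates a runtime-sized array, zeroes it with
   memset, and returns 1 if every element is '\0', otherwise 0. */
int demo_zeroed_array(size_t numTokens) {
    char tokenArray[numTokens + 1];            /* VLA: no initializer here */
    memset(tokenArray, 0, sizeof tokenArray);  /* set every byte to 0 */

    for (size_t i = 0; i < sizeof tokenArray; i++) {
        if (tokenArray[i] != '\0') {
            return 0;
        }
    }
    return 1;
}
```

malloc and free are a separate matter: they control where the memory lives and how long it lasts, not what it contains. Memory returned by malloc is also uninitialized; calloc is the variant that returns zeroed memory.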