
I am trying to tokenize a line and put the tokens into a two-dimensional array. So far I have come up with this, but I feel I am far off:

/**
 * Function to tokenize an input line into separate tokens
 *
 * The first arg is the line to be tokenized and the second arg points to
 * a 2-dimensional string array. The number of rows of this array should be
 * at least MAX_TOKENS_PER_LINE, and the number of columns (i.e., the length
 * of each string) should be at least MAX_TOKEN_SIZE.
 *
 * Returns 0 on success and negative number on failure
 */

int __tokenize(char *line, char tokens[][MAX_TOKEN_SIZE], int *num_tokens){

    char *tokenPtr;
    tokenPtr = strtok(line, " \t");
    for(int j = 0; j < MAX_TOKEN_SIZE; j++){
        while(tokenPtr != NULL){
            if(!(tokens[][j] = tokenPtr)){return -1;}
            num_tokens++;
            tokenPtr = strtok(NULL, " \t");
        }
    }
    return 0;
}
  • I think you might want to edit that post as the question doesn't appear to be complete. – joce May 04 '11 at 17:27
  • `strtok` takes 2 arguments. What system/language (with a 3-argument `strtok`) are you using? – pmg May 04 '11 at 17:28
  • In C, `strtok` is usually used in 2 steps: first initialization (`strtok(INPUT_STRING, DELIMITERS)`) and then, in a loop, grabbing more chunks (`strtok(NULL, DELIMITERS)`). – pmg May 04 '11 at 17:32
  • [`strtok`](http://perkamon.alioth.debian.org/online/man3/strtok.3.php) and [`strsep`](http://perkamon.alioth.debian.org/online/man3/strsep.3.php) - the docs are your friends. – nmichaels May 04 '11 at 17:34
  • how do you go about listing multiple delimiters? – Greg Trujillo May 04 '11 at 17:34
  • if you wanted spaces and tabs as delimiters, the line would look something like `tokenPtr = strtok(NULL, " \t");`. You just pass it the array of chars you want as delimiters. – John Leehey May 04 '11 at 17:35
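
Not from the thread itself, but a minimal sketch of the two-step `strtok` pattern described in the comments above, using space and tab as the delimiter set (the sample input is made up):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[] = "ls -l\t/tmp";           /* strtok modifies its argument, so use a writable array */
    char *tok = strtok(line, " \t");       /* first call: pass the line and the delimiter string */

    while (tok != NULL)
    {
        printf("token: %s\n", tok);
        tok = strtok(NULL, " \t");         /* later calls: pass NULL to keep scanning the same line */
    }
    return 0;
}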

3 Answers

int __tokenize(char *line, char tokens[][MAX_TOKEN_SIZE], int *num_tokens)
{
    char *tokenPtr = strtok(line, " \t");
    int i;

    for (i = 0; tokenPtr && i < MAX_TOKENS_PER_LINE; i++)
    {
        /* strtok returns a pointer into line, so copy the characters into the row */
        strncpy(tokens[i], tokenPtr, MAX_TOKEN_SIZE - 1);
        tokens[i][MAX_TOKEN_SIZE - 1] = '\0';
        tokenPtr = strtok(NULL, " \t");
    }
    *num_tokens = i;
    return 0;
}

Hope this works.
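
A quick usage sketch, assuming MAX_TOKENS_PER_LINE and MAX_TOKEN_SIZE are defined as the question describes (the values below are placeholders, not from the question) and using the __tokenize above:

#include <stdio.h>
#include <string.h>

#define MAX_TOKENS_PER_LINE 32   /* assumed value, not given in the question */
#define MAX_TOKEN_SIZE      64   /* assumed value, not given in the question */

/* ... __tokenize from the answer above ... */

int main(void)
{
    char line[] = "ls -l /tmp";
    char tokens[MAX_TOKENS_PER_LINE][MAX_TOKEN_SIZE];
    int num_tokens = 0;

    if (__tokenize(line, tokens, &num_tokens) == 0)
        for (int i = 0; i < num_tokens; i++)
            printf("token %d: %s\n", i, tokens[i]);
    return 0;
}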

maheshgupta024
  • Where does the return go, and if I am increasing num_tokens every time I add a token into the array, does that go into the for loop as well? Also, can you explain the tokenPtr in the middle of the for loop as the condition? Thank you for your help so far. – Greg Trujillo May 04 '11 at 18:46
  • Using `tokenPtr` as a condition causes the loop to exit when `tokenPtr` is `NULL` (that is the equivalent of 0 or false in `C`). – BMitch May 04 '11 at 22:01

  1. tokenPtr is not initialized - it may or may not be NULL the first time through the loop.
  2. strtok takes 2 arguments. If you want to split on multiple chars, include them all in the 2nd string.
  3. After the strtok call, the token pointer points to the string you want. Now what? You need somewhere to store it. Perhaps an array of char*? Or a 2-D array of characters, as in your edited prototype.
  4. tokens[i] is storage for MAX_TOKEN_SIZE characters. strtok() returns a pointer to a string (a sequence of 1 or more characters). You need to copy one into the other (see the sketch below this answer).
  5. What is the inner loop accomplishing?

Note that `char tokens[][MAX]` is usually referred to as a 2-D array of characters (or a 1-D array of fixed-length strings). A 2-D array of strings would be `char* tokens[][MAX]`.
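
As a sketch of points 3 and 4 above (not part of the original answer; the MAX constants and the sample line are assumptions), here is the difference between just remembering the pointers strtok returns and copying each token into a row of a 2-D char array:

#include <stdio.h>
#include <string.h>

#define MAX_TOKENS_PER_LINE 32   /* assumed sizes, as in the question */
#define MAX_TOKEN_SIZE      64

int main(void)
{
    char line[] = "echo hello\tworld";

    char *ptrs[MAX_TOKENS_PER_LINE];                  /* array of char*: only remembers pointers into line */
    char rows[MAX_TOKENS_PER_LINE][MAX_TOKEN_SIZE];   /* 2-D array of char: owns a copy of each token */
    int n = 0;

    for (char *tok = strtok(line, " \t"); tok != NULL && n < MAX_TOKENS_PER_LINE; tok = strtok(NULL, " \t"), n++)
    {
        ptrs[n] = tok;                                 /* valid only while line stays in scope and unmodified */
        strncpy(rows[n], tok, MAX_TOKEN_SIZE - 1);     /* the copy step from point 4 */
        rows[n][MAX_TOKEN_SIZE - 1] = '\0';
    }

    for (int i = 0; i < n; i++)
        printf("%s / %s\n", ptrs[i], rows[i]);

    return 0;
}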

AShelly
  • Sorry to be pedantic, but if you're storing a single string, then it would be an array of chars (not char*), or perhaps just a malloc'd char*. Then again, the tag does say multidimensional, so an array of char* would probably be appropriate. – John Leehey May 04 '11 at 17:33
  • @John, you are correct for a single string. I was thinking about where to store the complete set of tokens. – AShelly May 04 '11 at 18:06

You should implement a finite state machine. I've just finished my shell command lexer/parser (LL); see: How to write a (shell) lexer by hand
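
The linked post goes much further, but a minimal sketch of the state-machine idea applied to the question's space/tab splitting might look like this (the fsm_tokenize name and the MAX constants are assumptions, not taken from the linked post):

#include <stdio.h>

#define MAX_TOKENS_PER_LINE 32   /* assumed limits, matching the question's constants */
#define MAX_TOKEN_SIZE      64

/* Two states: currently between tokens, or currently inside one. */
enum state { BETWEEN, IN_TOKEN };

static int fsm_tokenize(const char *line, char tokens[][MAX_TOKEN_SIZE], int *num_tokens)
{
    enum state st = BETWEEN;
    int n = 0, len = 0;

    for (const char *p = line; ; p++)
    {
        int end = (*p == '\0');
        int delim = end || *p == ' ' || *p == '\t';

        if (st == IN_TOKEN && delim)            /* a token just ended: terminate it */
        {
            tokens[n][len] = '\0';
            n++;
            st = BETWEEN;
        }
        else if (st == BETWEEN && !delim)       /* a token just started */
        {
            if (n >= MAX_TOKENS_PER_LINE)
                return -1;                      /* too many tokens */
            len = 0;
            tokens[n][len++] = *p;
            st = IN_TOKEN;
        }
        else if (st == IN_TOKEN && !delim)      /* still inside a token */
        {
            if (len >= MAX_TOKEN_SIZE - 1)
                return -1;                      /* token too long */
            tokens[n][len++] = *p;
        }

        if (end)
            break;
    }

    *num_tokens = n;
    return 0;
}

int main(void)
{
    char tokens[MAX_TOKENS_PER_LINE][MAX_TOKEN_SIZE];
    int n = 0;

    if (fsm_tokenize("ls  -l\t/tmp", tokens, &n) == 0)
        for (int i = 0; i < n; i++)
            printf("%s\n", tokens[i]);
    return 0;
}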

mathieug