7

My application produces strings like the one below. I need to parse values between the separator into individual values.

2342|2sd45|dswer|2342||5523|||3654|Pswt

I am using strtok to do this in a loop. For the fifth token, I am getting 5523. However, I need to account for the empty value between the two separators || as well. 5523 should be the sixth token, as per my requirement.

token = (char *)strtok(strAccInfo, "|");

for (iLoop=1;iLoop<=106;iLoop++) { 
            token = (char *)strtok(NULL, "|");
}

Any suggestions?

David Spector
Bash

8 Answers

8

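In that case I often prefer a p2 = strchr(p1, '|') loop with a memcpy(s, p1, p2-p1) inside, where p1 points at the start of the current field, p2 at the next separator, and s is a destination buffer for the copied field. It's fast, does not destroy the input buffer (so it can be used with const char *) and is really portable (even on embedded).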

It's also reentrant; strtok isn't. (BTW: reentrancy has nothing to do with multi-threading. strtok already breaks with nested loops. One can use strtok_r, but it's not as portable.)
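A minimal sketch of the strchr/memcpy loop described above, for readers who want to see it spelled out. The function name next_field and its parameters (cursor, out, outsz) are illustrative choices, not part of the original answer, and truncating over-long fields is just one possible policy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the next delim-separated field into out and advance *cursor.
 * Returns 1 if a field (possibly empty) was produced, 0 once the input
 * is exhausted. The input string is never modified.
 * next_field, cursor, out and outsz are hypothetical names.
 */
int next_field(const char **cursor, char delim, char *out, size_t outsz)
{
    const char *p1, *p2;
    size_t len;

    assert(cursor != NULL && out != NULL && outsz > 0);

    p1 = *cursor;                  /* start of the current field */
    if (p1 == NULL)
        return 0;                  /* the previous call returned the last field */

    p2 = strchr(p1, delim);        /* next separator, or NULL on the last field */
    len = p2 ? (size_t)(p2 - p1) : strlen(p1);
    if (len >= outsz)
        len = outsz - 1;           /* truncate rather than overflow the buffer */

    memcpy(out, p1, len);
    out[len] = '\0';

    *cursor = p2 ? p2 + 1 : NULL;  /* step past the separator, or mark the end */
    return 1;
}
```

Initialise a const char * to the start of the input and call next_field(&cur, '|', buf, sizeof buf) in a while loop. Because the state lives in the caller's pointer, two such loops can be nested without interfering with each other.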

Jonathan Leffler
Patrick Schlüter
  • I used your input and updated my code. Thanks! I have the code that I am using below as an answer, if you're interested. – Bash Aug 02 '10 at 21:26
  • Thanks, inspired by your answer I made [this](https://stackoverflow.com/a/55544721/5407848) – Accountant م Apr 06 '19 at 00:11
  • Sorry Patrick but could you explain a bit in detail how your solution works? I'm guessing `s` is the original string, but what are `p1` and `p2`? – rdxdkr Jun 10 '20 at 20:50
3
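#include <string.h>

/* Split on the single character c without skipping empty fields.
   m is the caller's save pointer; pass s on the first call, 0 afterwards. */
char *mystrtok(char **m,char *s,char c)
{
  char *p=s?s:*m;        /* resume where the previous call stopped */
  if( !*p )              /* end of input: no more tokens */
    return 0;
  *m=strchr(p,c);
  if( *m )
    *(*m)++=0;           /* terminate the token and step past the separator */
  else
    *m=p+strlen(p);      /* last token: park on the terminating '\0' */
  return p;
}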
  • reentrant
  • threadsafe
  • strictly ANSI-conforming
  • needs a helper pointer supplied by the calling context

e.g.

char *p,*t,s[]="2342|2sd45|dswer|2342||5523|||3654|Pswt";
for(t=mystrtok(&p,s,'|');t;t=mystrtok(&p,0,'|'))
  puts(t);

e.g.

char *p,*t,s[]="2,3,4,2|2s,d4,5|dswer|23,42||5523|||3654|Pswt";
for(t=mystrtok(&p,s,'|');t;t=mystrtok(&p,0,'|'))
{
  char *p1,*t1;
  for(t1=mystrtok(&p1,t,',');t1;t1=mystrtok(&p1,0,','))
    puts(t1);
}

An exercise for you :) change parameter 3 to char *c so that it accepts a set of delimiter characters.
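One possible take on that exercise, as a sketch rather than the answerer's own solution: swap strchr for strpbrk so that the third parameter becomes a set of delimiter characters. The name mystrtok_set is made up for illustration; the rest mirrors mystrtok above.

```c
#include <assert.h>
#include <string.h>

/*
 * Like mystrtok, but delims is a string of delimiter characters.
 * mystrtok_set is a hypothetical name. Pass s on the first call and
 * NULL afterwards; m is the caller's save pointer.
 */
char *mystrtok_set(char **m, char *s, const char *delims)
{
    char *p;

    assert(m != NULL && delims != NULL);

    p = s ? s : *m;              /* resume where the previous call stopped */
    if (!*p)
        return NULL;             /* end of input: no more tokens */

    *m = strpbrk(p, delims);     /* first character that is in delims */
    if (*m)
        *(*m)++ = '\0';          /* terminate the token, step past the delimiter */
    else
        *m = p + strlen(p);      /* last token: park on the terminating '\0' */
    return p;
}
```

As with the single-character version, empty fields between two delimiters are returned as empty strings, but an empty field after a trailing delimiter is not reported, because the function stops as soon as it reaches the end of the string.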

user411313
2

On a first call, the function expects a C string as argument for str, whose first character is used as the starting location to scan for tokens. In subsequent calls, the function expects a null pointer and uses the position right after the end of last token as the new starting location for scanning.

To determine the beginning and the end of a token, the function first scans from the starting location for the first character not contained in delimiters (which becomes the beginning of the token). And then scans starting from this beginning of the token for the first character contained in delimiters, which becomes the end of the token.

What this says is that strtok skips any '|' characters at the beginning of a token, which makes 5523 the 5th token, as you already knew. I just thought I would explain why (I had to look it up myself). It also means that you will never get an empty token.
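The skipping behaviour is easy to confirm with a small helper. This is only a sketch; nth_strtok_token is a made-up name, and it modifies the buffer it is given, just as strtok does.

```c
#include <assert.h>
#include <string.h>

/*
 * Return the n-th token (1-based) that strtok produces for s, or NULL
 * if there are fewer than n tokens. s is modified in place.
 * nth_strtok_token is a hypothetical helper for illustration only.
 */
char *nth_strtok_token(char *s, const char *delims, int n)
{
    char *tok;
    int i;

    assert(s != NULL && delims != NULL && n >= 1);

    tok = strtok(s, delims);
    for (i = 1; tok != NULL && i < n; i++)
        tok = strtok(NULL, delims);   /* runs of delimiters are skipped here */
    return tok;
}
```

With the string from the question copied into a writable array, the 5th token is 5523 and there are only 7 tokens in total, even though the string contains 10 fields.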

Since your data is set up this way, you have a couple of possible solutions:
1) Find all occurrences of || and replace them with | | (put a space in there).
2) Call strstr 5 times, stepping past each | it finds, to reach the beginning of the element that follows the 5th separator.
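A rough sketch of the second option, using strchr rather than strstr because the separator is a single character. The helper name nth_field and its parameters are assumptions made for illustration; it reports the field as a start pointer plus a length and leaves the input untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Locate the n-th field (1-based) of s, where fields are separated by
 * delim and may be empty. Returns a pointer to the start of the field
 * and stores its length in *len, or returns NULL if s has fewer than
 * n fields. nth_field is a hypothetical name.
 */
const char *nth_field(const char *s, char delim, int n, size_t *len)
{
    const char *end;
    int i;

    assert(s != NULL && n >= 1 && len != NULL);

    for (i = 1; i < n; i++) {
        s = strchr(s, delim);    /* separator that ends field i */
        if (s == NULL)
            return NULL;         /* fewer than n fields */
        s++;                     /* start of field i + 1 */
    }

    end = strchr(s, delim);      /* separator that ends field n, if any */
    *len = end ? (size_t)(end - s) : strlen(s);
    return s;
}
```

For the string in the question, nth_field(str, '|', 6, &len) points at 5523 with len set to 4, while field 5 is reported with a length of 0.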

Romain Hippeau
  • Thanks for the information. Hopefully, I will remember this the next time I need to. :-D Your first solution screws up my results a bit, because there are valid components within the string that return a space between pipes. The second solution might become tedious and probably not implementable since the string may be different for different sets of data. – Bash Aug 02 '10 at 21:19
  • @Bash - Sorry I could not be of more help :( – Romain Hippeau Aug 02 '10 at 22:59
  • oh, you were a lot of help...information is power in our field, right? – Bash Aug 03 '10 at 14:07
2

That's a limitation of strtok. The designers had whitespace-separated tokens in mind. strtok doesn't do much anyway; just roll your own parser. The C FAQ has an example.

Gilles 'SO- stop being evil'
1

Look into using strsep instead: strsep reference
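For reference, a short sketch of how strsep might be used here. strsep is not part of ISO C; it originated in BSD and is also provided by glibc. The _DEFAULT_SOURCE define is an assumption about glibc builds that use a strict -std= mode, and split_with_strsep is a made-up helper name.

```c
/* Assumption: on glibc, strsep is only declared under strict -std= modes
   if a feature-test macro such as _DEFAULT_SOURCE is defined. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split s in place on any character in delims, storing up to max_fields
 * pointers in fields. Empty fields are kept. Returns the number of
 * fields stored. split_with_strsep is a hypothetical name.
 */
size_t split_with_strsep(char *s, const char *delims,
                         char **fields, size_t max_fields)
{
    size_t n = 0;
    char *rest = s;              /* strsep advances this past each field */
    char *tok;

    assert(delims != NULL && fields != NULL);

    while (n < max_fields && (tok = strsep(&rest, delims)) != NULL)
        fields[n++] = tok;       /* tok is "" for an empty field */
    return n;
}
```

Like strtok, strsep overwrites each delimiter with '\0', so the input must be a writable buffer. Unlike strtok, it keeps its position in the caller's pointer and returns an empty string for each empty field.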

Chris
  • oh well :-) Most of my coding is on UNIX and that sure will come in handy right about now :-)) never heard of it before. – clearlight May 08 '18 at 06:34
1

Use something other than strtok. It's simply not intended to do what you're asking for. When I've needed this, I usually used strcspn or strpbrk and handled the rest of the tokenizing myself. If you don't mind it modifying the input string like strtok does, it should be pretty simple. At least right off, something like this seems as if it should work:

#include <string.h>

// Warning: untested code. Should really use something with a less-ugly interface.
// Pass the string on the first call and NULL afterwards, as with strtok.
char *tokenize(char *input, char const *delim) { 
    static char *current;    // just as ugly as strtok!
    char *pos, *ret;
    if (input != NULL)
        current = input;

    if (current == NULL)     // previous call returned the last token
        return current;

    ret = current;
    pos = strpbrk(current, delim);   // next delimiter, if any
    if (pos == NULL) 
        current = NULL;
    else {
        *pos = '\0';         // terminate this token (may be empty)
        current = pos+1;
    }
    return ret;
}
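If modifying the input or relying on a static pointer is a concern, the strcspn approach mentioned above can be sketched along these lines. This is an assumption about how it could look, not the answerer's code; next_span, cursor and len are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Report the next field as a start pointer plus length, without writing
 * to the input. *cursor holds the scan position between calls and is set
 * to NULL after the last field. Returns NULL when no fields remain.
 * next_span is a hypothetical name.
 */
const char *next_span(const char **cursor, const char *delims, size_t *len)
{
    const char *start;

    assert(cursor != NULL && delims != NULL && len != NULL);

    start = *cursor;
    if (start == NULL)
        return NULL;                      /* last field already returned */

    *len = strcspn(start, delims);        /* characters before the next delimiter or '\0' */
    *cursor = start[*len] != '\0'
            ? start + *len + 1            /* skip exactly one delimiter */
            : NULL;                       /* reached the end of the string */
    return start;
}
```

The returned fields are not NUL-terminated, so print them with something like printf("%.*s\n", (int)len, field) or copy them out with memcpy.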
Jerry Coffin
1

Inspired by Patrick Schlüter's answer, I wrote this function. It is meant to be thread-safe, supports empty tokens, and doesn't change the original string.

#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd copy of the next token; the caller must free it. */
char* strTok(char** newString, char* delimiter)
{
    char* string = *newString;
    char* delimiterFound = (char*) 0;
    int tokLenght = 0;
    char* tok = (char*) 0;

    if(!string) return (char*) 0;

    delimiterFound = strstr(string, delimiter);

    if(delimiterFound){
        tokLenght = delimiterFound-string;
    }else{
        tokLenght = strlen(string);
    }

    tok = malloc(tokLenght + 1);
    if(!tok) return (char*) 0;   /* allocation failed */
    memcpy(tok, string, tokLenght);
    tok[tokLenght] = '\0';

    /* Advance past the delimiter, or mark the input as exhausted. */
    *newString = delimiterFound ? delimiterFound + strlen(delimiter) : (char*)0;

    return tok;
}

You can use it like this:

char* input = "1,2,3,,5,";
char** inputP = &input;
char* tok;
while( (tok=strTok(inputP, ",")) ){
    printf("%s\n", tok);
    free(tok);   /* each token is malloc'd by strTok */
}

This is supposed to output:

1
2
3

5

I tested it on simple strings but haven't used it in production yet. I also posted it on Code Review, so you can see what others think of it.

Accountant م
  • 1
    If you're on a POSIX machine you can replace the `tok = malloc(tokLenght + 1); memcpy(tok, string, tokLenght); tok[tokLenght] = '\0';` simply with `tok = strndup(string, tokLenght);` – Patrick Schlüter Apr 30 '19 at 10:58
0

Below is the solution that is working for me now. Thanks to all of you who responded.

I am using LoadRunner. Hence, some unfamiliar commands, but I believe the flow can be understood easily enough.

char strAccInfo[1024], *p2;
int iLoop;

Action() {  //This value would come from the wrsp call in the actual script.
    lr_save_string("323|90||95|95|null|80|50|105|100|45","test_Param");

    //Store the parameter into a string - saves memory. 
    strcpy(strAccInfo,lr_eval_string("{test_Param}"));
    //Get the first instance of the separator "|" in the string
    p2 = (char *) strchr(strAccInfo,'|');

    //Start a loop - Set the max loop value to more than max expected.
    for (iLoop = 1;iLoop<200;iLoop++) { 

        //Save parameter names in sequence.
        lr_param_sprintf("Param_Name","Parameter_%d",iLoop);

        //Get the first instance of the separator "|" in the string (within the loop).
        p2 = (char *) strchr(strAccInfo,'|');           

        //Save the value for the parameters in sequence. 
        lr_save_var(strAccInfo,p2 - strAccInfo,0,lr_eval_string("{Param_Name}"));   

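        //Shift the remainder of the string (after p2) down into strAccInfo - for looping.
        //memmove is used because source and destination overlap, which strcpy does not allow.
        memmove(strAccInfo,p2+1,strlen(p2+1)+1);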

        //Start conditional loop for checking for last value in the string.
        if (strchr(strAccInfo,'|')==NULL) {
            lr_param_sprintf("Param_Name","Parameter_%d",iLoop+1);
            lr_save_string(strAccInfo,lr_eval_string("{Param_Name}"));
            iLoop = 200;    
        }
    }
}
Jonathan Leffler
Bash
  • At some point, you need to explain why you have global variables instead of local variables, and why you don't have a return type on the function (that's very old style C). Or, better, just fix the code so it compiles cleanly under stringent compiler warnings. The use of `iLoop = 200;` to achieve `break;` is fragile. It is not clear why 200 is used in the loop control anyway. – Jonathan Leffler Aug 29 '16 at 17:09