
I have a string containing the data types and addresses of variables. The values are separated by "///" and alternate (type /// address /// type /// address ...). The number of these tuples is not fixed and can vary from execution to execution.

My problem is how to process the string in a loop. strtok has to be called first with the original string and afterwards with NULL, but inside the loop it has to be called twice per tuple. So after the first iteration strtok has been called three times, which gives an odd number of calls where an even number is needed. I tried to solve this by processing the first tuple outside the loop (because strtok has to be called with the original string) and the remaining tuples inside the loop.

char mystring[128];
char seperator[] = "///";
char *part;
int type [128];
int address [128];
number_of_variables = 0;

part = strtok(mystring, seperator);
type[number_of_variables] = (int) atoi(part);
part = strtok(NULL, seperator);
address[number_of_variables] = (int)strtol(part, NULL, 16);

while(part != NULL){
    part = strtok(NULL, seperator);
    type[number_of_variables] = (int) atoi(part);


    part = strtok(NULL, seperator);
    address[number_of_variables] = (int)strtol(part, NULL, 16);

    number_of_variables++;
}

So now I have an even number of strtok calls, but if my string contains, for example, 2 tuples, the loop is entered a second time and strtok is called a fifth time. That call returns NULL, and the program crashes because atoi() gets a bad pointer.

EDIT: Example for mystring:

"1///0x37660///2///0x38398"

1 and 2 are type identifiers for the further program.

Gora
  • Can you provide some example of input strings? – Abhijit Pritam Dutta Mar 01 '18 at 12:44
  • Why aren't you checking the result of `part = strtok(NULL, seperator);` for NULL before calling `atoi`? – lurker Mar 01 '18 at 12:55
  • If you want a robust parser for this try *bison* and *flex*. It's easy to learn, and this is a very nice example to start using it. You would have a very simple lexer and a simple parser with 2 or 3 grammar rules. Take 10 mins to read about bison and flex and you will probably write the whole parser in another 10min. – Iharob Al Asimi Mar 01 '18 at 12:56
  • You're aware that `strtok` treats the second argument as a *list of single-character delimiters* to look for, not as a *single delimiter that might be a string*? So with the delimiters `"///"` and the string `"1///0x37660/..."`, it splits at every run of `/` characters: it returns `"1"`, then `"0x37660"`, and a single `/` would separate the fields just as well as `///`. See the manual page for `strtok`. – lurker Mar 01 '18 at 13:11
  • Adding to @lurker's comment, you might want to use `strstr()`, which unlike `strtok()` is reentrant, works with constant strings (*it doesn't modify its arguments*), and lets you use any string as a delimiter. – Iharob Al Asimi Mar 01 '18 at 13:14
  • @lurker Thanks for this insight, I didn't know that and changed the separator. But it won't change my initial problem, will it? I am using Vlad from Moscow's solution right now. – Gora Mar 01 '18 at 13:33
  • Yes, Vlad's solution will work since he properly handles `strtok`. – lurker Mar 01 '18 at 13:41
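
As a rough illustration of the `strstr()` suggestion in the comments above, here is a minimal sketch that treats "///" as one multi-character delimiter and leaves the input string untouched. The function name `parse_pairs`, its parameters, and the choice of `long` for the addresses are assumptions made for this example; they are not part of the original post.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the original post): parses
 * "type///address///type///address..." without modifying `input`.
 * Returns the number of pairs stored, at most `max_pairs`. */
size_t parse_pairs(const char *input, int *type, long *address, size_t max_pairs)
{
    static const char delim[] = "///";
    const size_t delim_len = sizeof delim - 1;
    size_t count = 0;
    const char *p = input;

    assert(input != NULL && type != NULL && address != NULL);

    while (count < max_pairs && *p != '\0') {
        /* The type field ends at the next full "///" sequence. */
        const char *sep = strstr(p, delim);
        if (sep == NULL)
            break;              /* a type without an address: stop here */

        /* strtol stops at the first character that is not part of the
         * number, so the field does not need to be null-terminated. */
        type[count] = (int)strtol(p, NULL, 10);

        /* The address field starts right after the delimiter.
         * With base 16, strtol accepts an optional "0x" prefix. */
        p = sep + delim_len;
        address[count] = strtol(p, NULL, 16);
        ++count;

        /* Move to the field after the next delimiter, if there is one. */
        sep = strstr(p, delim);
        if (sep == NULL)
            break;
        p = sep + delim_len;
    }
    return count;
}
```

Called with the example string from the question and arrays of at least two elements, this should store the pairs (1, 0x37660) and (2, 0x38398). A trailing type with no address is simply ignored in this sketch; signalling an error there would be an equally reasonable choice.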

2 Answers

I can suggest the following loop, shown in the demonstration program below.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) 
{
    char mystring[128] = "1///0x37660///2///0x38398";
    char separator[] = "/ ";
    int type [128];
    int address [128];

    size_t number_of_variables = 0;

    for ( char *part = strtok( mystring, separator ); part; part = strtok( NULL, separator ) )
    {
        type[number_of_variables] = atoi(part);
        part = strtok( NULL, separator );
        address[number_of_variables] = part ? (int)strtol(part, NULL, 16) : 0;
        ++number_of_variables;
    }

    for ( size_t i = 0; i < number_of_variables; i++ )
    {
        printf( "%d\t%x\n", type[i], address[i] );
    }

    return 0;
}

The program output is

1   37660
2   38398
Vlad from Moscow
  • Your original string contains 5 values. So it's not a tuple string, as the original string has to contain an even number of values – Gora Mar 01 '18 at 12:56
  • @Gora It does not matter. – Vlad from Moscow Mar 01 '18 at 12:57
  • It works with your solution even though I don't understand it. Especially the ? operator and the :0 I don't understand. – Gora Mar 01 '18 at 13:27
  • @Gora It seems you mean the conditional or, in other words, ternary operator. If strtok returns NULL, then by default the value of an element of the array address is set to 0. Instead of 0 you can select any other value or signal an error. – Vlad from Moscow Mar 01 '18 at 13:29
  • @Gora They are called ternary operators in C. – Gaurav Pathak Mar 01 '18 at 13:29
  • @Gora YOU are welcome to meet the ugliest Operator in C —>> The Ternary Operator – Michi Mar 01 '18 at 13:34
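
For readers who find the `? :` form hard to read, here is a sketch of the same loop with the conditional operator spelled out as a plain if/else, plus a bound on the number of stored tuples. The function name `parse_with_strtok` and the `max_pairs` parameter are assumptions for this illustration and do not appear in the answer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: same logic as the loop in the answer, written
 * with if/else instead of the conditional operator. strtok modifies
 * `s`, so `s` must be a writable buffer. Returns the tuples stored. */
size_t parse_with_strtok(char *s, int *type, int *address, size_t max_pairs)
{
    const char separator[] = "/";   /* strtok splits on any run of '/' */
    size_t n = 0;

    assert(s != NULL && type != NULL && address != NULL);

    char *part = strtok(s, separator);
    while (part != NULL && n < max_pairs) {
        type[n] = atoi(part);

        part = strtok(NULL, separator);
        if (part != NULL) {
            /* Equivalent to the branch before the ':' */
            address[n] = (int)strtol(part, NULL, 16);
        } else {
            /* Equivalent to the ": 0" branch: no address was found */
            address[n] = 0;
        }
        ++n;

        /* Only ask for another token if the string is not exhausted. */
        if (part != NULL)
            part = strtok(NULL, separator);
    }
    return n;
}
```

With a writable copy of "1///0x37660///2" this stores two tuples, and the second address falls back to 0 because the string ends after the type. Replacing that assignment with an error return would be the "signal an error" variant mentioned in the comment.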

You can write a robust and fast parser with flex and bison, like this:

File: lexer.l

%{
#include <stdio.h>
#include "parser.tab.h"
int yyerror(const char *const message);
%}

%option noyywrap
%x IN_ADDRESS

DECIMAL [0-9]+
HEX "0x"[a-fA-F0-9]+
DELIMITER "///"

%%

<*>{DELIMITER} { return DELIMITER; }

<INITIAL>{DECIMAL} {
        char *endptr;
        // Make the lexer know that we are expecting a
        // hex number
        BEGIN(IN_ADDRESS);
        // Assign the value to use by bison
        yylval = strtol(yytext, &endptr, 10);
        // Check conversion's success
        if (*endptr != '\0')
            return ERROR;
        return TYPE;
    }

<IN_ADDRESS>{HEX} {
        char *endptr;
        // Restore the initial state
        BEGIN(INITIAL);
        // Assign the value to use by bison
        yylval = strtol(yytext, &endptr, 16);
        // Check conversion's success
        if (*endptr != '\0')
            return ERROR;
        return ADDRESS;
    }

%%

File: parser.y

%{
#include <stdio.h>
extern int yylex();
extern FILE *yyin;

int yyerror(const char *const message);

#define YYSTYPE int
%}


%token TYPE
%token DELIMITER
%token ADDRESS
%token ERROR

%%

program:
       | program statement
       ;

command: TYPE DELIMITER ADDRESS { 
           fprintf(stdout, "type %d, address 0x%08x\n", $1, $3); 
       }
       ;

statement: command
         | statement DELIMITER command
         ;

%%

int
yyerror(const char *const message)
{
    return fprintf(stdout, "error: %s\n", message);
}

int
main(void)
{
    yyin = fopen("program.txt", "r");
    if (yyin == NULL)
        return -1;
    yyparse();
}

File: program.txt

1///0x37660///2///0x38398

Compiling this with gcc, bison and flex is rather simple:

bison -d parser.y
flex lexer.l
gcc -Wno-unused-function -Wall -Werror lex.yy.c parser.tab.c -o parserparser

Of course, this program needs some tweaking, but adjusting it to your needs should be straightforward.

Just find a simple tutorial on bison and flex to help you fully understand this code.

Iharob Al Asimi
  • Thank you for this detailed solution. Unfortunately my code is running in a Matlab S-Function of a bigger project, so I'm not able to change the compiler or the project guidelines. I don't love the current solution and I'm unsure if there are some cases where it won't work (even though I didn't find any yet) but my hands are tied ;) – Gora Mar 01 '18 at 14:19
  • @Gora I see, it's still not hard to find another parser generator that would work in your situation. Although I don't know Matlab at all. – Iharob Al Asimi Mar 01 '18 at 14:24
  • This is probably overkill for the problem at hand. – Jonathan Leffler Mar 01 '18 at 15:08
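
Where extra build tools are not an option, as in the comment above about the S-Function, a much smaller alternative in plain C could be built on `sscanf`. The sketch below is only an assumption of how that might look: the name `scan_pairs`, the `unsigned int` address type, and the stop-at-first-mismatch policy are choices made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads "type///address" pairs with sscanf and
 * stops at the first position that does not match the expected shape.
 * Returns the number of pairs stored, at most `max_pairs`. */
size_t scan_pairs(const char *input, int *type, unsigned int *address,
                  size_t max_pairs)
{
    size_t count = 0;
    const char *p = input;

    assert(input != NULL && type != NULL && address != NULL);

    while (count < max_pairs) {
        int consumed = 0;

        /* %d reads the type, "///" must match literally, %x reads the
         * hexadecimal address (an optional "0x" prefix is accepted),
         * and %n records how many characters were consumed so far. */
        if (sscanf(p, "%d///%x%n", &type[count], &address[count],
                   &consumed) != 2)
            break;
        ++count;
        p += consumed;

        /* A further pair has to be introduced by another "///". */
        if (strncmp(p, "///", 3) != 0)
            break;
        p += 3;
    }
    return count;
}
```

Unlike `strtok`, this leaves the input unmodified and rejects a pair whose address is missing, at the cost of less detailed error reporting than a generated parser would give.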