I am writing an LLVM code generation demo for a language that includes an if statement. Here are the rules and actions relevant to my question:

IfStatement : IF CondExpression THEN Statement                      {if_Stmt(string($2),string($4));}                     %prec LOWER_THAN_ELSE
            | IF CondExpression THEN Statement ELSE Statement       {if_else_Stmt(string($2),string($4),string($6));} 
            ;            

CondExpression : Expression Relop Expression            { $$ = operation($2,string($1),string($3));printf("Relop value : %s \n",$2);}
                | Expression                            {$$ = $1;}
               ;

Relop :     EE                  {$$ = (char *)(string("icmp eq ").c_str());printf("%s\n",$$);}                  
      | NE                  {$$ = (char *)(string("icmp ne ").c_str());} 
      | LT                  {$$ = (char *)(string("icmp slt ").c_str());} 
      | GT                  {$$ = (char *)(string("icmp sgt ").c_str());} 
      | LTE                 {$$ = (char *)(string("icmp sle ").c_str());}  
      | GTE                 {$$ = (char *)(string("icmp sge ").c_str());} 
          ;

The CondExpression rule should parse the conditional expression. I am using printf to print the value of the Relop symbol, which is of type <char *>. Relop should hold the string passed to the string constructor in the actions shown above. However, the printf in CondExpression prints 0:

 Relop value : 0

while the printf inside the Relop rule prints the correct value:

Relop value : icmp eq 

Why is the Relop value in CondExpression 0, and how can I make it take the correct value returned from the Relop rule?

Rational Rose

1 Answer

Not only is

(char *)(string("icmp ne ").c_str())

an absurdly obfuscated way of writing

"icmp ne"

it also introduces Undefined Behaviour not present in the simple and obvious alternative. The string constructor creates a temporary string, and c_str then returns a pointer to the internal storage of that temporary. You then store that pointer on the parser stack and let the temporary be destroyed at the end of the statement, which leaves the stored pointer dangling. So when you attempt to print the string, you are passing a dangling pointer and anything might happen, such as the memory having been reused for some other object, leading to a mysterious string being printed.

Of course, if your semantic type is char *, C++ will complain that $$ = "icmp eq"; is not const-safe. It's not immediately obvious to me why you wouldn't use const char * as the semantic type, unless some other part of your code either intends to modify the string or may need to free the memory (because in some cases the string was dynamically allocated). In that case, you could force a copy of the string using, for example, strdup. If your library doesn't provide strdup or you don't want to rely on it, it can easily be defined as something like

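#include <cstdlib>   // malloc
#include <cstring>   // strlen, memcpy

char* strdup(const char* s) {
  size_t len = strlen(s);  // a default argument cannot refer to another parameter
  char* r = static_cast<char*>(malloc(len + 1));  // C++ requires the cast from void*
  memcpy(r, s, len);
  r[len] = 0;  // terminate the copy
  return r;
}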

Although a more C++-like solution would be to use std::string* as the semantic type, allowing you to write:

$$ = new std::string("icmp eq");
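With std::string* as the semantic type, each value on the parser stack is heap-allocated, so whichever action consumes it also has to free it. The sketch below illustrates one way to handle that ownership. The functions make_relop and build_compare are hypothetical stand-ins for the Relop and CondExpression actions, and the i32 operand type is an assumption made only to produce a complete instruction string.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for a Relop action: $$ = new std::string("icmp eq ");
std::string* make_relop() {
  return new std::string("icmp eq ");
}

// Hypothetical stand-in for the CondExpression action. It takes ownership
// of all three operands and frees them when it returns.
std::string build_compare(std::string* op, std::string* lhs, std::string* rhs) {
  std::unique_ptr<std::string> o(op), l(lhs), r(rhs);
  // The operand type i32 is assumed here for illustration only.
  return *o + "i32 " + *l + ", " + *r;
}
```

Calling build_compare(make_relop(), new std::string("%a"), new std::string("%b")) yields the text icmp eq i32 %a, %b and leaves nothing allocated behind.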
rici
  • Those are the LLVM strings for equal, not equal, less than, greater than, etc. (http://releases.llvm.org/2.7/docs/LangRef.html). I asked for a solution, not for a new problem. – Rational Rose Jan 12 '17 at 18:43
  • @rational: I understand what the strings mean. The solution is to fix the problem I identified. Either use a string literal or a copy of a string literal (if for some reason you need a copy). But don't create a dangling pointer. – rici Jan 12 '17 at 20:06