During our compiler's intermediate code generation phase, specifically while testing the arithmetic expression and assignment rules, I noticed that although the respective quads are constructed successfully, printing them sometimes throws a bad_alloc exception. After tracing it, it looks like it's caused by the printQuads() method, and specifically by the following access of the key string:

if(q.result != nullptr && q.result->sym != nullptr) {
    cout << "quad " << opcodeStrings[q.op] << " inside if key check for" << opcodeStrings[q.op] << endl;
    resultKey = q.result->sym->key;
}

I'll try to include only the relevant code instead of dumping 500 lines here. Below you can see our assignexpr and basic arithmetic expression rules and actions:

expr:                           assignexpr
                            |   expr PLUS expr
                                {
                                    bool isExpr1Arithm = check_arith($1);
                                    bool isExpr2Arithm = check_arith($3);
                                    if(!isExpr1Arithm || !isExpr2Arithm)
                                    {
                                        //string msg = !isExpr1Arithm ? "First operand isn\'t a number in addition!" : "Second operand isn\'t a number in addition!";
                                        yyerror(token_node, "Both addition operands must be numbers!");
                                    } else
                                    {
                                        double result = $1->numConst + $3->numConst;
                                        $$ = newexpr(arithmetic_e);
                                        $$->sym = newtemp(scope);
                                        $$->numConst = result;
                                        emit(add, $1, $3, $$, nextquadlabel(), yylineno);
                                    }
                                }
                            |   expr MIN expr
                                {
                                    bool isExpr1Arithm = check_arith($1);
                                    bool isExpr2Arithm = check_arith($3);
                                    if(!isExpr1Arithm || !isExpr2Arithm)
                                    {
                                        //string msg = !isExpr1Arithm ? "First operand isn\'t a number in subtraction!" : "Second operand isn\'t a number in subtraction!";
                                        yyerror(token_node, "Both subtraction operands must be numbers!");
                                    } else
                                    {
                                        double result = $1->numConst - $3->numConst;
                                        $$ = newexpr(arithmetic_e);
                                        $$->sym = newtemp(scope);
                                        $$->numConst = result;
                                        emit(sub, $1, $3, $$, nextquadlabel(), yylineno);
                                    }
                                }
                            |   expr MUL expr
                                {
                                    bool isExpr1Arithm = check_arith($1);
                                    bool isExpr2Arithm = check_arith($3);
                                    if(!isExpr1Arithm || !isExpr2Arithm)
                                    {
                                        //string msg = !isExpr1Arithm ? "First operand isn\'t a number in multiplication!" : "Second operand isn\'t a number in multiplication!";
                                        yyerror(token_node, "Both multiplication operands must be numbers!");
                                    } else
                                    {
                                        double result = $1->numConst * $3->numConst;
                                        $$ = newexpr(arithmetic_e);
                                        $$->sym = newtemp(scope);
                                        $$->numConst = result;
                                        emit(mul, $1, $3, $$, nextquadlabel(), yylineno);
                                    }
                                }
                            |   expr DIV expr
                                {
                                    bool isExpr1Arithm = check_arith($1);
                                    bool isExpr2Arithm = check_arith($3);
                                    if(!isExpr1Arithm || !isExpr2Arithm)
                                    {
                                        //string msg = !isExpr1Arithm ? "First operand isn\'t a number in division!" : "Second operand isn\'t a number in division!";
                                        yyerror(token_node, "Both division operands must be numbers!");
                                    } else
                                    {
                                        if($3->numConst == 0) {
                                            yyerror(token_node, "division by 0!");
                                        } else {
                                            double result = $1->numConst / $3->numConst;
                                            $$ = newexpr(arithmetic_e);
                                            $$->sym = newtemp(scope);
                                            $$->numConst = result;
                                            emit(div_op, $1, $3, $$, nextquadlabel(), yylineno);
                                        }
                                    }
                                }
                            |   expr MOD expr
                                {
                                    bool isExpr1Arithm = check_arith($1);
                                    bool isExpr2Arithm = check_arith($3);
                                    if(!isExpr1Arithm || !isExpr2Arithm)
                                    {
                                        //string msg = !isExpr1Arithm ? "First operand isn\'t a number in modulus!" : "Second operand isn\'t a number in modulus!";
                                        yyerror(token_node, "Both modulus operands must be numbers!");
                                    } else
                                    {
                                        if($3->numConst == 0) {
                                            yyerror(token_node, "division by 0!");
                                        } else {
                                            double result = fmod($1->numConst,$3->numConst);
                                            $$ = newexpr(arithmetic_e);
                                            $$->sym = newtemp(scope);
                                            $$->numConst = result;
                                            emit(mod_op, $1, $3, $$, nextquadlabel(), yylineno);
                                        }
                                    }
                                }
...


assignexpr:                     lvalue ASSIGN expr  {   if ( isMemberOfFunc )
                                                        {
                                                            isMemberOfFunc=false;
                                                        }
                                                        else{   if ( islocalid==true ){
                                                                    islocalid = false;
                                                                }else{
                                                                    if ( isLibFunc($1->sym->key) ) yyerror(token_node,"Library function \"" + $1->sym->key + "\" is not lvalue!");
                                                                    if (SymTable_lookup(symtab,$1->sym->key,scope,false) && isFunc($1->sym->key,scope)) yyerror(token_node,"User function \"" + $1->sym->key + "\" is not lvalue!");
                                                                }
                                                        }
                                                        if($1->type == tableitem_e)
                                                        {
                                                            // lvalue[index] = expr
                                                            emit(tablesetelem,$1->index,$3,$1,nextquadlabel(),yylineno);
                                                            $$ = emit_iftableitem($1,nextquadlabel(),yylineno, scope);
                                                            $$->type = assignment;
                                                        } else
                                                        {
                                                            emit(assign,$3,NULL,$1,nextquadlabel(),yylineno); //lval = expr;
                                                            $$ = newexpr(assignment);
                                                            $$->sym = newtemp(scope);
                                                            emit(assign, $1,NULL,$$,nextquadlabel(),yylineno);
                                                        }
                                                    }
                            ;

The printQuads method is the following:

void printQuads() {
unsigned int index = 1;
cout << "quad#\t\topcode\t\tresult\t\targ1\t\targ2\t\tlabel" <<endl;
cout << "-------------------------------------------------------------------------------------------------" << endl;
for(quad q : quads) {
    string arg1_type = "";
    string arg2_type = "";
    cout << "quad before arg1 type check" << endl;
    if(q.arg1 != nullptr) {
        switch (q.arg1->type) {
            case const_bool:
                arg1_type = "\'" + BoolToString(q.arg1->boolConst) + "\'";
                break;
            case const_string:
                arg1_type = "\"" + q.arg1->strConst + "\"";
                break;
            case const_num:
                arg1_type = to_string(q.arg1->numConst);
                break;
            case var:
                arg1_type = q.arg1->sym->key;
                break;
            case nil_e:
                arg1_type = "nil";
                break;
            default:
                arg1_type = q.arg1->sym->key;
                break;
        }
    }
    cout << "quad before arg2 type check" << endl;
    if(q.arg2 !=  nullptr) {
        switch (q.arg2->type) {
            case const_bool:
                arg2_type = "\'" + BoolToString(q.arg2->boolConst) + "\'";
                break;
            case const_string:
                arg2_type = "\"" + q.arg2->strConst + "\"";
                break;
            case const_num:
                arg2_type = to_string(q.arg2->numConst);
                break;
            case nil_e:
                arg2_type = "nil";
                break;
            default:
                arg2_type = q.arg2->sym->key;
                break;
        }
    }
    string label = "";
    if(q.op == if_eq || q.op == if_noteq || q.op == if_lesseq || q.op == if_greatereq
        || q.op == if_less || q.op == if_greater || q.op == jump) label = to_string(q.label); // q.label is an unsigned int; assigning it directly would store a single char

    string resultKey = "";
    cout << "quad before key check" << endl;
    if(q.result != nullptr && q.result->sym != nullptr) {
        cout << "quad " << opcodeStrings[q.op] << " inside if key check for" << opcodeStrings[q.op] << endl;
        resultKey = q.result->sym->key;
    }
    cout << "quad after key check" << endl;
    cout << index << ":\t\t" << opcodeStrings[q.op] << "\t\t" << resultKey << "\t\t" << arg1_type << "\t\t" << arg2_type << "\t\t" << label << "\t\t" << endl;
    index++;
}
}

The quads variable is just a vector of quads. Here is the quad struct:

enum expr_t {
var,
tableitem_e,
user_func,
lib_func,
arithmetic_e,
assignment,
newtable_e,
const_num,
const_bool,
const_string,
nil_e,
bool_e
};

struct expr {
    expr_t type;
    binding* sym;
    expr* index;
    double numConst;
    string strConst;
    bool boolConst;
    expr* next;
};

struct quad {
    iopcode op;
    expr* result;
    expr* arg1;
    expr* arg2;
    unsigned int label;
    unsigned int line;
};

The binding* is defined as follows and is a symbol table binding:

enum SymbolType{GLOBAL_, LOCAL_, FORMAL_, USERFUNC_, LIBFUNC_, TEMP};

struct binding{
    std::string key;
    bool isactive = true;
    SymbolType sym;
    //vector<binding *> formals;
    scope_space space;
    unsigned int offset;
    unsigned int  scope;
    int line;
};

Here are the emit(), newtemp & newexpr() methods:

void emit(
        iopcode         op,
        expr*           arg1,
        expr*           arg2,
        expr*           result,
        unsigned int    label,
        unsigned int    line
    ){
    quad p;
    p.op            = op;
    p.arg1          = arg1;
    p.arg2          = arg2;
    p.result        = result;
    p.label         = label;
    p.line          = line;
    currQuad++;
    quads.push_back(p);
}

binding *newtemp(unsigned int scope){
    string name = newTempName();
    binding* sym = SymTable_get(symtab,name,scope);
    if (sym== nullptr){
        SymTable_put(symtab,name,scope,TEMP,-1);
        binding* sym =  SymTable_get(symtab,name,scope);
        return sym;
    }else return sym;
}

string newTempName(){
    string temp = "_t" + to_string(countertemp) + " ";
    countertemp++;
    return temp;
}

expr* newexpr(expr_t exprt){
    expr* current = new expr;
    current->sym = NULL;
    current->index = NULL;
    current->numConst = 0;
    current->strConst = "";
    current->boolConst = false;
    current->next = NULL;
    current->type = exprt;
    return current;
}

unsigned int countertemp = 0;
unsigned int currQuad = 0;

Symbol table cpp file:

#include <algorithm>
bool isHidingBindings = false;

/* Return a hash code for pcKey.*/
static unsigned int SymTable_hash(string pcKey){
  size_t ui;
  unsigned int uiHash = 0U;
  for (ui = 0U; pcKey[ui] != '\0'; ui++)
    uiHash = uiHash * HASH_MULTIPLIER + pcKey[ui];
  return (uiHash % DEFAULT_SIZE);
}

/*If b contains a binding with key pcKey, returns 1.Otherwise 0.
It is a checked runtime error for oSymTable and pcKey to be NULL.*/
int Bucket_contains(scope_bucket b, string pcKey){
    vector<binding> current = b.entries[SymTable_hash(pcKey)]; /*find the entry binding based on the argument pcKey*/
    for (int i=0; i<current.size(); i++){
        binding cur = current.at(i);
        if (cur.key==pcKey) return 1;
    }   
    return 0;
}

/*epistrefei to index gia to bucket pou antistixei sto scope 'scope'.Se periptwsh pou den uparxei
akoma bucket gia to en logw scope, ean to create einai true dhmiourgei to antistoixo bucket sto
oSymTable kai epistrefei to index tou.Diaforetika epistrefei thn timh -1.*/
int indexofscope(SymTable_T &oSymTable, unsigned int scope, bool create){
    int index=-1;
    for(int i=0; i<oSymTable.buckets.size(); i++) if (oSymTable.buckets[i].scope == scope) index=i;
    if ( index==-1 && create ){
        scope_bucket newbucket;
        newbucket.scope = scope;
        oSymTable.buckets.push_back(newbucket);
        index = oSymTable.buckets.size()-1;
    }
    return index;
}

/*If there is no binding with key : pcKey in oSymTable, puts a new binding with
this key and value : pvvValue returning 1.Otherise, it just returns 0.
It is a checked runtime error for oSymTable and pcKey to be NULL.*/
int SymTable_put(SymTable_T &oSymTable, string pcKey,unsigned int scope, SymbolType st, unsigned int line){
    int index = indexofscope(oSymTable,scope, true);
    if(index==-1) cerr<<"ERROR"<<endl;
    scope_bucket *current = &oSymTable.buckets.at(index);
    if ( Bucket_contains(*current, pcKey) && st != FORMAL_ && st != LOCAL_) return 0; /*If the binding exists in oSymTable return 0.*/
    binding newnode;
    newnode.key = pcKey;
    newnode.isactive = true;
    newnode.line =  line;
    newnode.sym = st;
    newnode.scope = scope;
    current->entries[SymTable_hash(pcKey)].push_back(newnode);
    return 1;
}

/*Pairnei ws orisma to oSymTable kai to scope pou theloume na apenergopoihsoume.
An to sugkekrimeno scope den uparxei sto oSymTable epistrefei -1.Diaforetika 0*/
void SymTable_hide(SymTable_T &oSymTable, unsigned int scope){
    isHidingBindings = true;
    for(int i=scope; i >= 0; i--) {
        if(i == 0) return;
        int index = indexofscope(oSymTable,i,false);
        if(index == -1) continue;
        scope_bucket *current = &oSymTable.buckets.at(index);
        for (int i=0; i<DEFAULT_SIZE; i++) {
            for (int j=0; j<current->entries[i].size(); j++) {
                if(current->entries[i].at(j).sym == LOCAL_ || current->entries[i].at(j).sym == FORMAL_) 
                    current->entries[i].at(j).isactive = false;
            }
        }
    }
}

void SymTable_show(SymTable_T &oSymTable, unsigned int scope){
    isHidingBindings = false;
    for(int i=scope; i >= 0; i--) {
        if(i == 0) return;
        int index = indexofscope(oSymTable,i,false);
         if(index == -1) continue;
        scope_bucket *current = &oSymTable.buckets.at(index);
        for (int i=0; i<DEFAULT_SIZE; i++) {
            for (int j=0; j<current->entries[i].size(); j++) {
                if(current->entries[i].at(j).sym == LOCAL_ || current->entries[i].at(j).sym == FORMAL_) 
                    current->entries[i].at(j).isactive = true;
            }
        }
    }
}

bool SymTable_lookup(SymTable_T oSymTable, string pcKey, unsigned int scope, bool searchInScopeOnly){
    for(int i=scope; i >= 0; i--) {
        if(searchInScopeOnly && i != scope) break;
        int index = indexofscope(oSymTable,i,false);
         if(index == -1) continue;
        scope_bucket current = oSymTable.buckets[index];
        for(vector<binding> entry : current.entries) {
            for(binding b : entry) {
                if(b.key == pcKey && b.isactive) return true;
                else if(b.key == pcKey && !b.isactive) return false;
            }
        }
    }
    return false;
}

binding* SymTable_lookupAndGet(SymTable_T &oSymTable, string pcKey, unsigned int scope) noexcept{
    for ( int i=scope; i >= 0; --i ){
        int index = indexofscope(oSymTable,i,false );
        if (index==-1) continue;
        scope_bucket &current = oSymTable.buckets[index];
        for (auto &entry : current.entries) {
            for (auto &b : entry ){
                if ( b.key == pcKey ) return &b;
            }
        }
    }
    return nullptr;
}

/*Lamvanei ws orisma to oSymTable, kleidh tou tou desmou pou psaxnoume kai to scope tou desmou.
H sunarthsh telika epistrefei to value tou tou desmou.Diaforetika epistrefei 0*/
binding* SymTable_get(SymTable_T &oSymTable, const string pcKey, unsigned int scope){
    for ( int i=scope; i >= 0; --i )
    {
        const int index = indexofscope( oSymTable, i, false );
        if ( index == -1 )
        {
            continue;
        }

        scope_bucket& current = oSymTable.buckets[index];

        for ( auto& entry : current.entries)
        {
            for ( auto& b : entry )
            {
                if ( b.key == pcKey )
                {
                    return &b;
                }
            }
        }
    }
    return nullptr;
}

When run with the following test file, the issue occurs at the z5 = 4 / 2; expression's assign quad:

// simple arithmetic operations
z1 = 1 + 2;
z10 = 1 + 1;
z2 = 1 - 3;
z3 = 4 * 4;
z4 = 5 / 2;

What's confusing is that if I print out the sym->key after each emit() in the arithmetic-related actions, I can see the keys just fine. But once I try to access them inside the printQuads it will fail (for the div operation at least so far). This has me thinking that maybe we are shallow copying the binding* sym thus losing the key? But how come the rest of them are printed normally?

I'm thinking that the issue (which has occurred before at various stages) could be caused by us using a ton of copy-by-value instead of by-reference, but I can't exactly confirm this because most of the time it works (I'm guessing that means this is undefined behavior?).

I'm sure this is very difficult to help debug but maybe someone will eyeball something that I can't see after this many hours.

Stelios Papamichail

1 Answer

Debugging by eyeballing your code is probably a useful skill, but it's far from the most productive form of debugging. These days, it's much less necessary, since there are lots of good tools which you can use to detect problems. (Here, I do mean "you", specifically. I can't use any of those tools because I don't have your complete project in front of me. And nor do I particularly want it; this is not a request for you to paste hundreds of lines of code).

You're almost certainly right that your problem is related to some kind of undefined behaviour. If you're correct about the bad_alloc exception being thrown by what is effectively a copy of a std::string, then it's most likely the result of the thing being copied from not being a valid std::string. Perhaps it's an actual std::string object whose internal members have been corrupted; perhaps the pointer is not actually pointing to an active std::string (which I think is the real problem, see below). Or perhaps it's something else.

Either way, the error occurred long before the bug manifests itself, so you're only going to stumble upon where it happened by blind luck. On the other hand, there are a variety of memory error detection tools available which may be able to pinpoint the precise moment in which you violated the contract by reading or writing to memory which didn't belong to you. These include Valgrind and AddressSanitizer (also known as ASan); one or both of these is certainly available for the platform on which you are developing your project. (I say that confidently even without knowing what that platform is, but you'll have to do a little research to find the one which works best for your particular environment. Both of those names can be looked up on Wikipedia.) These tools are very easy to use, and extraordinarily useful; they can save you hours or days of debugging and a lot of frustration. As an extra added bonus, they can detect bugs you don't even know you have, saving you the embarrassment of shipping a program which will blow up in the hands of the customer or the person who is marking your assignment. So I strongly recommend learning how to use them.
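As a rough illustration of what these tools catch, here is a minimal sketch of a dangling read that may appear to work in an ordinary build. It is a made-up example, not code from the project: `read_key` and the file name `demo.cpp` are hypothetical, and the build commands in the comments assume GCC or Clang on a platform where AddressSanitizer and Valgrind are available.

```cpp
#include <cassert>
#include <string>

// Hypothetical example, not taken from the project.
// Returns the text "_t0". When trigger_bug is true, the text is read through
// a pointer to a std::string that has already been deleted, which is
// undefined behaviour and may go unnoticed in an ordinary build.
std::string read_key(bool trigger_bug) {
    std::string* key = new std::string("_t0");
    const std::string copy = *key;   // valid read: the object is still alive
    delete key;                      // the object's lifetime ends here
    if (trigger_bug) {
        return *key;                 // dangling read: this is what the tools flag
    }
    return copy;
}

// To see a report, call read_key(true) from main() and then either
//   build with AddressSanitizer:  g++ -g -fsanitize=address demo.cpp -o demo && ./demo
//   or run a plain debug build under Valgrind:  g++ -g demo.cpp -o demo && valgrind ./demo
```

With the bug triggered, AddressSanitizer typically stops at the bad read with a heap-use-after-free report that shows the stack of the read, of the `delete`, and of the original allocation. Valgrind reports an invalid read with similar stack traces.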

I probably should leave it at that, because it's better motivation to learn to use the tools. Still, I can't resist making a guess about where the problem lies. But honestly, you will learn a lot more by ignoring what I'm about to say and trying to figure out the problem yourself.

Anyway, you don't include much in the way of information about your SymTable_T class, and the inconsistent naming convention makes me wonder if you even wrote its code; perhaps it was part of the skeleton code you were given for this assignment. From what I can see in SymTable_put and SymTable_get, the SymTable_T includes something like a hash table, but doesn't use the C++ standard library associative containers. (That's a mistake from the beginning, IMHO. This assignment is about learning how to generate code, not how to write a good hash table. The C++ standard library associative containers are certainly adequate for your purposes, whether or not they are the absolute ideal for your use case, and they have the enormous advantages of already being thoroughly documented and debugged.)
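To make the suggestion concrete, here is a minimal sketch of a scoped symbol table built on `std::unordered_map`, assuming the assignment allowed standard containers. `Binding`, `ScopedTable`, `put` and `get` are hypothetical names for illustration, not the project's API. One property of `std::unordered_map` is relevant here: inserting may rehash and invalidate iterators, but pointers and references to existing elements stay valid until that element is erased.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical, simplified binding; the real one has more fields.
struct Binding {
    std::string key;
    unsigned int scope;
};

class ScopedTable {
public:
    // Inserts key into the given scope if it is not there yet and returns a
    // pointer to the stored binding (the existing one if already present).
    Binding* put(const std::string& key, unsigned int scope) {
        auto result = scopes_[scope].emplace(key, Binding{key, scope});
        return &result.first->second;
    }

    // Searches from the given scope outwards down to scope 0.
    Binding* get(const std::string& key, unsigned int scope) {
        for (unsigned int s = scope + 1; s-- > 0;) {
            auto bucket = scopes_.find(s);
            if (bucket == scopes_.end()) continue;
            auto it = bucket->second.find(key);
            if (it != bucket->second.end()) return &it->second;
        }
        return nullptr;
    }

private:
    // Both levels are node-based, so stored bindings never move on insert.
    std::unordered_map<unsigned int,
                       std::unordered_map<std::string, Binding>> scopes_;
};
```

A pointer returned by `put` can be kept in an expression node while further symbols are added to the table. It still becomes dangling if the binding, or its whole scope, is erased.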

It's possible that SymTable_T was not originally written in C++ at all. The use of free-standing functions like SymTable_put and SymTable_get rather than class methods is difficult to explain unless the functions were originally written in C, which doesn't allow object methods. On the other hand, they appear to use C++ standard library collections, as evidenced by the call to push_back in SymTable_put:

current->entries[SymTable_hash(pcKey)].push_back(newnode);

That suggests that entries is a std::vector (although there are other possibilities), and if it is, it should raise a red flag when you combine it with this, from SymTable_get (whitespace-edited to save screen space here):

for ( auto& entry : current.entries) {
    for ( auto& b : entry ) {
        if ( b.key == pcKey )
            return &b;
    }
}

(To be honest, I don't understand that double loop. To start with, you seem to be ignoring the fact that there is a hash table somewhere in that data structure, but beyond that, it seems to me that entry should be a binding (that's what SymTable_put pushes onto the entries container), and I don't see where a binding is an iterable object. Perhaps I'm not reading that correctly.)
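If `entries` turns out to be an array of chains indexed by `SymTable_hash`, as the `push_back` call in `SymTable_put` suggests, then a lookup only needs to scan the single chain the key hashes to. The sketch below assumes that layout. `Scope`, `hash_key`, `scope_put`, `scope_get` and both constants are illustrative stand-ins, not the project's definitions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

constexpr std::size_t kTableSize = 509;      // stand-in for DEFAULT_SIZE (assumed value)
constexpr unsigned int kMultiplier = 65599;  // stand-in for HASH_MULTIPLIER (assumed value)

struct Binding {
    std::string key;
};

// One scope: a fixed array of chains, indexed by the hash of the key.
struct Scope {
    std::array<std::vector<Binding>, kTableSize> entries;
};

std::size_t hash_key(const std::string& key) {
    unsigned int h = 0U;
    for (unsigned char c : key) h = h * kMultiplier + c;
    return h % kTableSize;
}

void scope_put(Scope& scope, const std::string& key) {
    scope.entries[hash_key(key)].push_back(Binding{key});
}

// Scans only the chain the key hashes to. The returned pointer is valid only
// until the next push_back on that same chain.
const Binding* scope_get(const Scope& scope, const std::string& key) {
    for (const Binding& b : scope.entries[hash_key(key)]) {
        if (b.key == key) return &b;
    }
    return nullptr;
}
```

This keeps the chains as `std::vector`, so the pointer it returns has exactly the lifetime limitation noted in the comment.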

Regardless, evidently SymTable_get is returning a reference to something which is stored in a container, probably a std::vector, and that container is modified from time to time by having new elements pushed onto it. And pushing a new element onto the end of a std::vector invalidates all existing references to every element of the vector. (See https://en.cppreference.com/w/cpp/container/vector/push_back)
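The effect can be observed without touching a stale pointer by recording an element's address as an integer before and after a `push_back` that exceeds the vector's capacity. This is a minimal sketch with hypothetical names (`Binding`, `first_element_moved`), not the project's code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct Binding {
    std::string key;
};

// Returns true if the first element lives at a different address after a
// push_back that exceeds the vector's capacity.
bool first_element_moved() {
    std::vector<Binding> chain;
    chain.push_back(Binding{"_t0"});

    // Record the address as an integer so no stale pointer is ever used.
    const auto before = reinterpret_cast<std::uintptr_t>(&chain[0]);

    // Use up the spare capacity (no reallocation happens here) ...
    while (chain.size() < chain.capacity()) chain.push_back(Binding{"filler"});
    // ... then push once more, which forces the vector to allocate new
    // storage and move every element into it.
    chain.push_back(Binding{"_t1"});

    const auto after = reinterpret_cast<std::uintptr_t>(&chain[0]);
    return before != after;
}
```

Any `Binding*` taken before that last `push_back` would now point into the old, released storage. A `push_back` that fits within the existing capacity leaves the elements in place, which is why the failure only shows up some of the time.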

Thus, newtemp, which returns a binding* acquired from SymTable_get, is returning a pointer which may be invalidated in the future by some call to SymTable_put (though not by every call to that function; only the ones where the stars align unhappily). That pointer is then stored into a data object which will (much later) be given to printQuads, which will attempt to use the pointer to make a copy of a string which it will attempt to print. And, as I mentioned towards the beginning of this treatise, trying to use an object which is pointed to by a dangling pointer is Undefined Behaviour.

As a minor note, making a copy of a string in order to print it out is completely unnecessary. A reference would work just fine, and save a bunch of unnecessary memory allocations. But that won't fix the problem (if my guess turns out to be correct) because printing through a dangling pointer is just as Undefined Behaviour as making a copy through a dangling pointer, and will likely manifest in some other mysterious way.
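A minimal sketch of the reference-based version, using simplified stand-ins for the question's `binding`, `expr` and `quad` structs (the names `Binding`, `Expr`, `Quad` and `result_key` are hypothetical):

```cpp
#include <cassert>
#include <string>

// Simplified stand-ins for the structs shown in the question.
struct Binding { std::string key; };
struct Expr    { Binding* sym; };
struct Quad    { Expr* result; };

// Returns a reference to the result symbol's key, or to a shared empty
// string when the quad has no result symbol. No std::string is copied.
const std::string& result_key(const Quad& q) {
    static const std::string empty;
    if (q.result != nullptr && q.result->sym != nullptr) {
        return q.result->sym->key;
    }
    return empty;
}
```

The returned reference can be streamed directly, for example `std::cout << result_key(q)`. It is only as valid as the `Binding` it refers to, so it avoids the copy but not the dangling-pointer problem.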

rici
  • First and foremost, thank you for taking the time to understand our code and to write this thoughtful answer back! You are absolutely right about almost everything. The symbol table is basically an array of hash tables based on scope (I hope that explains the double loop), it was written by my partner a few years back in C (as you correctly guessed) and sadly we obviously failed miserably converting it into C++ code correctly (we are not allowed to use built-in hash tables/symbol tables (don't know why TBH)). We'll definitely use Valgrind again! – Stelios Papamichail May 13 '22 at 21:58
  • Your backtracking regarding the reference returned by `SymTable_get` is also something that we thought of but couldn't quite work around. Since our bindings are stored inside vector containers, what would work in this case? Could we instead return a pointer to the vector element? P.S. I've added the rest of the symtable code just for reference. – Stelios Papamichail May 13 '22 at 22:03
  • @stelios: simple solution is to use std::deque instead of std::vector. It's got a much stronger reference integrity guarantee. (Invalidated on erase but not on push.) Vectors maintain contiguity so they can't grow without moving. Deques are lists of smallish chunks, so they can grow by adding a new chunk. The chunks are indexed so they're still O(1) access. – rici May 13 '22 at 22:07
  • @stelios: but watch out for popping an entire scope, which will destroy all of its names. Basically, every time you persist a reference/pointer, you need to think about the lifetime of the object pointed at and that of the object which holds the pointer. Get in the habit or use a different language (Go or Rust, for example, with different solutions). – rici May 13 '22 at 22:21
  • Thank you for the suggestions and the pointers (pun intended). I'm obviously a very bad C "programmer" and I have close to 0 experience with it. We also rarely use it (only in few programming subjects) so I definitely don't have the habit of thinking pointer-related issues through. I come from languages such as Kotlin and Java so the transition is quite big but I'll do my best to try and improve the code that we already have, along with my friend. – Stelios Papamichail May 13 '22 at 22:25
  • @Stelios: That wasn't the message I hoped to convey. Actually, I think you probably could be a good programmer; otherwise, I wouldn't have spent the time writing that answer. You are inexperienced in C and C++, but it's clear how to fix that; it just takes a bit of time. Whether or not you're using a language with explicit pointers, it's important to think about object lifetimes. That's true even if you're using a language with a garbage collector, and it's true in spades if you want to *create* a language. Coding in C is a nuisance but it's good discipline. So take advantage. – rici May 14 '22 at 04:04
  • But the most important take-away here is this: you need to learn how to debug. Unfortunately, that's not taught very well anywhere that I know of, so you need to figure it out for yourself. And it's way more difficult than writing code. So whenever you're writing something, think about how you're going to debug it. When you're selecting toolsets (or writing toolsets), think about how they simplify or impede debugging. And learn how to use as many debugging tools as possible. (For example, bison has an extremely handy feature which can trace the parser.) Good luck with the project. – rici May 14 '22 at 04:08
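The std::deque suggestion from the comments can be sketched as follows. This is a minimal illustration that assumes each hash chain becomes a `std::deque<Binding>`; `Binding` and `chain_put` are hypothetical names, not the project's code.

```cpp
#include <cassert>
#include <deque>
#include <string>

struct Binding {
    std::string key;
};

// Appends a binding and returns a pointer to it. With std::deque, push_back
// invalidates iterators but leaves pointers and references to existing
// elements valid, so this pointer survives later insertions at the end.
// It does not survive erasing the element or destroying the deque.
Binding* chain_put(std::deque<Binding>& chain, const std::string& key) {
    chain.push_back(Binding{key});
    return &chain.back();
}
```

A pointer obtained this way can be stored in an expression node and read later, as long as the scope that owns the deque is still alive at that point.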