2

So I'm building a virtual machine, and trying to make it as cross platform as possible, and suddenly encountering a strange error. There is a let instruction for my machine, which allocates memory for a variable in the memory of the machine and assign that variable with a value. In short, the let function calls getAddress to get the address of the variable. getAddress checks if the variable is already defined, and returns the address. If the variable is not defined, getAddress calls memallocate to allocate memory for the variable, and returns the address. Here is the definition of the functions :

static uint16_t memallocate(Machine *m, char *symbol){
    uint16_t allocationAddress = getFirstFree(*m);
    SymbolTable *newSymbol = (SymbolTable *)malloc(sizeof(SymbolTable));
    newSymbol->symbolName = strdup(symbol);
    newSymbol->next = NULL;
    newSymbol->mema = allocationAddress;
    if(m->symbolTable==NULL){
        m->symbolTable = newSymbol;
    }
    else{
        SymbolTable *temp = m->symbolTable;
        while(temp->next!=NULL)
            temp = temp->next;
        temp->next = newSymbol;
    }
    m->memory[allocationAddress].acquired = 1;
    m->memory[allocationAddress].data.value = 0;
    m->occupiedAddress++;
    return allocationAddress;
}

uint16_t getAddress(Machine *m, char *symbol){
    SymbolTable *table = m->symbolTable;
    while(table!=NULL){
        if(strcmp(symbol, table->symbolName)==0){
            return table->mema;
        }
        table = table->next;
    }
    uint16_t address = memallocate(m, symbol); // Here is the segfault happening
    return address;
}

This code compiles and runs pretty well on Linux, but on Windows I'm getting a segfault on the memallocate call. Since memallocate is directly passed the arguments of getAddress, and the arguments both being a pointer, they shouldn't change. But while debugging through CLion, I'm seeing gibberish arguments to the memallocate call, which is indicating some kind of stack violation(may be). Again, it is ONLY happening in Windows. Can anybody tell me what is going wrong with my code? Full code for the project can be found at GitHub.

Subhranil
  • 851
  • 1
  • 8
  • 23
  • 1
    16-bit addresses? – Martin James Aug 17 '17 at 18:26
  • Yup. The machine also simualtes a RAM-like form of storage. This is the address of the RAM of the machine. – Subhranil Aug 17 '17 at 18:36
  • Have you checked what is on the stack when the call executes? – odin Aug 17 '17 at 18:41
  • Are you talking about `bt` in gdb? Then yes @odin – Subhranil Aug 17 '17 at 18:42
  • Well I meant more like examining the area around the stack pointer to see if you can find the correct arguments anywhere on the stack. That way you can figure out if the problem is that the stack pointer is at the wrong place. Is this being compiled as 64 bit, 32 bit, or 16 bit on windows? – odin Aug 17 '17 at 18:47
  • If there's something wrong with the code presented I'm not seeing it, but nobody knows what your arguments or structs are or what `getFirstFree` does. You should check to make sure `malloc` doesn't return NULL. I suspect the root of the problem originates elsewhere. The fact that its runs fine on that system but breaks on this one is a telltale sign of UB. – yano Aug 17 '17 at 18:47
  • @yano not even the first statement inside `memallocate` is executing. The execution stops right after the `memallocate` call, as shown in the debugger. For the function definition, you can check machine.c in the GitHub link, if that's convenient. – Subhranil Aug 17 '17 at 18:51
  • as soon as you try to step into `memallocate` it crashes? – yano Aug 17 '17 at 18:53
  • That sounds like a corrupted stack pointer, or a stack overflow (full stack). – ikegami Aug 17 '17 at 18:53
  • 64bit in Windows, with cmake 3.7, using MinGW. Wokring well on Arch Linux and Android with clang 4.0.1. And how to check the SP manually? @odin – Subhranil Aug 17 '17 at 18:55
  • If you are using gdb you can just do x /20x $rsp to examine the area around the stack pointer. If you are compiling in 64 bit though, the arguments should be in %rsi and %rdi anyway not on the stack. – odin Aug 17 '17 at 18:59
  • exactly that is @yano – Subhranil Aug 17 '17 at 19:07
  • alright, `memallocate` has nothing to do with it then (unless you're calling it from somewhere else). Just don't think there's enough code presented here to identify the problem. I'd start with a close examination of `m` and `symbol` and make sure everything there looks good (particularly `symbolTable` and all the `symbolName`s). Sorry I don't have time to go digging through your whole project.. good luck! – yano Aug 17 '17 at 19:18
  • No no not that @yano. Told ya, all the arguments are alright till the `getAddress` call. Just when the `memallocate` is invoked, they are messing up, and the program segfaults. Nothing inside memallocate runs. I'll try to put some more code, atleast the functions `bt` shows. Thanks. – Subhranil Aug 17 '17 at 19:24
  • in `getAddress()` check the m, symbol before passing them to `memallocate()`. they must not be `NULL` – EsmaeelE Aug 17 '17 at 22:15
  • They are not NULL before passing to `memallocate`. Rather, when `memallocate` is called, they are becoming gibberish. @EsmaeelE – Subhranil Aug 18 '17 at 04:57
  • There must be some way for `memallocate()` to return the equivalent of a NULL pointer. – joop Aug 18 '17 at 11:27
  • Thanks everybody for your valuable insights. The problem was with the `Machine` struct, which was too large to keep in stack along with several levels of function calls, as I declared it locally in `main()`. All I did was performing a `malloc` for the structure on the heap instead of local declaration, and it resolved. – Subhranil Aug 19 '17 at 11:03

1 Answers1

1

I took your code and run it on linux through valgrind:

==13768== Conditional jump or move depends on uninitialised value(s)
==13768==    at 0x109ABE: getAddress (in /home/vonaka/VirtualMachine/machine)
==13768==    by 0x10B714: let (in /home/vonaka/VirtualMachine/machine)
==13768==    by 0x109425: run (in /home/vonaka/VirtualMachine/machine)
==13768==    by 0x109F64: main (in /home/vonaka/VirtualMachine/machine)
==13768==  Uninitialised value was created by a heap allocation
==13768==    at 0x4C2BE7F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd
==13768==    by 0x109C2F: main (in /home/vonaka/VirtualMachine/machine)
==13768== 

So (luckily for us) it's not a Windows specific problem. The trick is that on the first call of getAddress (when m->symbolTable is NULL) you call getFirstFree(*m) at the beginning of memallocate, but look at this function:

static uint16_t getFirstFree(Machine m) {
    uint16_t add = 0;
    while(m.memory[add].acquired)
        add++;
    return add;
}

m.memory[i].acquired for i between 0 and number_of_instructions_in_your_input_file - 1are all equal to 1 as you initialize them in writeInstruction, but m.memory[number_of_instructions_in_your_input_file].acquired is not initialized yet.

So something like this will resolve your problem:

void writeInstruction(Machine *m, uint16_t add, Instruction ins) {
    m->memory[add].acquired = 1;
    m->memory[add].type = INSTRUCTION;
    m->memory[add].data.instruction = ins;
    m->occupiedAddress++;
    if(add + 1 < NUM_MEM)
        m->memory[add + 1].acquired = 0;
}

Or maybe this is more elegant (if it's works):

static uint16_t getFirstFree(Machine m) {
    uint16_t add = 0;
    while (m.memory[add].acquired && add < m.occupiedAddress)
        add++;
    return add;
}

Edit:

First of all about your comment:

By default, the members of the structure is initialised as 0

It's just not true!

Now about why you have segfault without malloc and how it's connected with valgrind's warning.

You have variable m of type Machine and some other variables in the stack, m contains Cell memory[NUM_MEM] and there is acquired in each Cell (which are not initialized!). Your input file contains let's say 88 instructions, so first 88 acquired will be correctly initialized after 88 calls of writeInstruction. Then program start to execute your instructions by calling some functions including memallocate and getFirstFree. In this loop:

while(m.memory[add].acquired)
    add++;

for any add m.memory[add].acquired very likely can be different from 0, so once add is equal to NUM_MEM you have segfault.

Why it's not happening with malloc? Simply because you are lucky (but it's not a good luck), your heap is 'cleaner' than stack. Why it's happening only in Windows? Because this time you were not so lucky (I don't have segfault even in Windows).

vonaka
  • 913
  • 1
  • 11
  • 23
  • First of all, thank you very much for looking so thoroughly through the code. Really appreciate it. Now here's the thing : I told a proper "segfault" as the error, not just the warnings in valgrind. They do not harm the execution of the program and as a matter of fact, I'm well aware of them. Run it in Windows, you'll know what I'm talking about. Secondly, acquired is the flag to denote whether a particular `Cell` is occupied or not, and only when a write in that cell occurs, either data or instruction write, it's value is changed to 1 – Subhranil Aug 19 '17 at 10:50
  • By default, the members of the structure is initialised as 0, and that's why I didn't initialise them explicitly. It's a bad hack, and I'll turn around from it soon. Now the actual problem was arising from the fact that the structure `Machine` was (strangely) too large to keep in stack along with 4 or 5 level of function calls, which corrupted the stack. Now when I was calling `memallocate`, for insufficient space, its address was getting messed up, and hence the segfault. To resolve this, all I did was performing a `malloc` for the `Machine` struct in `main()` instead of declaring it locally – Subhranil Aug 19 '17 at 10:55
  • And it solved the problem. Again, this only happened in Windows. – Subhranil Aug 19 '17 at 11:00
  • @Subhranil, I think you don't get it. I edited my answer. – vonaka Aug 19 '17 at 12:53
  • I absolutely get what you're saying. All I'm saying is I ran the program through the visual debugger in CLion, and saw exactly what was happening : The pointers passed to `memallocate` was getting messed up, but they were perfectly fine till `getAddress`. Since they are the same pointers, they shouldn't change. Moreover, not a single line inside `memallocate` was executing, the segfault was happening on the call itself. You didn't get segfault because you've cloned the repo after I pushed the commits yesterday. I'm pretty sure if you declare `Machine m` locally in `main()`, and pass `&m` – Subhranil Aug 19 '17 at 14:43
  • to the subsequent functions, you'll get it, the segfault will happen again. – Subhranil Aug 19 '17 at 14:44
  • And just the case of `out of bounds access`, valgrind reports those too. `trying to access uninitialized value` is not merely close to that. – Subhranil Aug 19 '17 at 14:45
  • Do try the `Machine m` thing if you got time, I think you'll know. And you can check the commits to find out if you indeed cloned after that specified commit. – Subhranil Aug 19 '17 at 14:47
  • I took your code from d87ee7f to test it on Windows at the moment of writing edit, but it's not important, you have an error in your code and you do not want to fix it, I don't know why, but ok. Of course it's not important if you allocate memory in heap or in stack, so `malloc` can't be a solution. I see you made a proper initialization of `Machine` structure, it's already good, but I suggest you to suppose that I'm not as dumb as you think right now, and imagine what may happen with the program if there are `NUM_MEM` instructions in your input file. – vonaka Aug 19 '17 at 15:59
  • don't take it otherwise. I know I have such errors and I know how to fix them. I also had this strange Windows only segfault error which I didn't know how to fix, hence I created this thread. If you pick up from d87ee7f, you sure will see the segfault. Here's a couple of things : a) `malloc` is a solution for this case, because it does matter if the stack is overflown so that no new addresses can be pushed in it. b) Inside the initializer, you'll see now I initialized all `NUM_MEM` cells properly now and c) I think you're a very good person who helps people with all his might. – Subhranil Aug 19 '17 at 17:15
  • @Subhranil, sorry, but I don't agree with (a) (maybe there's a reason for using malloc but not as solution of this segfault), thanks a lot for (c) and good luck with your work. In any case I like very much your intention to truly understand your problem by yourself. – vonaka Aug 19 '17 at 17:46
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/152344/discussion-between-subhranil-and-vonaka). – Subhranil Aug 19 '17 at 18:17