I have a function shortestPath() that is a modified implementation of Dijkstra's algorithm for use with a board game AI I am working on for my comp2 class. I have trawled through the website and using gdb and valgrind I know exactly where the segfault happens (actually knew that a few hours ago), but can't figure out what undefined behaviour or logic error is causing the problem.

The function in which the problem occurs is called about 10 times and works as expected until it segfaults. GDB reports "error reading variable: cannot access memory" and valgrind reports "Invalid read of size 8".

Normally that would be enough, but I can't work this one out. Any general advice and tips are also appreciated. Thanks!

GDB: https://gist.github.com/mckayryan/b8d1e9cdcc58dd1627ea
Valgrind: https://gist.github.com/mckayryan/8495963f6e62a51a734f

Here is the function in which the segfault occurs:

static void processBuffer (GameView currentView, Link pQ, int *pQLen, 
                           LocationID *buffer, int bufferLen, Link prev,
                           LocationID cur)
{
    //printLinkIndex("prev", prev, NUM_MAP_LOCATIONS);
    // adds newly retrieved buffer Locations to queue adding link types 
    appendLocationsToQueue(currentView, pQ, pQLen, buffer, bufferLen, cur);
    // calculates distance of new locations and updates prev when needed
    updatePrev(currentView, pQ, pQLen, prev, cur);  // <--- segfault occurs on this line

    qsort((void *) pQ, *pQLen, sizeof(link), (compfn)cmpDist);
    // qsort sanity check
    int i, qsortErr = 0;
    for (i = 0; i < *pQLen-1; i++) 
        if (pQ[i].dist > pQ[i+1].dist) qsortErr = 1;
    if (qsortErr) {
        fprintf(stderr, "loadToPQ: qsort did not sort successfully\n");
        abort();
    }  
}

And here is the function after which everything falls apart:

static void appendLocationsToQueue (GameView currentView, Link pQ, 
                                   int *pQLen, LocationID *buffer, 
                                   int bufferLen, LocationID cur)
{
    int i, c, conns;
    TransportID type[MAX_TRANSPORT] = { NONE };     

    for (i = 0; i < bufferLen; i++) { 
        // get connection information (up to 3 possible)  
        conns = connections(currentView->gameMap, cur, buffer[i], type);
        for (c = 0; c < conns; c++) {
            pQ[*pQLen].loc = buffer[i];
            pQ[(*pQLen)++].type = type[c];            
        }            
    }
}

I thought that a pointer had been overwritten with the wrong address, but after a lot of printing in GDB that doesn't seem to be the case. I also tried reading and writing each of the variables in question in turn to see which ones trigger the fault: they all do after the call to appendLocationsToQueue(), but not before it (or at the end of that function, for that matter).

Here is the rest of the relevant code, shortestPath():

Link shortestPath (GameView currentView, LocationID from, LocationID to, PlayerID player, int road, int rail, int boat)
{
    if (!RAIL_MOVE) rail = 0;

    // index of locations that have been visited    
    int visited[NUM_MAP_LOCATIONS] = { 0 };

    // current shortest distance from the source
    // the previous node for current known shortest path
    Link prev;
    if(!(prev = malloc(NUM_MAP_LOCATIONS*sizeof(link))))
        fprintf(stderr, "GameView.c: shortestPath: malloc failure (prev)");

    int i;
    // initialise link data structure
    for (i = 0; i < NUM_MAP_LOCATIONS; i++) {
        prev[i].loc = NOWHERE;
        prev[i].type = NONE;
        if (i != from) prev[i].dist = INF; 
        else prev[i].dist = LAST; 
    }
    LocationID *buffer, cur;
    // a priority queue that dictates the order LocationID's are checked
    Link pQ;
    int bufferLen, pQLen = 0;
    if (!(pQ = malloc(MAX_QUEUE*sizeof(link))))
        fprintf(stderr, "GameView.c: shortestPath: malloc failure (pQ)");
    // load initial location into queue
    pQ[pQLen++].loc = from;

    while (!visited[to]) {
        // remove first item from queue into cur  
        shift(pQ, &pQLen, &cur);
        if (visited[cur]) continue;
        // freeing malloc from connectedLocations()
        if (cur != from) free(buffer); 
        // find all locations connected to cur
        buffer = connectedLocations(currentView, &bufferLen, cur, 
                                    player, currentView->roundNum, road, 
                                    rail, boat); 
        // mark current node as visited
        visited[cur] = VISITED;
        // locations from buffer are used to update priority queue (pQ) 
        // and distance information in prev       
        processBuffer(currentView, pQ, &pQLen, buffer, bufferLen, prev,
                      cur);
    }
    free(buffer);
    free(pQ);
    return prev;
}
Ryan Mckay

1 Answer

The fact that all your parameters look good before this line:

appendLocationsToQueue(currentView, pQ, pQLen, buffer, bufferLen, cur);

and become unavailable after it tells me that you've stepped on (written 0x7fff00000000 to) the $rbp register. When building without optimization, all local variables and parameters are addressed relative to $rbp.

You can confirm this in GDB with print $rbp before and after the call to appendLocationsToQueue. $rbp is supposed to keep the same value throughout a given function, but here it will have changed.

Assuming this is true, there are only a few ways this could happen, and the most likely way is a stack buffer overflow in appendLocationsToQueue (or something it calls).
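
To make that failure mode concrete, here is a minimal sketch, not the asker's actual code. fillTypes is a hypothetical stand-in for a callee such as connections() that writes into a caller-supplied array, and the value 3 for MAX_TRANSPORT is an assumption taken from the "up to 3 possible" comment in the question. Passing the capacity along and clamping the writes is one way to keep such a callee from running past a small stack array like type[MAX_TRANSPORT]:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TRANSPORT 3   /* assumed value; the question only says "up to 3 possible" */

/* Hypothetical stand-in for a callee that reports connection types.
 * It writes at most `cap` entries into `out` and returns how many it
 * wrote, so it cannot run past the caller's array even when `found`
 * is larger than the caller expected. */
int fillTypes(int *out, size_t cap, int found)
{
    assert(out != NULL);
    int written = 0;
    for (int t = 0; t < found && (size_t)written < cap; t++) {
        out[written++] = t + 1;   /* placeholder transport id */
    }
    return written;
}

/* Caller with a small stack array, mirroring type[MAX_TRANSPORT]. */
int countTypes(int found)
{
    int type[MAX_TRANSPORT] = { 0 };
    return fillTypes(type, MAX_TRANSPORT, found);
}
```

With this shape, countTypes(10) stores only MAX_TRANSPORT entries instead of writing over whatever sits next to the array on the stack, which in an unoptimized build can include the saved $rbp.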

You should be able to use Address Sanitizer (gcc -fsanitize=address ...) to find this bug fairly easily.
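
As a rough illustration of what Address Sanitizer is looking for, the sketch below uses a hypothetical function, fillSlots, that is safe for small inputs and deliberately overflows a stack array for larger ones. It is not taken from the question's code; it only shows the class of bug involved.

```c
#include <assert.h>

#define SLOTS 3

/* Writes n values into a 3-element stack array and returns their sum.
 * Safe for n <= SLOTS. For n > SLOTS the loop DELIBERATELY writes past
 * the end of `slots`; this is the kind of stack overflow discussed
 * above and is undefined behaviour.
 *
 * To try it, call fillSlots(4) from main and build with, for example:
 *     gcc -g -fsanitize=address demo.c
 * (demo.c is a placeholder file name). */
int fillSlots(int n)
{
    assert(n >= 0);
    int slots[SLOTS] = { 0 };
    int sum = 0;
    for (int i = 0; i < n; i++) {
        slots[i] = i + 1;       /* out of bounds once i >= SLOTS */
        sum += slots[i];
    }
    return sum;
}
```

Calling fillSlots(3) is well defined. Calling fillSlots(4) in a build with -fsanitize=address should stop at the offending write with a stack-buffer-overflow report, rather than failing later at an unrelated line.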

It's also fairly easy to find the overflow in GDB: step into appendLocationsToQueue, run watch -l *(char**)$rbp, then continue. The watchpoint should fire when your code overwrites the location where $rbp is saved.

Employed Russian
  • That worked a treat. I was operating under the assumption that valgrind would alert me to the type of overflow that was causing the problem and therefore attributed it to something unseen. – Ryan Mckay Oct 15 '15 at 00:59
  • @RyanMckay Valgrind performs very little checking of stack and globals. Address Sanitizer is *much* better for that. – Employed Russian Oct 15 '15 at 01:04