-1

I repeat this function 2500 times in a loop for different parameters and it takes 85 seconds. What is wrong with this function? How can I improve running time? Thanks for your help.

Function:

int findBaconNumber(v * actors[], int actorCount, int index, v * visited[]){

    // Bacon number is 0:
    if(strcmp(actors[index]->name, "Bacon,Kevin") == 0)
        return 0;

    // Bacon number is infinite:
    else if(actors[index]->next == NULL)
        return -1;

    // Otherwise, calculate:
    memset(visited, NULL, sizeof(visited));
    q * queue = createQueue();
    v * tmp = actors[index];
    v * found;
    n * tmp2;
    int baconNumber = 0;
    int visitCount = 0;
    int empty = 0;
    int full = 0;

    // Add first item to queue:
    enqueue(queue, tmp);
    visited[visitCount] = tmp;
    visitCount++;
    tmp->parent = NULL;


    // Until queue is empty, queue is full or kevin bacon found:
    while(strcmp(tmp->name, "Bacon, Kevin") != 0 && isEmpty(queue) != 1 && isFull(queue) != 1){
        // Get neighbors:
        if(tmp->next != NULL)
            tmp2 = tmp->next;
        else
            tmp2 = NULL;

        // Add neighbors to queue:
        while(tmp2 != NULL){
            if(contains(visited,visitCount+1,actors[tmp2->actorNo]) != 1){
                enqueue(queue, actors[tmp2->actorNo]);
                visited[visitCount] = actors[tmp2->actorNo];
                visitCount++;
                actors[tmp2->actorNo]->parent = tmp;
            }
            tmp2 = tmp2->next;
        }

        // Dequeue current item and skip to next item:
        dequeue(queue);

        // Get next item in queue:
        if(getFront(queue) != NULL)
            tmp = getFront(queue);

    }

    if(strcmp(tmp->name, "Bacon, Kevin") != 0)
        return -1;

    while(tmp->parent != NULL && tmp != NULL){
        baconNumber++;
        tmp = tmp->parent;
    }
    return baconNumber;
}

For Loop:

for(i=0; i<actorCount; i++){
    baconNumbers[i] = findBaconNumber(actors, actorCount, i, visited);
}

EDIT: Thanks all of you for answers. This is my school project and it must be solved by breadth first search algorithm so I can not use other solutions. I profiled the code it looks like the problem is with the contains function. I use it to check if node is visited or not. Now, I am working on finding another solution for this.

img1

img2

  • 2
    Use a profiler to check where it's spending most time. It could be in the queue. – klutt May 12 '20 at 17:01
  • You have a `for` loop calling a function that has a `while` loop and, inside that, there are two other `while` loops, first of which is more complex than the second. These are probably the issue. Running a profiler is the way to go. Intuitively, you'll probably need to start with the first inmost `while` loop and examine it for optimization (avoid doing things more than once where possible, etc). – lurker May 12 '20 at 17:05
  • While you're at it. Break out all loops into functions. That will make the profiler give you much more useful information. – klutt May 12 '20 at 17:09
  • I don't know much about profiling. Which profiler do you suggest? Is Visual Studio good? – Barış Yerlikaya May 12 '20 at 17:11
  • Any will do. This is not very complicated code. – klutt May 12 '20 at 17:12
  • Related, if not a dup: https://stackoverflow.com/q/675087/6699433 – klutt May 12 '20 at 17:15
  • 1
    You probably should sanity-check that you visit each actor only once. I don't understand `visited`. Your use seems backward to me. Shouldn't it index by `actorNo` and store `visitCount` as a value? Otherwise how do prevent visiting actors you've already visited? – jamesdlin May 12 '20 at 17:32
  • For every actor, you try to find the distance to Kevin Bacon. Do it the other way round and find all Bacon numbers in one go: Identify Kevin Bacon. Initialize the Bacon number of each actor to a large number. Now start a DFS from Kevin Bacon and keep track of the distance. If you visit a node with a higher number than the current one, replace it and keep recursing. If not, don't visit any more neighbours (but keep recusring to actors still in the queue.). Eventually, you'll have all Bacon numbers. – M Oehm May 12 '20 at 17:58
  • (Also, avoid string comparison. Identify Kevin Bacon once, then just test `tmp == bacon`. This also removes the typo in your code: Does Kevin Bacon's name have a space after the comma or not?) – M Oehm May 12 '20 at 18:00
  • I think you should post all/more of your code. If `actors` is a list of actors, `actorNo` should just be the index into `actors`. `visited` should be a [byte] vector of size `actorCount` and `contains` should just be `visited[actorIndex] != 0` You might just need a `visited` field in the `v` struct instead. And you might be able to just have embedded `prev/next` pointers within `v` itself, rather than a separate `queue`. – Craig Estey May 12 '20 at 18:25
  • Thanks a lot for your answers. I profiled it and added pictures above. – Barış Yerlikaya May 12 '20 at 19:09

2 Answers2

1

Some things that gets to messy to write in comments:

1) Use a profiler to find out what parts of the code that is actually taking time. It can be wise to break out all loops into functions to get more useful information from the profiler.

2) Use the information from the profiler to optimize the code

You're not showing the implementation of queue. That could be the bottleneck. If it is, there are basically two ways to attack it. Either try to optimize the queue code, or change the overall algorithm to use the queue less or in a smarter way.

Solving the Bacon number is basically a shortest path problem, so you might want to tinker with different algorithms. Dijkstras and A* are common shortest path algorithms.

Another thing you might look into is preprocessing the data. Right now, you have a struct array, and the structs have a field called name. Maybe you could change this to an int where 1 indicates that the name is "Bacon,Kevin" and 0 that it's not. That would save you potentially costly calls to strcmp and might also make it more cache friendly.

You might want to change the representation completely. Right now, I get the impression that you have something like this:

struct actor {
    struct actor *parent, *next;
    char *name;
}

I already mentioned that you could change name to a simple integer, but perhaps you could also change everything to:

int *names;
int *parents;
int *next;

That could make the code a lot more cache friendly. Or not. But it is worth trying. Linked lists are usually not very cache friendly. And this can be good to think about when it comes to the queue too.

Also, you might have a look at this: Calculating "Kevin Bacon" Numbers

klutt
  • 30,332
  • 17
  • 55
  • 95
0

I solved this problem with your helpful advices.

The way I used before (Too slow => takes 85 seconds for my input): -When a node is visited, append this node to the end of visited[] array. -Checking if node is visited or not by looping over all elements of visited[] array (via contains() function)

Faster way which I am using now (1.75 seconds for the same input): -When a node is visited, assign this node to related index => (visited[actorNumber] = node;) -Checking if node is visited or not without looping over all elements, only checking the related index.

I also removed strcmp and I am checking by index now.

Updated code is here:

int findBaconNumber(v * actors[], int actorCount, int index, int baconsIndex){

// Bacon number is 0:
if(actors[index]->index == baconsIndex)
    return 0;

// Bacon number is infinite:
else if(actors[index]->next == NULL)
    return -1;

// Otherwise, calculate:
q * queue = createQueue();
v * visited[ACTORS_SIZE] = {};
v * tmp = actors[index];
v * found;
n * tmp2;
int baconNumber = 0;
int empty = 0;
int full = 0;

// Add first item to queue:
enqueue(queue, tmp);
visited[tmp->index] = tmp;
tmp->parent = NULL;

// Until queue is empty, queue is full or kevin bacon found:
while(tmp->index != baconsIndex && isEmpty(queue) != 1 && isFull(queue) != 1){

    // Get neighbors:
    if(tmp->next != NULL)
        tmp2 = tmp->next;
    else
        tmp2 = NULL;

    // Add neighbors to queue:
    while(tmp2 != NULL){
        // If not visited:
        if(visited[tmp2->actorNo] == NULL){
            enqueue(queue, actors[tmp2->actorNo]);
            visited[tmp2->actorNo] = actors[tmp2->actorNo];
            actors[tmp2->actorNo]->parent = tmp;
        }
        tmp2 = tmp2->next;
    }

    // Dequeue current item and skip to next item:
    dequeue(queue);

    // Get next item in queue:
    if(getFront(queue) != NULL)
        tmp = getFront(queue);
}

if(tmp->index != baconsIndex)
    return -1;

while(tmp->parent != NULL && tmp != NULL){
    baconNumber++;
    tmp = tmp->parent;
}   
return baconNumber;

}