Trying to pick the right data-structure

Question

I want to store all the actors from a production in a graph. The input is read like this:

Let N be the number of movies. For each movie we read the name of the movie; the next line holds the number of actors (call it nr), and each of the following nr lines holds the name of one actor. Within a movie, every actor is "connected" to every other actor.
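A minimal sketch of reading that format, assuming one name or count per line and lines shorter than 256 characters (a longer line would spill into the next read). The names `read_line`, `read_count`, `read_movies`, `ActorHandler` and `count_actor` are made up for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NAME 256  /* assumed upper bound on the length of one line */

/* Read one line into buf and strip the trailing newline. Returns 0 at EOF. */
static int read_line(FILE *in, char *buf, size_t cap)
{
    if (fgets(buf, (int)cap, in) == NULL)
        return 0;
    buf[strcspn(buf, "\r\n")] = '\0';
    return 1;
}

/* Read a line holding a non-negative count. Returns -1 on error. */
static long read_count(FILE *in)
{
    char buf[MAX_NAME];
    char *end;
    long value;

    if (!read_line(in, buf, sizeof buf))
        return -1;
    value = strtol(buf, &end, 10);
    if (end == buf || value < 0)
        return -1;
    return value;
}

/* Called once per actor line; a hypothetical hook for the caller. */
typedef void (*ActorHandler)(const char *movie, const char *actor, void *ctx);

/* Parse the whole input. Returns the number of movies read, or -1 on error. */
static long read_movies(FILE *in, ActorHandler on_actor, void *ctx)
{
    char movie[MAX_NAME], actor[MAX_NAME];
    long n;

    assert(in != NULL && on_actor != NULL);
    n = read_count(in);                      /* N: number of movies */
    if (n < 0)
        return -1;
    for (long m = 0; m < n; m++) {
        long nr;
        if (!read_line(in, movie, sizeof movie))
            return -1;
        nr = read_count(in);                 /* nr: actors in this movie */
        if (nr < 0)
            return -1;
        for (long a = 0; a < nr; a++) {
            if (!read_line(in, actor, sizeof actor))
                return -1;
            on_actor(movie, actor, ctx);
        }
    }
    return n;
}

/* Example handler: counts the actor lines seen; ctx points to a long. */
static void count_actor(const char *movie, const char *actor, void *ctx)
{
    (void)movie;
    (void)actor;
    ++*(long *)ctx;
}
```

Passing a handler keeps the parsing separate from whatever structure ends up storing the actors, so the same reader works with a BST, an adjacency list, or anything else.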

What I've done so far:

I built a binary search tree keyed on the actor's name. Each node of the tree stores the actor's id, the actor's name, and all the movies he or she played in.

So it would be like this:

typedef struct binaryTree
{
    int size;
    int id;
    int movieSize;
    int movieCapacity;
    char **movieName;
    char *actorName;
    struct binaryTree *left;
    struct binaryTree *right;
} *BinaryTree;

Now, using this binary search tree, I want to go through every movie and, for every actor in that movie, connect their ids together in the graph.

Here is the problem: I built the BST, but to find the cast of a movie I have to look for the movie's name in every actor's node, so every search is still O(nr) and the tree didn't really help me. What would be another way of thinking about this that makes initialising the graph easier and more efficient?

EDIT:

For a better visualisation of the data:

Movie1:      Movie2:       Movie3:    Movie4:
Actor1       Actor1        Actor2     Actor6
Actor5       Actor3        Actor4     Actor7
Actor3       Actor4        Actor5     Actor8

The actual input that will be read:

4
Movie1
3
Actor1
Actor5
Actor3
Movie2
3
Actor1
Actor3
Actor4
Movie3
3
Actor2
Actor4
Actor5
Movie4
3
Actor6
Actor7
Actor8

This is read in the order: Movie1 with Actor1, Actor5, Actor3; Movie2 with Actor1, Actor3, Actor4; etc. (As you can see, when Actor1 is read the second time no new node is added; instead a new movie is added to Actor1's list of movies.) Also, here is a photo of the binary tree based on this data:

I want my data structure to support fast lookup between the id and the name of an actor, and also fast sorting of the data. That is why I thought a BST would be appropriate, with the graph then built from the ids. The question is: once I have this binary search tree keyed on the actors' names, how do I proceed to build the graph, or is there another (better) way of storing the data and accomplishing this?

Here is a photo of the graph I want to build, where the number on each node is the number inside the actor's name (for simplicity):

You haven't really stated the problem. To design a data structure, you need to know both the data to be stored and _the operations on the data_ that must be supported. You've said the former, but not the latter. If the goal is to just build a graph where there's an edge for every pair of actors for every movie they've both appeared in, then you don't need a BST. Moreover, this will be an undirected multigraph: each pair of nodes can have any number of edges. What are you really trying to accomplish? — Gene, Apr 27 '19 at 20:14
Not trying to be pedantic, but you're still not asking the right questions. There are many alternative ways to represent a graph. Which one to pick is all about the operations you plan to perform on it. Building the graph once you've decided is then trivial: iterate through all pairs of actors in the same movie and add an edge for each. Iterating through pairs is just 2 nested loops. So I ask again, what are you trying to accomplish (beyond "build a graph")? — Gene, May 01 '19 at 15:14
@Gene Perform operations on it, bfs, dfs, find the cut vertices and then sorting the output alphabetically, and finding the maximum clique — J. Homer, May 01 '19 at 15:21
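
The pair iteration described in the comments above can be sketched as two nested loops. This is only an illustration under a few assumptions: actors are already mapped to ids 0..n-1, the ids within one cast are distinct, and the graph is kept as an n*n matrix of edge counts (one possible representation of the undirected multigraph). `connect_cast` is a made-up name:

```c
#include <assert.h>
#include <stddef.h>

/* adj is an n*n row-major matrix; adj[u*n + v] counts the movies that
   actors u and v share, so parallel edges show up as counts above 1. */
static void connect_cast(unsigned *adj, size_t n,
                         const size_t *cast, size_t cast_len)
{
    for (size_t i = 0; i < cast_len; i++) {
        for (size_t j = i + 1; j < cast_len; j++) {
            size_t u = cast[i], v = cast[j];
            assert(u < n && v < n);
            adj[u * n + v]++;   /* undirected: record both directions */
            adj[v * n + u]++;
        }
    }
}
```

A cast of k actors produces k(k-1)/2 pairs, so the work per movie is quadratic in the cast size no matter how the names were looked up beforehand.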

score 0 · Answer 1 · answered Apr 27 '19 at 22:15

You might want to represent the graph of actors with an adjacency list (see https://www.geeksforgeeks.org/graph-and-its-representations/).

For each Actor A, you will create an array of Actors which A is connected with.

struct Actor {
  int   id;
  char  *name;
};

struct AdjacencyRow {
  struct Actor  *actor;
  unsigned      connected_actors_count;
  struct Actor  **connected_actors;
};

struct AdjacencyList {
  unsigned             row_count;
  struct AdjacencyRow  *rows;
};

During the input parsing, whenever you encounter a new Actor, you allocate a new struct Actor for him and allocate a new struct AdjacencyRow in the struct AdjacencyList for him.

Then in this new Actor's row, you add pointers to the other Actors playing in the same Movie (you'll need to remember their ids for currently processed Movie), and you add pointers to this Actor to all the other Actors' rows.

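When you encounter an already known Actor, you'll just add the pointers (watch for duplicates though).

This gives you O(n^2) complexity per movie (for n = number of actors in that movie), as k operations must be performed for the k-th Actor being added and 1 + 2 + ... + n = n(n+1)/2. Looking up already known Actors and checking for duplicates comes on top of that.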
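
A minimal sketch of that step, reusing the struct layout from above. The helpers `row_add` and `connect_actors` are hypothetical names, and the duplicate check is a plain linear scan of the row:

```c
#include <assert.h>
#include <stdlib.h>

struct Actor {
    int   id;
    char *name;
};

struct AdjacencyRow {
    struct Actor  *actor;
    unsigned       connected_actors_count;
    struct Actor **connected_actors;
};

/* Append `other` to `row` unless it is already there or is the row's own
   actor. Returns 1 if added, 0 if skipped, -1 if allocation fails. */
static int row_add(struct AdjacencyRow *row, struct Actor *other)
{
    assert(row != NULL && other != NULL);
    if (other == row->actor)
        return 0;
    for (unsigned i = 0; i < row->connected_actors_count; i++)
        if (row->connected_actors[i] == other)
            return 0;   /* duplicate: the pair is already connected */

    /* Grow by one slot; a capacity field would avoid a realloc per edge. */
    struct Actor **grown = realloc(row->connected_actors,
        (row->connected_actors_count + 1) * sizeof *grown);
    if (grown == NULL)
        return -1;
    grown[row->connected_actors_count++] = other;
    row->connected_actors = grown;
    return 1;
}

/* Connect two actors in both directions. Returns 0 on success, -1 on failure. */
static int connect_actors(struct AdjacencyRow *a, struct AdjacencyRow *b)
{
    if (row_add(a, b->actor) < 0 || row_add(b, a->actor) < 0)
        return -1;
    return 0;
}
```

Skipping duplicates turns the multigraph into a simple graph. If the number of shared movies matters, a per-edge counter would be needed instead.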

This way you end up with a structure which lets you find directly connected actors in O(e) time, actors who are both connected to the same actor in O(e^2) time etc. (where e=average number of connections per actor).

The above example can be simplified by getting rid of id completely and just building an array of names and array of arrays of int (as AdjacencyList).
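
A rough sketch of that simplified layout, where the array index doubles as the actor's id. `struct Graph`, `graph_actor_id` and `graph_free` are made-up names, and the name lookup is a linear scan purely to keep the example short; a BST or hash table keyed by name could replace it without changing the rest:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Index i is the actor's id: names[i] is the name, adj[i] the neighbour ids. */
struct Graph {
    size_t   count;     /* number of actors */
    char   **names;     /* names[i]: owned copy of actor i's name */
    int    **adj;       /* adj[i]: ids of the actors connected to i */
    size_t  *adj_len;   /* adj_len[i]: number of entries in adj[i] */
};

/* Return the id for `name`, adding a new actor if it is not known yet.
   Returns -1 if an allocation fails. */
static int graph_actor_id(struct Graph *g, const char *name)
{
    assert(g != NULL && name != NULL);
    for (size_t i = 0; i < g->count; i++)
        if (strcmp(g->names[i], name) == 0)
            return (int)i;              /* already known actor */

    size_t n = g->count + 1;
    char **names = realloc(g->names, n * sizeof *names);
    if (names == NULL) return -1;
    g->names = names;
    int **adj = realloc(g->adj, n * sizeof *adj);
    if (adj == NULL) return -1;
    g->adj = adj;
    size_t *len = realloc(g->adj_len, n * sizeof *len);
    if (len == NULL) return -1;
    g->adj_len = len;

    char *copy = malloc(strlen(name) + 1);
    if (copy == NULL) return -1;
    strcpy(copy, name);

    g->names[g->count] = copy;
    g->adj[g->count] = NULL;            /* no neighbours yet */
    g->adj_len[g->count] = 0;
    return (int)g->count++;
}

/* Release everything owned by the graph and reset it to empty. */
static void graph_free(struct Graph *g)
{
    for (size_t i = 0; i < g->count; i++) {
        free(g->names[i]);
        free(g->adj[i]);
    }
    free(g->names);
    free(g->adj);
    free(g->adj_len);
    g->count = 0;
    g->names = NULL;
    g->adj = NULL;
    g->adj_len = NULL;
}
```

With this layout, going from id to name is a single array access, and sorting the output alphabetically can be done on a separate array of ids so the ids themselves never change.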

Hello! I know what an adjaceny list is, also why is this better than the BST idea? — J. Homer, Apr 28 '19 at 05:55
Also "whenever you encounter a new actor", thus I have to search every time in my adjacenylist if the Actor was already allocated — J. Homer, Apr 28 '19 at 07:46
