1

I am using adjacency list representation.

basically

     A:[B,C,D] means A is connected to B,C and D

Now I am trying to add a method (in Python) that adds an edge to the graph.

But before I add an edge, I want to check whether the two nodes are already connected. For example, say I want to add an edge between nodes D and A (not knowing that A and D are already connected).

Since there is no key "D" in the hash/dictionary, the lookup returns false.

Naively, I can check D and A and then A and D as well, but that is clumsy. Or, whenever I connect two nodes, I can always store the edge twice.

I.e. when connecting A and E, store A:[E] and also create E:[A].

But this is not very space efficient.

Basically, I want to make this graph direction-independent.

Is there a data structure that can help me solve this?

I hope my question makes sense.

frazman
  • 1
    What's wrong with returning `A.contains(D) or D.contains(A)` (with null safety, of course)? Is it too slow? Please elaborate on what the problem is with the straightforward approach. – amit Oct 09 '12 at 17:59

4 Answers

3

For an undirected graph you could use a simple edge list in which each edge is stored once, as a pair of nodes. This saves space but makes lookups slower, since checking for an edge means scanning the list. You can't have both at the same time, so you always have to decide on a tradeoff.
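A minimal sketch of the edge-list idea, assuming hashable node names; the class name `EdgeListGraph` and its methods are made up for illustration. Each undirected edge is stored once, and the lookup accepts either orientation at the cost of a linear scan.

```python
class EdgeListGraph:
    """Undirected graph stored as a flat list of (u, v) pairs."""

    def __init__(self):
        self.edges = []  # each undirected edge appears exactly once

    def has_edge(self, u, v):
        # Linear scan; (u, v) and (v, u) count as the same edge.
        return any((a, b) in ((u, v), (v, u)) for a, b in self.edges)

    def add_edge(self, u, v):
        """Add the edge unless it is already present; return True if added."""
        if self.has_edge(u, v):
            return False
        self.edges.append((u, v))
        return True


g = EdgeListGraph()
g.add_edge("A", "D")
g.add_edge("D", "A")  # rejected: same undirected edge
```

The space cost is one pair per edge, but every check is O(E) in the number of edges.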

Otherwise you could use a triangular adjacency matrix. To avoid wasting half of the space you have to store only one triangle, which means working out an index mapping that answers edge-existence queries on the packed storage. Are you sure it is worth it and not just premature optimization?
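One possible way to pack the triangle, sketched under the assumptions that nodes are numbered 0 to n-1, indices are valid, and self-loops are not stored. The class name `TriangularMatrixGraph` is hypothetical. The pair (i, j) with i > j maps to position i*(i-1)/2 + j in a flat array of n*(n-1)/2 cells.

```python
class TriangularMatrixGraph:
    """Undirected graph on nodes 0..n-1, storing only the lower triangle."""

    def __init__(self, n):
        self.n = n
        # One byte per unordered pair of distinct nodes.
        self.cells = bytearray(n * (n - 1) // 2)

    def _index(self, i, j):
        if i == j:
            raise ValueError("self-loops are not stored in this sketch")
        if i < j:
            i, j = j, i  # normalise so that i > j
        return i * (i - 1) // 2 + j

    def has_edge(self, i, j):
        return self.cells[self._index(i, j)] == 1

    def add_edge(self, i, j):
        """Set the edge; return True if it was not already present."""
        idx = self._index(i, j)
        if self.cells[idx]:
            return False
        self.cells[idx] = 1
        return True
```

Lookups are O(1) in either direction, but the storage is quadratic in the number of nodes regardless of how many edges exist.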

Adjacency lists are mostly fine, even if you have to store every undirected edge twice. How big is your graph?
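As a rough sketch of the store-it-twice approach, assuming hashable node names and using sets instead of lists so membership checks are constant time on average. The class name `UndirectedGraph` is made up for this example.

```python
from collections import defaultdict


class UndirectedGraph:
    """Adjacency sets in which every edge is recorded under both endpoints."""

    def __init__(self):
        self.adj = defaultdict(set)

    def has_edge(self, u, v):
        # .get avoids creating an empty entry for an unknown node.
        return v in self.adj.get(u, ())

    def add_edge(self, u, v):
        """Add the edge in both directions; return True if it was new."""
        if self.has_edge(u, v):
            return False
        self.adj[u].add(v)
        self.adj[v].add(u)
        return True


g = UndirectedGraph()
g.add_edge("A", "D")
g.has_edge("D", "A")  # found with a single lookup, in either direction
```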

Take a look at this answer of mine: Graph representation benchmarking, so you can choose whichever you prefer.

Jack
1

You have run into a classic space vs. time tradeoff.

As you said, if you don't find D->A, you can search for A->D. This at most doubles your lookup time. Alternatively, when inserting A->D, also create D->A, but this comes at the cost of additional space.

In the worst case, the time tradeoff costs you 2 lookups, which is still O(N) (faster with better data structures). For the space tradeoff, in the worst case there is a link between every pair of nodes, which is roughly O(N^2) entries. As such, I would just do 2 lookups.
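The two-lookup check could look roughly like this, assuming the dictionary-of-lists layout from the question. The helper names `connected` and `add_edge` are illustrative, not from the original post.

```python
def connected(graph, u, v):
    """Check both directions; .get handles nodes that have no key yet."""
    return v in graph.get(u, ()) or u in graph.get(v, ())


def add_edge(graph, u, v):
    """Store the edge once, under u, unless it exists in either direction."""
    if connected(graph, u, v):
        return False
    graph.setdefault(u, []).append(v)
    return True


graph = {"A": ["B", "C", "D"]}
add_edge(graph, "D", "A")  # rejected: A already lists D
add_edge(graph, "A", "E")  # stored only as A -> E
```

Each edge is stored once, so nothing is duplicated, and a missing key such as "D" no longer produces a false negative.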

samoz
1

Assuming each contains() call is VERY expensive and you want to avoid these calls at all costs, you can use a Bloom filter to check whether an edge exists, and thereby reduce the number of contains() calls.

The idea is: each node holds its own Bloom filter, which indicates which edges are connected to it. Checking a Bloom filter is fairly easy and cheap, and so is updating it when an edge is added.

If you check the Bloom filter and it says "no", you can safely add the edge, because it does not exist.
However, Bloom filters have false positives, so if the filter says "the edge exists", you still have to check the list to see whether it is really there.

Notes:
(1) Removing edges will be a problem if using Bloom filters.
(2) Bloom filters give you a nice time/space tradeoff, as the number of false positives decreases as the size of the filter grows.
(3) However, when an edge does exist, no matter what the size of the filter is, you will always have to use the contains() method.
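A minimal sketch of the per-node filter described above. The classes `BloomFilter` and `Node`, the filter size (256 bits), and the choice of three SHA-256-based hash positions are all assumptions made for illustration, not tuned values.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


class Node:
    def __init__(self, name):
        self.name = name
        self.neighbors = []  # stands in for the expensive contains() target
        self.filter = BloomFilter()

    def has_neighbor(self, other):
        if not self.filter.might_contain(other):
            return False  # definite "no": the list scan is skipped
        return other in self.neighbors  # "maybe": confirm against the list

    def add_neighbor(self, other):
        """Add other unless already present; return True if added."""
        if self.has_neighbor(other):
            return False
        self.neighbors.append(other)
        self.filter.add(other)
        return True
```

As note (1) says, this sketch offers no way to remove an edge, because clearing bits could erase evidence of other neighbors.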

amit
0

Assuming your node names can be compared, you can simply always store each edge so that the first endpoint is less than the second. Then you only have one lookup to perform. This definitely works for strings.
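A short sketch of that ordering trick, assuming node names are mutually comparable (strings here) and that the graph is a plain dictionary mapping the smaller endpoint to a set of larger ones. The function names are hypothetical.

```python
def canonical(u, v):
    """Return the endpoints ordered so that the first is <= the second."""
    return (u, v) if u <= v else (v, u)


def has_edge(graph, u, v):
    lo, hi = canonical(u, v)
    return hi in graph.get(lo, ())  # a single lookup, whatever the order


def add_edge(graph, u, v):
    """Store the edge once under its smaller endpoint; True if it was new."""
    lo, hi = canonical(u, v)
    neighbors = graph.setdefault(lo, set())
    if hi in neighbors:
        return False
    neighbors.add(hi)
    return True


graph = {}
add_edge(graph, "D", "A")  # stored as A -> {D}
has_edge(graph, "A", "D")
```

The price of storing each edge once is that listing all neighbors of a node requires scanning the other keys too, since a node's larger and smaller neighbors live in different places.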

rici