1

I have a big structure with items and relations between the items. I need to find all transitive relations for all items. I duplicate all links and use transitive closure. E.g.:

A --- B --- C     E --- F --- G
      |
      |
      D

As a result I need to get the pairs:

A-B, A-C, A-D, B-A, B-C, B-D, C-A, C-B, C-D, D-A, D-B, D-C,
E-F, E-G, F-E, F-G, G-E, G-F

For using transitive closure I should use pairs [A-B, B-A, B-C, C-B, B-D, D-B, E-F, F-E, F-G, G-F]. It's big problem for me because the dataset is very large. The best way to solve my problem would be an algorithm, that allows get all relations using only one-side links (A-B, B-C, C-D, E-F, F-G). Are there any algorithms to get all relations for each element of the graph without duplicate links?

Juan Carlos Ramirez
  • 2,054
  • 1
  • 7
  • 22
Egor
  • 1,334
  • 8
  • 22
  • 3
    Apparently you're interested in grouping items that are mutually reachable, in other words: you're looking for the connected components of your graph, whose vertices can be grouped using sets, which should not be too expensive for you. If you frequently need to look up which component contains A, B, ..., you should probably use a [disjoint set data structure](https://en.wikipedia.org/wiki/Disjoint-set_data_structure). – Anthony Labarre Mar 12 '19 at 21:13
  • As the previous statement implies, use each of your given relations to find the connected components. Then from each component take all combinations of 2 items taken from that component. The "combinations" means that you get only one-direction: `AB` but not `BA`. Can you do both of those tasks? If not, do a web search linked to your favorite computer language. – Rory Daulton Mar 12 '19 at 22:21

1 Answers1

2

You may model this problem as a graph problem, and traverse the entire dataset you have, using either DFS(depth-first search) or BFS(breadth-first-search). During traversal, you may assign a component number to each tree in the forest of data you are investigating, and as a result, you can find all the connected components of this graph of data you have. Then for each connected compnent, you may simply form groups of 2 using its members, and use those to describe the relation. If there are odd number of elements, you can pick an already use item and link it to the last remaining one.

This obviously assumes that your goal is to find the connected components alone, and not print the relations, as you put it, in a specific manner. For instance, if you were trying to print the links so that the maximum distance between the items would be as minimal as possible, the problem becomes much more complex.

Another approach which shares the same assumption I mentioned above would be to use the method of union-find, also known as the data structure called disjoint set. You can start with N sets which have N of your items. Then, as you traverse these relations, for each relation (x, y), you unite the sets which contain the items x and y. In the end, all the connected components will be in the same set.

The first approach has O(V + E) time complexity, V and E being the number of items and relations in your data, respectively. The second approach has O(V + E . k(V)) time complexity, where k is a function called Inverse Ackermann, that increases really slowly. (i.e. even slower than logarithmic function)

ilim
  • 4,477
  • 7
  • 27
  • 46