0

Imagine the following BIG-DATA situation:

There are 1 million persons stored in a SQL database. Each of them follows exactly 50 other persons.

So there is a table like this (with 50 million entries):

person1 | person2
0       |       1
0       |   2.341
0       | 212.881
..      |      ..
999.999 | 421.111
999.999 | 891.129
999.999 | 920.917

Is it possible to use Oracle's connect by or MySQL's WITH RECURSIVE to find out if there is a connection (maybe over intermediaries) from one person to an other?

Would those queries literally run forever? (the data are highly connected)

Or is there a way to limit the depth of the queries? (in this case: only < 3 intermediaries)


context: this example will be used to explain why a graph database can be better in some cases and I want to show whether this is even solvable with SQL.

hardfork
  • 2,470
  • 1
  • 23
  • 43
  • In programming lauguages like C++ or PHP recursive calls normally are bad because they stack and use alot of memory on a very large amount of calls.. Not sure tho how `CONNECT BY` and `WITH RECURSIVE` are implemented in the SQL engines.. I assume both statements can use indexing but you should check it with query planner statement like EXPLAIN for MySQL. – Raymond Nijland Oct 30 '18 at 11:30
  • Given that your intent is to prove the superiority of graph databases to relational databases for this example, I suggest you construct your databases and queries in such a way that you obtain the results you're looking for. Other than that, all you're going to get here is opinions. – Bob Jarvis - Слава Україні Oct 30 '18 at 11:55
  • The basic problem is solvable with SQL. But it isn't going to scale to millions of nodes with tens of millions of edges. An RDBMS like Oracle stores its data in a fundamentally different way to a graph database like say Neo4J. That's why Oracle has an extension in the [Spatial and Graph option](https://docs.oracle.com/database/121/SPATL/toc.htm), which is not only a chargeable extra to the Enterprise Edition but also requires Partitioning, another chargeable extra. So rather more expensive than Neo4J :) – APC Oct 30 '18 at 12:57
  • Is the question one of _existance_, or finding the _links_? There could be a big difference. – Rick James Oct 30 '18 at 22:54
  • Both. The task is to find find a link if it exsists. – hardfork Oct 31 '18 at 12:14

1 Answers1

1

Is it possible to use Oracle's connect by or MySQL's WITH RECURSIVE to find out if there is a connection (maybe over intermediaries) from one person to an other?

Yes. That's the purpose of those features.

Would those queries literally run forever? (the data are highly connected)

As with all SQL queries, appropriate indexes are vital for good performance.

As for "forever" Oracle detects loops in hierarchies (that is, when the data breaks the assumption that it is a directed acyclic graph.)

Recursive common table expressions (in most non-Oracle table servers) can have their recursion limited by level. See this https://dba.stackexchange.com/questions/16111/cte-running-in-infinite-loop.

Is it better to do this kind of work with a graph database? That's a matter of opinion.

  • You still need loop detection.
  • In production, moving data from one database to another, or keeping copies in multiple places, is costly. So your pragmatic design choice will be influenced by where your system stores the data.
O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • "You still need loop detection." -> If I'm right, algorithms like graph algorithms like breadth first search only attend nodes that have not been visited by the algorithm so far. So there wouldn't be a loop-problem in this case. – hardfork Dec 04 '18 at 09:00
  • *graph algorithms like breadth first search only attend nodes that have not been visited by the algorithm so far.* === loop detection. – O. Jones Dec 04 '18 at 16:57