I was reading a book where I came across this line:

"The SPARQL FROM clause provide another way to define custom union graphs. The FROM clause is used to identify the default graph for a query. The most typical use is to identify a single RDF graph. However if multiple FROM clauses are specified in a query then the contents of those graphs are merged (typically in-memory) to provide a union graph that will form the default graph for the query. This feature of SPARQL can therefore provide another way to assemble a useful graph-agnostic view of a dataset."

Here it says "those graphs are merged (typically in-memory) to provide a union graph".

I am new to Apache Jena, so this got me thinking: do such large graph unions really happen in memory?

I use TDB to store my graphs and query them with SPARQL. Suppose I want to query the union of two particular graphs given in multiple FROM clauses, or the union of all named graphs:

Will these unions happen in memory, in my Java code where I use ARQ to query TDB?

Wouldn't this frequently cause an OutOfMemoryError, since there can be many graphs?

This might seem like a rookie question; pardon my inexperience with Jena.

Stanislav Kralin
Siddharth Trikha
  • I can't speak for Apache Jena specifically, but generally speaking that is just not true. I'm not immediately aware of any SPARQL engine or database system that computes the union of multiple FROM clauses in memory (unless you count an actual in-memory database, of course). There may be some instances of this that I'm not aware of, but it's quite definitely not the "typical" case. – Jeen Broekstra Apr 15 '20 at 13:05
  • It is not in-memory in Apache Jena. Each access to the union of graphs is made to look like it is one graph (no duplicates). In the worst case, this may take some memory - but it is only proportional to the triples accessed, not the whole graph. – AndyS Apr 15 '20 at 22:02

1 Answer

I can of course only guess at the authors' intent here, but it's possible they only meant that multiple FROM clauses can be processed by retrieving data from each named graph and then, as part of query processing, producing the union of those as the query result. Note that this doesn't imply that the entire named graphs are kept in memory, merely that as the query executes and iterates over individual results (in memory), it combines results from both sources into a "unionized" result.

In any case: it's highly unlikely that any serious SPARQL database (including Jena) processes queries with multiple FROM clauses by loading the entire dataset into memory first.
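
To make the "combine while iterating" idea concrete, here is a minimal sketch in plain Java. It is not how Jena or any other engine is actually implemented; `UnionOnAccess`, `Triple` and `union` are hypothetical names made up for illustration, and in-memory lists stand in for the stored graphs. It only shows that a union without duplicates can be produced lazily, remembering just the triples that have actually been read:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

public class UnionOnAccess {

    /** Hypothetical stand-in for an RDF triple; this is not a Jena class. */
    record Triple(String s, String p, String o) {}

    /**
     * Lazily streams the union of several graphs and suppresses duplicates.
     * No merged graph is built up front: the "seen" set only grows with the
     * triples the consumer actually pulls from the stream.
     */
    @SafeVarargs
    static Stream<Triple> union(List<Triple>... graphs) {
        Set<Triple> seen = new HashSet<>();
        return Stream.of(graphs)
                .flatMap(List::stream)
                .filter(seen::add); // Set.add returns false for a triple already seen
    }

    public static void main(String[] args) {
        Triple shared = new Triple(":s", ":p", ":o");
        List<Triple> g1 = List.of(shared, new Triple(":a", ":p", ":b"));
        List<Triple> g2 = List.of(shared, new Triple(":c", ":p", ":d"));

        // Prints three distinct triples; the shared one appears only once.
        union(g1, g2).forEach(System.out::println);
    }
}
```

A consumer that stops early, for example with `limit(1)`, only causes one triple to be remembered, which is the sense in which memory use depends on the triples accessed rather than on the size of the graphs.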

Jeen Broekstra
  • To quote it again: _"graphs are merged (typically in-memory) to provide a __union graph that will form the default graph for the query.__"_ So the union graph forms the default graph _for_ the query. Read this way, it suggests the author is not referring to individual query results. However, in general it doesn't make sense to bring the named graphs into memory. – Siddharth Trikha Apr 15 '20 at 17:21
  • If the graphs are read in from remote URLs, then the graph will likely be in-memory - there is no local storage database. When there is a union of graphs from a local database, there really is no need to materialize a merged graph. All that matters is what access looks like, which means suppressing duplicates. – AndyS Apr 15 '20 at 22:08
  • @AndyS: Are you saying that for local storage the graphs will not be in-memory, and for remote storage they will be? For [example](https://github.com/apache/jena/blob/master/jena-rdfconnection/src/main/java/org/apache/jena/rdfconnection/examples/RDFConnectionExample4.java), if I am connecting to a Fuseki server and I use ARQ to execute my query, this will of course run on the Fuseki server, with hardly any memory consumption in my application? – Siddharth Trikha Apr 16 '20 at 09:52
  • Yes. The TDB query engine will perform union on access, not on the graphs themselves. – AndyS Apr 16 '20 at 09:55
  • @AndyS: Sorry, I didn't get it. Performed on access? Will the union happen on the Fuseki server? – Siddharth Trikha Apr 16 '20 at 10:09
  • Suppose there are two graphs, each of which has the triple `:s :p :o`. When accessing the union of the two graphs, it must appear as if there is one triple, because an RDF graph is a *set* of triples. That can be done by precomputing the union, or by filtering triples when the app/SPARQL reads the graph. Either way, the illusion of a set of triples is maintained. Filtering when reading only needs to process the triples actually read. – AndyS Apr 16 '20 at 13:30
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/211810/discussion-between-siddharth-trikha-and-andys). – Siddharth Trikha Apr 16 '20 at 14:15