I am trying to build a subgraph of Freebase around a given topic entity, because querying the full Freebase dataset takes too long.

My first attempt at building a 3-hop subgraph was as follows:

    PREFIX ns:<http://rdf.freebase.com/ns/>
    SELECT ?r0 ?e1 ?r1 ?e2 ?r2 ?e3 WHERE
    {
        ns:m.034rd ?r0 ?e1.
        ?e1 ?r1 ?e2.
        ?e2 ?r2 ?e3.
    }

This does not work, because it only returns paths of exactly three hops; paths that end one or two hops away from the topic entity are missed.

My next attempt was as follows:

    PREFIX ns:<http://rdf.freebase.com/ns/>
    SELECT ?r0 ?e1 ?r1 ?e2 ?r2 ?e3 WHERE
    {
        ns:m.034rd ?r0 ?e1.
        OPTIONAL {
            ?e1 ?r1 ?e2.
        }
        OPTIONAL {
            ?e2 ?r2 ?e3.
        }
    }

This did not work either, although I admittedly don't know why, or whether I am even using OPTIONAL correctly.
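One likely reason the OPTIONAL version misbehaves: the two OPTIONAL blocks are siblings, so SPARQL evaluates them as successive left joins. Whenever the first OPTIONAL finds no ?e2 (for example because ?e1 is a literal, which cannot be the subject of a triple), ?e2 stays unbound, and the second pattern ?e2 ?r2 ?e3 is then compatible with every triple in the store, which can make the result explode. Nesting each hop inside the previous hop's OPTIONAL keeps every hop anchored to the entity reached before it. The sketch below generates such a nested query for any number of hops; the function names k_hop_query and _hop are hypothetical, and only the ns: prefix and m.034rd come from the question. Even a correct query can still be cut short by server-side limits.

```python
"""Build a k-hop SPARQL query with nested OPTIONALs (illustrative sketch)."""

PREFIX = "PREFIX ns: <http://rdf.freebase.com/ns/>"


def _hop(i: int, hops: int, indent: str) -> str:
    """Return the OPTIONAL block that extends the path from ?e{i}.

    The next hop is nested inside this block, so ?e{i} is always bound
    before the pattern ?e{i} ?r{i} ?e{i+1} is evaluated.
    """
    if i >= hops:
        return ""
    nested = _hop(i + 1, hops, indent + "    ")
    return (f"{indent}OPTIONAL {{\n"
            f"{indent}    ?e{i} ?r{i} ?e{i + 1} .\n"
            f"{nested}"
            f"{indent}}}\n")


def k_hop_query(topic: str, hops: int) -> str:
    """Return a query for all paths of 1..hops edges starting at ns:{topic}."""
    if hops < 1:
        raise ValueError("hops must be at least 1")
    select_vars = " ".join(f"?r{i} ?e{i + 1}" for i in range(hops))
    return (f"{PREFIX}\n"
            f"SELECT {select_vars} WHERE\n"
            "{\n"
            f"    ns:{topic} ?r0 ?e1 .\n"  # the first hop is mandatory
            f"{_hop(1, hops, '    ')}"
            "}\n")


if __name__ == "__main__":
    print(k_hop_query("m.034rd", 3))
```

For three hops this produces the query from the question with the second OPTIONAL moved inside the first. Each result row is a maximal path: a 1- or 2-hop path only appears on its own when it cannot be extended further; otherwise it shows up as the prefix of a longer row.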

Having failed to build the subgraph with a single SPARQL query, I tried querying Freebase iteratively and assembling the graph from the results. I tried two approaches:

(1):

    PREFIX ns:<http://rdf.freebase.com/ns/>
    SELECT ?r0 ?e1 WHERE
    {
        ns:m.034rd ?r0 ?e1.
    }

and

    PREFIX ns:<http://rdf.freebase.com/ns/>
    SELECT ?r0 ?e1 ?r1 ?e2 WHERE
    {
        ns:m.034rd ?r0 ?e1.
        ?e1 ?r1 ?e2.
    }

and

    PREFIX ns:<http://rdf.freebase.com/ns/>
    SELECT ?r0 ?e1 ?r1 ?e2 ?r2 ?e3 WHERE
    {
        ns:m.034rd ?r0 ?e1.
        ?e1 ?r1 ?e2.
        ?e2 ?r2 ?e3.
    }

I had assumed that running these three queries would give me all 1-, 2-, and 3-hop paths starting from the topic entity.
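In principle the three queries together do cover every 1-, 2- and 3-hop path, provided none of them is truncated by the server. To turn their rows into a subgraph, the paths can be collapsed into a set of distinct triples. A minimal sketch, assuming the rows arrive in the W3C SPARQL 1.1 JSON results layout (each binding maps a variable name without the leading ? to an object with a "value" field, and unbound variables are simply absent); the function name paths_to_triples is hypothetical.

```python
from typing import Iterable, Mapping, Set, Tuple


def paths_to_triples(topic: str,
                     bindings: Iterable[Mapping[str, Mapping[str, str]]],
                     hops: int = 3) -> Set[Tuple[str, str, str]]:
    """Collapse path rows (?r0 ?e1 ?r1 ?e2 ...) into distinct (s, p, o) edges.

    `bindings` is assumed to be results["results"]["bindings"] from a SPARQL
    JSON response; a path stops at the first unbound ?r{i} or ?e{i+1}.
    """
    triples: Set[Tuple[str, str, str]] = set()
    for row in bindings:
        subject = topic
        for i in range(hops):
            pred = row.get(f"r{i}", {}).get("value")
            obj = row.get(f"e{i + 1}", {}).get("value")
            if pred is None or obj is None:
                break  # shorter path: 1- or 2-hop query, or unbound OPTIONAL
            triples.add((subject, pred, obj))
            subject = obj  # the next hop starts where this one ended
    return triples
```

Pass the topic as its full IRI (http://rdf.freebase.com/ns/m.034rd) so it matches the IRIs in the bindings; rows from all three queries can be fed in together, and duplicate edges collapse automatically.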

(2):

    PREFIX ns:<http://rdf.freebase.com/ns/>
    SELECT ?r0 ?e1 WHERE
    {
        VALUES ?e0 { ns:m.034rd }    # e0: the topic entity on the first pass
        ?e0 ?r0 ?e1.
    }

where e0 was initially set to the topic entity. The query was then rerun with each e1 returned by the previous round as the new e0, and this was repeated three times (3 hops).
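The iterative approach amounts to a breadth-first expansion. Two details are easy to miss: each entity should be expanded only once, since the same entity is often reachable by several paths, and literal objects cannot be expanded at all. A minimal sketch, assuming a hypothetical fetch_edges(entity) callback that runs the single-hop query above for one entity and returns (predicate, object) pairs; skipping literals is left to that callback.

```python
from typing import Callable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def k_hop_subgraph(topic: str, hops: int,
                   fetch_edges: Callable[[str], Iterable[Tuple[str, str]]]) -> Set[Triple]:
    """Breadth-first expansion from `topic`, collecting every edge within `hops`.

    fetch_edges(entity) is a caller-supplied (hypothetical) function that runs
    the one-hop query for `entity` and yields (predicate, object) pairs.
    """
    triples: Set[Triple] = set()
    visited = {topic}  # entities already queued, so each is queried only once
    frontier: List[str] = [topic]
    for _ in range(hops):
        next_frontier: List[str] = []
        for entity in frontier:
            for pred, obj in fetch_edges(entity):
                triples.add((entity, pred, obj))
                if obj not in visited:
                    visited.add(obj)
                    next_frontier.append(obj)
        frontier = next_frontier
    return triples


if __name__ == "__main__":
    toy = {"t": [("p", "a"), ("p", "b")], "a": [("q", "c")], "c": [("s", "d")]}
    print(sorted(k_hop_subgraph("t", 2, lambda e: toy.get(e, []))))
    # [('a', 'q', 'c'), ('t', 'p', 'a'), ('t', 'p', 'b')]
```

Edges pointing back to an already visited entity are still recorded, so the subgraph keeps its cycles, but that entity is not queried a second time.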

I am still no closer to finding the correct way to build the subgraph, so any help would be greatly appreciated.

  • to be honest, all of your approaches should work ... but, and that is the main point here, you're most likely running into the Virtuoso Anytime Query feature, which means it returns as many results as it can find within a given time. This can be set via the query or the server settings. In the web interface there should also be a timeout option. – UninformedUser Jul 23 '21 at 20:07
  • Are you querying your own Virtuoso instance with the Freebase data loaded into it? If so, you can adjust Anytime Query timeouts and other query limits to suit your long-running query/ies. If not, are you sure the SPARQL endpoint is running Virtuoso? (In either case, is it publicly accessible? If so, please provide a link, and I can quickly gather the version and other details I want. If not, I'll ask some more questions.) – TallTed Jul 24 '21 at 01:51
  • @UninformedUser Thank you for your response. Do you know which variables on the server need to be changed? – user15276611 Jul 26 '21 at 08:27
  • @TallTed - Thanks for your response. It is our own Virtuoso server with Freebase loaded into it. What variables should I adjust to achieve what is required? It is unfortunately not publicly available, but I am happy to try to answer any questions. – user15276611 Jul 26 '21 at 08:29

1 Answer

Based on the comments, I'll give some pointers here. The question as asked is not suited to a specific answer. The OpenLink Community Forum is usually better than StackOverflow for deeper dives on specific products like Virtuoso.

First and often foremost, make sure you're running the latest build of Virtuoso, whether Open Source Edition (a/k/a VOS), now 7.2.6.1, or Enterprise/Commercial Edition (a/k/a VEE or VCE), now 8.3+, both of which shipped in July 2021.

Next, take a look at the basic Performance Tuning settings, and ensure that Virtuoso is set to make use of as much RAM and other system resources as intended; the default settings are designed to minimize Virtuoso's load on the system, not to maximize query or other performance.
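The main RAM-related settings are NumberOfBuffers and MaxDirtyBuffers in the [Parameters] section of virtuoso.ini. A commonly quoted rule of thumb (an assumption here; check the Performance Tuning documentation for your build) is to give Virtuoso roughly two thirds of the free memory as 8 KB page buffers, and to set MaxDirtyBuffers to about three quarters of NumberOfBuffers. A small helper that applies that rule; the function name buffer_settings is hypothetical.

```python
from typing import Tuple

PAGE_BYTES = 8192  # assumed size of one Virtuoso page buffer (8 KB)


def buffer_settings(free_ram_bytes: int) -> Tuple[int, int]:
    """Rule-of-thumb NumberOfBuffers / MaxDirtyBuffers for virtuoso.ini.

    Assumes ~2/3 of free RAM goes to page buffers and MaxDirtyBuffers is
    ~3/4 of NumberOfBuffers; round down further if the host runs other services.
    """
    number_of_buffers = free_ram_bytes * 2 // 3 // PAGE_BYTES
    max_dirty_buffers = number_of_buffers * 3 // 4
    return number_of_buffers, max_dirty_buffers


if __name__ == "__main__":
    n, d = buffer_settings(4 * 1024 ** 3)  # 4 GiB free
    print(f"NumberOfBuffers = {n}\nMaxDirtyBuffers = {d}")
```

The printed lines go into [Parameters]; the server has to be restarted before changes to virtuoso.ini take effect.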

Then, there is a server-side timeout, MaxQueryExecutionTime, set in the [SPARQL] section of the Virtuoso INI file, as discussed in the product documentation. Note: this timeout has no effect on SPARQL queries that are run through an iSQL session (which just requires that you prepend the sparql keyword, and append a semicolon, to the SPARQL query you would run through the sparql form; e.g., sparql SELECT ?s ... ORDER BY ?s ;).
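To check which limits a given server is actually configured with, the [SPARQL] section of virtuoso.ini can be read with Python's standard configparser. A minimal sketch; the function name sparql_limits is hypothetical, and it assumes the INI file parses cleanly with interpolation disabled. It reports MaxQueryExecutionTime alongside ResultSetMaxRows, the other [SPARQL] setting that turns out to matter in this thread.

```python
import configparser
from typing import Dict, Optional

LIMIT_KEYS = ("MaxQueryExecutionTime", "ResultSetMaxRows")


def sparql_limits(ini_text: str) -> Dict[str, Optional[str]]:
    """Return the [SPARQL] limit settings from virtuoso.ini text (None if unset)."""
    # interpolation=None: values may contain '%'; strict=False: tolerate
    # repeated keys or sections instead of raising an error.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    # fallback=None also covers a missing [SPARQL] section.
    return {key: parser.get("SPARQL", key, fallback=None) for key in LIMIT_KEYS}
```

Usage: pass it the contents of the server's virtuoso.ini. If a query returns exactly ResultSetMaxRows rows, the result was very likely cut off rather than complete.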

There are some additional Anytime Query settings that may be relevant to adjusting this feature for your deployment.

If these hints don't prove sufficient, the OpenLink Community Forum should be your next port of call for assistance.

TallTed
  • Thanks for your help. I believe I found where my error was occurring: the number of rows the query is 'meant' to return is greater than ResultSetMaxRows in virtuoso.ini, so the results are truncated and do not match the expected number of rows. – user15276611 Jul 26 '21 at 16:40
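When the full result is larger than ResultSetMaxRows and the limit cannot simply be raised, one workaround is to page through it with LIMIT and OFFSET over an ordered subquery, keeping each page at or below the limit. A minimal sketch, assuming a hypothetical run_query(query) function that sends a query to the endpoint and returns its result rows as a list. The SPARQL specification does not strictly guarantee that a subquery's ORDER BY is preserved by the outer query, so verify this pattern against the Virtuoso documentation for your build.

```python
from typing import Callable, List

PROLOGUE = "PREFIX ns: <http://rdf.freebase.com/ns/>"


def fetch_all_pages(run_query: Callable[[str], List[dict]],
                    inner_select: str, order_by: str,
                    page_size: int) -> List[dict]:
    """Fetch every row of `inner_select`, page_size rows per request.

    page_size must not exceed the server's ResultSetMaxRows: a capped page
    would look "short" and end the loop early. inner_select must not carry
    its own PREFIX lines; PROLOGUE is prepended to the whole query.
    """
    if page_size < 1:
        raise ValueError("page_size must be positive")
    rows: List[dict] = []
    offset = 0
    while True:
        query = (f"{PROLOGUE}\n"
                 f"SELECT * WHERE {{ {{ {inner_select} ORDER BY {order_by} }} }}\n"
                 f"LIMIT {page_size} OFFSET {offset}")
        page = run_query(query)
        rows.extend(page)
        if len(page) < page_size:  # a short page means nothing is left
            return rows
        offset += page_size
```

For the 3-hop query, order_by could be ?r0 ?e1 ?r1 ?e2 ?r2 ?e3 to make the page order deterministic. Sorting a very large result is itself expensive, so on a private server raising ResultSetMaxRows may be the simpler fix.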