0

I am querying a DataCube (RDF statistical data) that has 3 dimension and 1 measure. In this datacube, each observation is then composed of 4 statements (3 for the dimensions and 1 for the measure like the following exemple (that may be queried at http://kaiko.getalp.org/sparql).

SELECT distinct ?version ?lg ?relation ?count WHERE {
  ?o a qb:Observation; 
    qb:dataSet dbnstats:dbnaryNymRelationsCube;
    dbnary:wiktionaryDumpVersion ?version;
    dbnary:observationLanguage ?lg;
    dbnary:nymRelation ?relation;
    dbnary:count ?count.
} 

The query returns something like:

version lg relation count
"20210601" "id" antonym 4
"20210601" "id" approximateSynonym 0
"20210601" "id" hypernym 0
"20210601" "id" synonym 108
"20150602" "id" antonym 2
"20150602" "id" approximateSynonym 0
"20150602" "id" hypernym 0
"20150602" "id" synonym 36
"20150702" "id" antonym 2
"20150702" "id" approximateSynonym 0
"20150702" "id" hypernym 0
"20150702" "id" synonym 36

I'd like to pivot on the value of the relation to get the following table:

version lg antonym approximateSynonym hypernym synonym
"20210601" "id" 4 0 0 108
"20150602" "id" 2 0 0 36
"20150702" "id" 2 0 0 36

I could not find a way to craft a single SPARQL query to get this. Currently, I need to fetch all data and make the pivot using whatever client language I use (here python).

Is this possible in SPARQL 1.1 ? How ?

I'd rather have a general answer, but the access point is currently served by Virtuoso.

Edit: To better explain my expectation. In DataCube Vocabulary, the structure of a DataCube is described that gives the different dimensions and measure (usually by the ontology). Hence, the dimensions and measures are considered to be known by the query developper (at least for a specific version of the ontology).

Here, the values of the nymRelation are not known in advance, they are part of the data and not of the structure. The Pivot operation seems to be a valid operation on a DataCube (along with slicing, projecting, etc.).

I would like to know if such an operation could be made on the server (through a generic query that will not depend on the actual data on the server). This would make it possible for a client to maintain a LAZY datacube object and postpone actual pivot operation when the results are indeed necessary.

I suspect (and first answers seems to imply) that this operation is not possible without either fetching the entire DataCube (to perform the operation in memory on client side) or fetching the actual distinct property values and automatically crafting a query that will depend on this first result.

dodecaplex
  • 1,119
  • 2
  • 8
  • 10
  • a `PIVOT` operator is neither part of SPARQL nor is it even part of the even older and more prominent `SQL` for relational databases. IF such a feature exists, it's database specific and nice to have - but not part of a standard. My suggestion, query the data and then look for some client side solution. For example, YASGUI client supports it in the browser via Javascript for the fetched result, i.e. it's a post-processing feature of the resultset - but hey, it works: https://yasgui.triply.cc/ – UninformedUser Jun 22 '21 at 16:34

1 Answers1

2

You need to combine values from distinct observations. If hard-coding the relation names in the queries is not too impractical, you can write separate SELECT statements that bind a common value for ?version and ?lg to pull the counts into a single solution, like this:

SELECT ?version ?lg ?antonym ?approximateSynonym # ...
WHERE {
  {
    SELECT ?version ?lg ?antonym
    WHERE
    {
      ?o1 a qb:Observation; 
        qb:dataSet dbnstats:dbnaryNymRelationsCube;
        dbnary:wiktionaryDumpVersion ?version;
        dbnary:observationLanguage ?lg;
        dbnary:nymRelation dbnary:antonym;
        dbnary:count ?antonym .   # <--- bind the antonym count value
    }
  } 

  {
    SELECT ?version ?lg ?approximateSynonym 
    WHERE
    {
      ?o2 a qb:Observation; 
        qb:dataSet dbnstats:dbnaryNymRelationsCube;
        dbnary:wiktionaryDumpVersion ?version;
        dbnary:observationLanguage ?lg;
        dbnary:nymRelation dbnary:approximateSynonym;
        dbnary:count ?approximateSynonym .   # <--- bind the approximateSynonym count
    }
  }

  # ... And so on for the other columns 
}

This requires that all statistics are present for each version/language combination; otherwise there will be no solution for that combination.

Alternative

If there are too many relation types, you can use the following CONSTRUCT query to aggregate the equivalent of each row into its own observation-like object. The different properties will be mapped to the same ?rowURI. You can parse this result as RDF, or just deal with a json serialization if you prefer.

CONSTRUCT {
    ?rowURI 
       dbnary:wiktionaryDumpVersion ?version ;
       dbnary:observationLanguage ?lg ;
       ?relation ?count 
}
WHERE {
  ?o a qb:Observation; 
    qb:dataSet dbnstats:dbnaryNymRelationsCube;
    dbnary:wiktionaryDumpVersion ?version;
    dbnary:observationLanguage ?lg;
    dbnary:nymRelation ?relation;
    dbnary:count ?count.

    BIND(URI(CONCAT("http://example.org/row/", ?lg, ?version)) AS ?rowURI)
} 
alexis
  • 48,685
  • 16
  • 101
  • 161
  • Well that is exactly what I want to avoid as I don't want my query to depend on a specific set of *values* for a specific property. Any solution where the query explicitly mentions the names of the *values* of the `nymRelation` property is outside of my expectation (I know I can ask for all values then craft a query with all (value derivated) variable names, but I'd like the query to remain general and if possible use a single query execution. – dodecaplex Jun 22 '21 at 14:07
  • I agree, it would be nicer to have a solution that generates the columns dynamically, but there is no such thing in sparql (or in standard sql), afaik. This answers gives you the results you want in a single sparql query, in standard sparql (which is what you requested in your question). Would a CONSTRUCT query instead of SELECT be an option? That can get you the information you want without enumerating the values of `dbnary:nymRelation` in the query. – alexis Jun 22 '21 at 14:19
  • Can you elaborate on the CONSTRUCT option ? – dodecaplex Jun 22 '21 at 17:21