
I am trying to build a POC search engine for an internal site at my company. I am using GraphDB and have put some base data in that looks like this:

Page -hasField-> Field

Field -hasOption-> Option

Not all fields have options. All of the types have labels that I'll be searching. What I'd like to return is any Page, Field, or Option whose label contains the text, the path from the containing report to the node, and the labels for all nodes and edges along the way. So, if the page name contains the text, just the page and its label are returned. If an option contains the text, the full path Page -hasField-> Field -hasOption-> Option is returned along with all of the labels. This is so that we can better describe why a page is being returned and how the user can interact with the page to get the results they desire.

In my mind, I've broken down the problem into 3 parts and have found various solutions for each, but I can't seem to tie them together. The first is to find the nodes and the report to which they belong (easily done).

The second is to find all triples along the path. While I can do this by hard-coding the relationships, I'd like to keep this generic, if possible, so that new relationships are automatically picked up as we build out the graph.

The third is to structure the results into a tree with each page as a root and the Fields and Options nested, so that the application can display the pages as results with descriptions of each matching field/option underneath. I have found that using CONSTRUCT with JSON-LD as the output format (with all of the triples, for testing right now) gets me most of the way there. However, it doesn't build out the tree so much as list each node along with the URIs of its child nodes. I've read that you can do "framing" on this, but I'm not sure whether that is done in SPARQL or in the application.

So, to summarize my questions:

  • Is it possible to use SPARQL to get the information that I want in the format that I want (or at least most of the way there where I can easily do the rest in the application)?
  • Am I thinking about the problem correctly? If not, how should I be thinking about it?
  • Is it possible to return all of the triples that make up a path between two nodes? More specifically, if I have a list of pairs of nodes, is it possible to return all of the triples that make up all of the paths between all pairs?
  • Is it possible to take the previous result set of triples and turn that into a tree in JSON? Is it best to use "framing" or is there some other term that I should be searching for?
TallTed
jmblackmer
  • framing isn't part of the SPARQL spec so I guess you have to build the tree on the client side – UninformedUser Nov 12 '19 at 16:38
  • regarding 2) not sure what you mean by "generic" - there are SPARQL property paths, but still not sure what you're asking. It's always easier if you provide proper RDF sample data + the expected result. – UninformedUser Nov 12 '19 at 16:40
  • SPARQL generally returns tables (`SELECT`) or quads/triples (`CONSTRUCT`), which are the results of queries over quads/triples. These are not trees. It seems questionable whether you *need* a tree, though that is what you've decided you *want* -- i.e., you may currently be wrestling with an [XY Problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem#66378), and stepping back a bit to talk about what you have *at the start,* and what you want to get *in the end,* will likely get you a better net result. – TallTed Nov 12 '19 at 16:44
  • @AKSW by "generic", I mean that I may not know the exact relationship between a pair of nodes, so hard-coding "hasField" and "hasOption" will work at this early stage, but may not work later on. Basically, I can do "?a hasField ?b. ?b hasOption ?c." but if another relationship gets added, I have to change the query. I can do "?a ?r ?b", but I can't do "?a ?r+ ?b". I could also do "?a ?r ?b. ?b ?s ?c." but if a third layer gets added then the query won't find it. So, really, I think this is a problem of finding the shortest path between two nodes and returning all edges and nodes along the way. – jmblackmer Nov 12 '19 at 16:48
  • @TallTed I think you may be right, and I tried to phrase my question in a way that left that possibility open. The whole triplestore philosophy is a new paradigm for me and I have not completely grokked it yet. I tried to show how I'm going about trying to solve the problem, so that maybe it would be more obvious where I am going astray. – jmblackmer Nov 12 '19 at 16:51
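
As the first comment notes, framing is not part of SPARQL, so the nesting has to happen in the application (JSON-LD framing itself is a separate specification that JSON-LD libraries apply on the client side). A minimal sketch of building the tree by hand from CONSTRUCT results, assuming the triples have already been parsed into plain `(subject, predicate, object)` tuples; the node IDs, the `label` predicate name, and the sample data are all hypothetical:

```python
from collections import defaultdict

# Hypothetical sample data standing in for the triples a CONSTRUCT query returns.
TRIPLES = [
    ("page1", "label", "Sales Report"),
    ("page1", "hasField", "field1"),
    ("field1", "label", "Region"),
    ("field1", "hasOption", "option1"),
    ("option1", "label", "North America"),
]


def build_tree(triples, root, label_pred="label"):
    """Nest every node reachable from `root` under the edge that leads to it."""
    labels = {}
    children = defaultdict(list)
    for s, p, o in triples:
        if p == label_pred:
            labels[s] = o              # label triples become node attributes
        else:
            children[s].append((p, o))  # every other predicate is treated as an edge

    def node(n, seen):
        # `seen` holds the nodes on the current branch, so cycles are not followed.
        result = {"id": n, "label": labels.get(n), "children": []}
        for pred, child in children.get(n, []):
            if child not in seen:
                result["children"].append(
                    {"edge": pred, "node": node(child, seen | {child})}
                )
        return result

    return node(root, {root})


tree = build_tree(TRIPLES, "page1")
```

Because no predicate other than the label is hard-coded, a newly added relationship shows up as a new `edge` value without changing the function.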

1 Answer

I think you should use the Lucene or Elasticsearch connector since you're facing a full-text-search problem ("label contains the text").

These connectors allow you to specify a bunch of paths to index (the "FTS molecule"). It's still a fixed set of paths, though: you can't say "any properties going out, then branching in any direction".

To return the matching field, use Snippet extraction.

About your other question, "Is it possible to return all of the triples that make up a path between two nodes?": that's not possible in SPARQL in the general case of exploring all paths. SPARQL Property Paths can explore certain combinations of properties, but you can't use variables in property paths. Blazegraph and Stardog have some relevant extensions.
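
One workaround is to do the path search in the application rather than in SPARQL. The following is a minimal sketch, assuming the relevant edges have already been fetched (for example with a generic `?s ?p ?o` pattern) into `(subject, predicate, object)` tuples. It uses a plain breadth-first search and returns the triples along one shortest directed path, not all paths; the function name and sample data are hypothetical:

```python
from collections import defaultdict, deque

# Hypothetical edge triples; predicates are not hard-coded in the search below.
EDGES = [
    ("page1", "hasField", "field1"),
    ("field1", "hasOption", "option1"),
    ("page1", "hasField", "field2"),
]


def path_triples(edges, start, goal):
    """Return the triples along one shortest directed path, or None if unreachable."""
    outgoing = defaultdict(list)
    for s, p, o in edges:
        outgoing[s].append((p, o))

    # came_from maps each visited node to the triple that first reached it.
    came_from = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            break
        for pred, nxt in outgoing[current]:
            if nxt not in came_from:
                came_from[nxt] = (current, pred, nxt)
                queue.append(nxt)

    if goal not in came_from:
        return None

    # Walk back from the goal to the start, then reverse into path order.
    path = []
    step = came_from[goal]
    while step is not None:
        path.append(step)
        step = came_from[step[0]]
    path.reverse()
    return path


result = path_triples(EDGES, "page1", "option1")
```

For a list of node pairs, the same function can be called once per pair and the returned triples collected into a set.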

Vladimir Alexiev