
My question is about how to handle stemmed words in SPARQL queries. For instance:

  • string inserted by user: bbbx
  • stemmed word: bbb
  • word to retrieve in my graph: bbby

This could be pretty easy.

Second example: both words should be matched, not just one or the other.

  • string inserted by user: bbbx cccn
  • stemmed word: bbb ccc
  • words to retrieve in my graph: bbby ccct

Third and trickiest: the words describing the node aren't at the very beginning of its descriptor.

  • string inserted by user: bbbx dddz
  • stemmed word: bbb ddd
  • word in my graph that contains the corresponding words: aaaw bbbt cccq dddr

Note that I can't use any particular API: I can only submit SPARQL queries from PHP on shared hosting, querying public repositories like DBpedia and so on.
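
To make the three cases concrete, here is a minimal sketch of the query shape they seem to call for: one regex FILTER per stem, so every stem has to start some word of the label, in any position and any order. It is written in Python only to show the string-building, which carries over to PHP line for line. The function names, the use of `rdfs:label` as the descriptor property, and the `LIMIT` are assumptions for illustration, not something known about the target repository.

```python
import re

RDFS_PREFIX = "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>"


def build_stem_query(stems, limit=50):
    """Return a SPARQL query whose labels must contain a word starting with each stem."""
    if not stems:
        raise ValueError("at least one stem is required")
    filters = []
    for stem in stems:
        # Only plain letters/digits are accepted, so nothing needs regex or quote escaping.
        if not stem.isalnum():
            raise ValueError(f"unsupported stem: {stem!r}")
        # (^|\W) anchors the stem at the start of a word; the backslash is doubled
        # in the query text because it sits inside a SPARQL string literal.
        filters.append(f'  FILTER regex(str(?label), "(^|\\\\W){stem}", "i")')
    lines = [
        RDFS_PREFIX,
        "SELECT DISTINCT ?s ?label WHERE {",
        "  ?s rdfs:label ?label .",  # assumption: descriptors are stored as rdfs:label
        *filters,                     # separate FILTERs combine as a logical AND
        "}",
        f"LIMIT {int(limit)}",
    ]
    return "\n".join(lines)


def label_matches(label, stems):
    """Local check mirroring the FILTERs: every stem must start some word of the label."""
    return all(
        re.search(r"(^|\W)" + re.escape(stem), label, re.IGNORECASE)
        for stem in stems
    )


if __name__ == "__main__":
    print(build_stem_query(["bbb", "ddd"]))
```

Because each FILTER is evaluated against every label, a query like this may well be slow on a large public endpoint.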

Thanks in advance for any help.

RobMor
  • I'm not sure what you're trying to achieve in the end; there are probably already solutions out there. I guess you would need some full-text index on the resources in the graph, e.g. by using Lucene. I think with REGEX and some string functions this is mostly a) not possible or b) much too slow. – UninformedUser Nov 20 '15 at 09:51
  • Fine, but I can't use any language other than PHP. There's no Tomcat, or anything similar, available on my server. – RobMor Nov 20 '15 at 10:04
  • Is the stem always the first three letters? Some more diverse examples might help here. How are you determining what the stem is? Once you have that, the query is pretty easy, though, as AKSW says, it might be slower than you'd like, depending on how your data is organized. – Joshua Taylor Nov 20 '15 at 13:48
  • No, it isn't. I used three letters only for simplicity; the stem could be as long as you want, and instead of repeated letters there would be real words. The stemmer is based on Snowball. Here is the third example, expanded: string inserted by user: bbbbbbx ddddz; stemmed words: bbbbbb dddd; words in the graph that contain what was asked: aaw bbbbbbt cq ddddr. The user is asking for **'bbbbbbx ddddz'**, and the words can be found in the string **'aaaw bbbbbbt cccq ddddr'**, which contains exactly what was requested but doesn't necessarily start with the first word or have the second in the position submitted by the user. – RobMor Nov 20 '15 at 14:47
  • For instance, if you search for **Dogs Healths and Conditions**, the stemmer outputs **Dog Health Condition**. Let's imagine that in my repository there's a node with **Symptoms on Dogs and Cats Health Conditions**; this URI should be returned. – RobMor Nov 20 '15 at 15:56
  • If you can precompute the stems, then you could do this pretty efficiently. – Joshua Taylor Nov 20 '15 at 22:05
  • What do you mean by precomputing? What I'm looking for is the SPARQL query. – RobMor Nov 21 '15 at 08:32
