How can I optimize a query that compares values using similarity measures?

Question

I have a question:

I'm following this post (To use iSPARQL to compare values using similarity measures) to use SPARQL to compare values using similarity measures.

In particular, I have used this code:

select ?city ?percent where {
  ?city a dbpedia-owl:City ;
        rdfs:label ?label .
  filter langMatches( lang(?label), 'en' )

  bind( replace( concat( 'x', str(?label) ), "^x[^Londn]*([L]?)[^ondn]*([o]?)[^ndn]*([n]?)[^dn]*([d]?)[^n]*([n]?).*$", '$1$2$3$4$5' ) as ?match )
  bind( xsd:float(strlen(?match))/strlen(str(?label)) as ?percent )
}
order by desc(?percent)
limit 100

where I dynamically calculate this string for all my resources.

'x', str(?label) ), "^x[^Londn]*([L]?)[^ondn]*([o]?)[^ndn]*([n]?)[^dn]*([d]?)[^n]*([n]?).*$", '$1$2$3$4$5'
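
For illustration, here is a minimal Python sketch of how such a pattern string could be generated from a target string. The function names (`escape_in_class`, `build_pattern`, `local_percent`) are hypothetical and not part of the linked post, and the escaping rule is an assumption that only covers characters special inside a character class. Note that the pattern in the query above corresponds to the character sequence "Londn" (the second "o" of "London" has no group), so the sketch is checked against that sequence.

```python
import re


def escape_in_class(ch):
    """Backslash-escape characters that are special inside a regex character class.

    Assumption: only \\ [ ] ^ - need escaping; other characters pass through.
    """
    return "\\" + ch if ch in "\\[]^-" else ch


def build_pattern(target):
    """Return (pattern, replacement) in the shape used by the REPLACE call above."""
    parts = ["^x"]
    for i, ch in enumerate(target):
        remaining = "".join(escape_in_class(c) for c in target[i:])
        # Skip anything that is not one of the remaining characters,
        # then optionally capture the current character.
        parts.append("[^{}]*([{}]?)".format(remaining, escape_in_class(ch)))
    parts.append(".*$")
    replacement = "".join("${}".format(n) for n in range(1, len(target) + 1))
    return "".join(parts), replacement


def local_percent(target, label):
    """Mirror the ?percent computation locally: matched characters / label length."""
    if not label:
        return 0.0
    pattern, _ = build_pattern(target)
    m = re.match(pattern, "x" + label)
    matched = "".join(m.groups()) if m else ""
    return len(matched) / len(label)


if __name__ == "__main__":
    pattern, replacement = build_pattern("Londn")
    print(pattern)
    print(replacement)
    print(local_percent("Londn", "London"))
```

If the generated pattern is pasted into a SPARQL string literal, any backslashes and quotes it contains would need a further level of escaping; the sketch does not handle that.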

This query takes about 40 seconds to execute, even on a powerful PC. With a large dataset, it becomes unusable. My question is: is there a way to optimize this query to dramatically reduce its execution time? Alternatively, how can I identify a resource like this on DBpedia in a reasonable amount of time (about 1 second)?

**My question is: Is there a way to optimize this query to dramatically reduce the execution time of it? Alternatively, how can I achieve the goal of identifying a resource like this on DBpedia in a reasonable amount of time (about 1 second)?** That's really two questions, and both are too broad for Stack Overflow. Some SPARQL endpoints support text indexing (e.g., Jena can use Lucene, and I think Virtuoso has some text indexing capabilities). I expect that if you want efficient text operations, you'll have to do something with one of those. — Joshua Taylor, Sep 29 '14 at 14:49
Stack Overflow isn't a replacement for a search engine or for documentation. The first page of results on Google for [`jena lucene`](https://www.google.com/search?q=jena+lucene&oq=jena+lucene) looks relevant. — Joshua Taylor, Sep 29 '14 at 15:07
I never considered Stack Overflow a replacement for a search engine or for documentation. I thought you already had a ready-made solution, so I asked... — Musich87, Sep 29 '14 at 15:16
http://stackoverflow.com/questions/17954399/creating-a-lucene-index-for-an-existing-apache-jena-tdb-to-implement-text-search and http://stackoverflow.com/questions/17111903/building-fulltext-search-index-for-jena-and-lucene may be useful. — Joshua Taylor, Sep 29 '14 at 15:34

0 Answers