I have a question:
I'm following the post "To use iSPARQL to compare values using similarity measures" to compare values in SPARQL using a similarity measure.
In particular, I am using this query:
select ?city ?percent where {
  ?city a dbpedia-owl:City ;
        rdfs:label ?label .
  filter langMatches( lang(?label), 'en' )
  bind( replace( concat( 'x', str(?label) ), "^x[^Londn]*([L]?)[^ondn]*([o]?)[^ndn]*([n]?)[^dn]*([d]?)[^n]*([n]?).*$", '$1$2$3$4$5' ) as ?match )
  bind( xsd:float(strlen(?match))/strlen(str(?label)) as ?percent )
}
order by desc(?percent)
limit 100
where I dynamically generate this part of the expression for each of my resources:
'x', str(?label) ), "^x[^Londn]*([L]?)[^ondn]*([o]?)[^ndn]*([n]?)[^dn]*([d]?)[^n]*([n]?).*$", '$1$2$3$4$5'
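To make the "dynamically calculate" step concrete, here is a minimal sketch of how such a pattern could be generated for an arbitrary target string. It is written in Python purely for illustration: the host language, the function names `build_pattern` and `emulate_match`, and the character-class escaping are assumptions, not part of the query above. Python's `re` module is also only an approximation of the XPath regular expressions that SPARQL's `replace` uses, so the emulation is a sanity check rather than a guarantee.

```python
import re

# Characters that need a backslash inside a regex character class.
# (Assumption: this small set is enough for typical resource labels.)
_CLASS_SPECIALS = set("\\]^-[")


def _esc(ch: str) -> str:
    """Escape a single character for use inside [...]."""
    return "\\" + ch if ch in _CLASS_SPECIALS else ch


def build_pattern(target: str) -> tuple[str, str]:
    """Build the (pattern, replacement) pair used in the query for `target`.

    For each character of the target, skip anything that is not one of the
    remaining target characters, then optionally capture that character.
    """
    parts = ["^x"]
    for i, ch in enumerate(target):
        remaining = "".join(_esc(c) for c in target[i:])
        parts.append(f"[^{remaining}]*([{_esc(ch)}]?)")
    parts.append(".*$")
    # "$1$2...$k", one back-reference per target character.
    replacement = "".join(f"${i}" for i in range(1, len(target) + 1))
    return "".join(parts), replacement


def emulate_match(label: str, target: str) -> str:
    """Approximate the value of ?match for one label, using Python's re."""
    pattern, _ = build_pattern(target)
    m = re.match(pattern, "x" + label)
    return "".join(m.groups()) if m else ""


if __name__ == "__main__":
    pattern, replacement = build_pattern("Londn")
    print(pattern)
    print(replacement)
    print(emulate_match("London", "Londn"))
```

With the target `Londn` this reproduces the pattern and replacement string used in the query. Before embedding a generated pattern in a SPARQL string literal, any quotes and backslashes in it would still need to be escaped for SPARQL.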
This query takes about 40 seconds to run, even on a powerful PC, so with a large dataset it becomes unusable. My question is: is there a way to optimize this query to dramatically reduce the execution time? Alternatively, how can I identify a resource on DBpedia in this way in a reasonable amount of time (about 1 second)?