
I have a column of URIs from different domains, for example:

http://comicmeta.org/cbo/category
http://purl.org/dc/terms/hasVersion
http://schema.org/contributor

and so on. I want to extract the last part of each URI, i.e., the string after the last slash '/'.

Expected results on the above list of URIs:

category
hasVersion
contributor

How do I write a generic SPARQL query to extract this last part from any given URI?

This is what I have tried so far:

SELECT distinct ?s ?x WHERE { 
    ?s ?p ?o .
    BIND (STRBEFORE(STRAFTER(STR(?s),"/"), " ") as ?x) .
    #To extract the part after the slash '/' and before the end of string indicated by a space ' '. 

}

But this only returns empty strings ("").
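To see why the attempt above returns "", it helps to bind the intermediate value separately. In this sketch, a VALUES clause with one of the example IRIs stands in for `?s ?p ?o`; that substitution is only an assumption for illustration:

```sparql
# Diagnostic sketch: expose each intermediate value of the original expression
SELECT ?s ?afterFirstSlash ?x WHERE {
  # One example IRI supplied inline instead of matching ?s ?p ?o
  VALUES ?s { <http://comicmeta.org/cbo/category> }
  # STRAFTER splits at the FIRST "/", i.e. the first slash of "http://"
  BIND (STRAFTER(STR(?s), "/") AS ?afterFirstSlash)
  # The result contains no space, so STRBEFORE returns the empty string
  BIND (STRBEFORE(?afterFirstSlash, " ") AS ?x)
}
```

Here ?afterFirstSlash is "/comicmeta.org/cbo/category". Because that string contains no space, and STRBEFORE returns an empty string when the search string does not occur, ?x ends up as "".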

How can I make this work? Can someone help me with this?

AnonymousMe
  • that is not possible generically - the common way is to use just `strafter()` with a known namespace as second argument - clearly, this only works if you know all namespaces in advance. – UninformedUser Jul 06 '22 at 10:50
  • also, `strafter` takes the substring after the first occurrence of the string - which is why your `STRAFTER(STR(?s),"/")` is useless - and there is no reverse operation in SPARQL – UninformedUser Jul 06 '22 at 10:52
  • given that you're using GraphDB, my suggestion would be to check for any helpful functions, see https://graphdb.ontotext.com/documentation/10.0/sparql-functions-reference.html#sparql-spin-functions-and-magic-predicates and if there is no such function, simply register your own extension function in JS: https://graphdb.ontotext.com/documentation/10.0/javascript-functions.html – UninformedUser Jul 06 '22 at 10:54
  • See also the "alternative" for `afn:localname()` here: https://jena.apache.org/documentation/query/library-function.html – Stanislav Kralin Jul 06 '22 at 16:04
  • yeah - extension functions do exist because of the SPARQL specs - I didn't mention other triple stores as the question is tagged with `graphdb` (not sure if this holds though, or if it could apply to other triple stores like Jena) – UninformedUser Jul 07 '22 at 05:23
  • STRBEFORE and STRAFTER only split at a fixed string; they do no pattern matching. Try using REPLACE, which can remove the part matched by a regex pattern. – AndyS Jul 07 '22 at 14:12
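The namespace-based approach from the first comment can be sketched as follows. It is not a generic solution: the VALUES list of namespaces is an assumption and has to contain every namespace that occurs in your data.

```sparql
SELECT DISTINCT ?s ?x WHERE {
  # Assumed list of known namespaces; extend it for your own data
  VALUES ?ns { "http://comicmeta.org/cbo/" "http://purl.org/dc/terms/" "http://schema.org/" }
  ?s ?p ?o .
  # Keep only subjects whose IRI starts with one of the listed namespaces
  FILTER (STRSTARTS(STR(?s), ?ns))
  # Strip the namespace; what remains is the local name
  BIND (STRAFTER(STR(?s), ?ns) AS ?x)
}
```

Subjects outside the listed namespaces are simply filtered out, which is the limitation the comment points out.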

1 Answer


Using REPLACE is the way:

BIND (REPLACE(STR(?s), "^.*/([^/]*)$", "$1") as ?x)
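A complete query built around this BIND might look like the sketch below. The VALUES clause supplying the three example IRIs from the question is an assumption for testing; against real data you would keep the original `?s ?p ?o` pattern instead.

```sparql
SELECT ?s ?x WHERE {
  # Example IRIs from the question, supplied inline for testing
  VALUES ?s {
    <http://comicmeta.org/cbo/category>
    <http://purl.org/dc/terms/hasVersion>
    <http://schema.org/contributor>
  }
  # Greedy ".*/" consumes everything up to the LAST slash;
  # the capture group keeps the remainder
  BIND (REPLACE(STR(?s), "^.*/([^/]*)$", "$1") AS ?x)
}
```

This should yield category, hasVersion and contributor. An IRI that ends in / would give an empty string.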

This replaces the whole string with only the part found after the last / character. Note, however, that unlike in your examples, not all vocabularies use / as the delimiter; some use # instead. Something like http://www.w3.org/1999/02/22-rdf-syntax-ns#type will be turned into 22-rdf-syntax-ns#type.
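If you would rather treat both / and # as delimiters, one possible variant (not part of the original answer) puts both characters in a character class. This is a sketch; the IRIs in VALUES are only for illustration:

```sparql
SELECT ?s ?x WHERE {
  # Illustrative IRIs: one slash-based, one hash-based
  VALUES ?s {
    <http://schema.org/contributor>
    <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
  }
  # Keep whatever follows the last "/" or "#"
  BIND (REPLACE(STR(?s), "^.*[/#]([^/#]*)$", "$1") AS ?x)
}
```

This should give contributor and type. Like the first pattern, it returns an empty string for an IRI ending in / or #.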

If you do not want that, you could use something a bit more complicated:

BIND (REPLACE(STR(?s), "^.*?([_\\p{L}][-_\\p{L}\\p{N}]*)$", "$1") as ?x)

This selects the longest suffix that matches what is usually a valid XML name: a letter or underscore followed by letters, digits, underscores, or hyphens.
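To compare the two patterns from this answer, the following sketch binds both on the same IRIs (the VALUES entries are illustrative assumptions):

```sparql
SELECT ?s ?lastSlash ?xmlName WHERE {
  VALUES ?s {
    <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
    <http://schema.org/contributor>
  }
  # First pattern: everything after the last "/"
  BIND (REPLACE(STR(?s), "^.*/([^/]*)$", "$1") AS ?lastSlash)
  # Second pattern: reluctant ".*?" leaves the longest name-like suffix to the group
  BIND (REPLACE(STR(?s), "^.*?([_\\p{L}][-_\\p{L}\\p{N}]*)$", "$1") AS ?xmlName)
}
```

For the rdf:type IRI, ?lastSlash should be 22-rdf-syntax-ns#type while ?xmlName should be type; for the schema.org IRI both should be contributor. The name pattern excludes characters such as `.` (which XML names allow), so a local name containing a dot is cut to the part after the last dot. If the IRI ends in a character outside the class, such as /, the pattern does not match and REPLACE returns the whole string unchanged.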

IS4