I am one inch away from the solution to my problem. I am attempting title case conversion of strings retrieved via SPARQL. I am using the REPLACE function in combination with LCASE and REGEX:

BIND (replace(lcase(?label), "(\\b[a-z](?!\\s))", ucase("$1") ) as ?title_case)

lcase(?label): all characters in the string become lowercase

(\\b[a-z](?!\\s)): matches the first letter of each word in the string

ucase($1): the backreference to the first letter matched, which acts as the replacement after being converted to upper case.

Expected Result: animal husbandry methods becomes Animal Husbandry Methods

That solution is working almost right, but not quite, for reasons beyond my comprehension; check here an example at work.

When you run the query you won't notice anything different in ?title_case, but if you change ucase("$1") to ucase("aaa") you see it correctly replacing the first letter of each word:

Result: animal husbandry methods becomes AAAnimal AAAusbandry AAAethods

It seems to me the UCASE function does not have any effect on the backreference $1.

Can anyone explain why this happens, and what I can do to fix this behavior?

Menelao
  • Before turning to using `ucase($1)` I attempted a solution with `\U$1\E` that comes with REGEX, where **\U** starts the capitalization and **\E** ends it. I haven't found the right form of escaping of characters, that makes the SPARQL valid and the result to be what I expect! – Menelao Nov 24 '21 at 23:19
  • I am not really sure where you found the `\U` and `\E` commands. I cannot find them in the SPARQL specification and neither in XPath from which `replace` is taken. – IS4 Dec 06 '21 at 18:25
  • I was trying to use the \U \E method which I found explained [here](https://vim.fandom.com/wiki/Changing_case_with_regular_expressions#:~:text=This%20can%20be%20done%20easily,e%20%2C%20is%20converted%20to%20lowercase.) but then again a mix of many attempts out of frustration. – Menelao Dec 08 '21 at 15:19
  • Ah yeah that is only for vim, it seems. But it would be definitely nice to have in all regex dialects. – IS4 Dec 08 '21 at 15:55

2 Answers

You can use the SUBSTR() function to solve the issue.

Eg: BIND (REPLACE(LCASE(?label), "(\\b[a-z](?!\\s))", UCASE(SUBSTR(?label, 1, 1)) ) as ?title_case)

  • THIS IS WORKING... dang, I can't give you an upvote because my reputation is low. You dropped it there as if it was nothing; I had spent hours trying to find a solution. Thanks so much! – Menelao Oct 22 '22 at 03:03
  • @Menelao why was this the accepted solution? It takes only the first character of the raw label and uses it to replace the starting character of every word in the label. For example, "border control" becomes "Border Bontrol". – UninformedUser Feb 13 '23 at 09:57
Function calls in SPARQL follow the convention of most programming languages: the inner functions are evaluated first, and their return values are then passed as arguments to the outer function. replace here takes three strings: the input string, the pattern, and the replacement. ucase is evaluated independently of how its result is used; it simply converts its argument to uppercase and, surprisingly, the uppercase of $1 is $1!
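The same evaluation order can be reproduced outside SPARQL. Here is a minimal sketch in Python, used purely as an illustration; it assumes Python's `re.sub`, which writes the backreference as `\1` instead of `$1`, and the variable names are made up for the example:

```python
import re

label = "animal husbandry methods"
pattern = r"(\b[a-z](?!\s))"

# The inner call runs first. Upper-casing the two-character string "\1"
# changes nothing, because neither "\" nor "1" has an uppercase form.
replacement = r"\1".upper()
assert replacement == r"\1"

# re.sub receives an ordinary backreference and re-inserts each match as-is.
unchanged = re.sub(pattern, replacement, label)

# A literal replacement is upper-cased before re.sub ever sees it.
literal = re.sub(pattern, "aaa".upper(), label)

print(unchanged)
print(literal)
```

The first result is identical to the input, while the second shows the `AAA` prefix described in the question: in both cases the replace function only ever sees a finished string.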

In other languages, what you'd usually do is use some overload of the function that accepts a function/expression instead of the string as the replacement, so that you could call anything from within. That is not possible in SPARQL, all the replace function can do is insert the capture unmodified.
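For comparison, this is roughly what such an overload looks like in a language that has one. The sketch below assumes Python's `re.sub`, which accepts a callable as the replacement; `title_case` is a hypothetical helper name, and nothing like it is available through SPARQL's replace:

```python
import re


def title_case(label: str) -> str:
    """Lower-case the label, then upper-case the first letter of each word."""
    # The callable receives each match object, so the captured letter can be
    # transformed before it is inserted back into the string.
    return re.sub(
        r"(\b[a-z](?!\s))",
        lambda match: match.group(1).upper(),
        label.lower(),
    )


print(title_case("animal husbandry methods"))
print(title_case("border control"))
```

Because the callable runs once per match, each word gets its own first letter upper-cased.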

I am afraid what you want to do is not perfectly achievable in SPARQL alone. Your options are:

  • Use a SPARQL extension that contains a function that makes it possible, if supported by the endpoint.
  • If your query is a part of a larger pipeline, convert the results in another way, for example using XSLT.
  • Since you only care about [a-z], you can simply expand out all the letters and replace them one by one: replace(replace(lcase(?label), "(\\ba(?!\\s))", "A" ), "(\\bb(?!\\s))", "B" ) and so on. Not a very elegant or performant solution, but it gets the job done.
  • A shorter option is to use a pattern like ^(.*?)(\b[a-z](?!\s))(.*)$ to split the string into 3 parts, which you can extract with replacements to $1, $2 and $3, respectively. Concatenate the first part with the uppercase of the second part, and repeat the whole process for the last part. You will again have to repeat the patterns, but this time it is the same pattern so there is a potential for optimization. A downside is that you have to end this "recursion" somewhere, so you can only replace a fixed number of words.
IS4
  • Thanks a lot for the reply and for clearing up my question about why that behaviour was showing. Indeed, I was working under the impression that the replacement with REGEX was somehow handled with the same execution logic as REGEX outside SPARQL. I did try to use the GraphDB implementation of SPARQL, which even has a spif:titleCase() function, but there too I get unexpected behaviour. I posted a [question](https://stackoverflow.com/questions/70112801/case-conversion-graphdb-spiftitlecase-unexpected-behaviour) hoping that the Ontotext folks would explain. – Menelao Dec 08 '21 at 15:18
  • At the beginning I thought I understood your reasoning, but on a second read I still can't get why if $1 is holding the first letter, the ucase() is not capitalising it, even if it is the first executed func. You say ucase() does capitalise $1, but the result is not used in the replace()? I am puzzled because the ucase('aaa') does work as expected, makes me understand that replace does use that for replacement. Thanks again for your time. – Menelao Dec 08 '21 at 15:39
  • @Menelao `"$1"` is only a string of two characters. `ucase` converts them both to uppercase, and the result, happening to still be `"$1"`, gets passed to `replace`, which has no idea what it went through before that. The question you linked suffers from the same confusion. There is nothing syntactically special about `"$1"` from SPARQL's viewpoint; it's just a string governed by the same rules as other strings. – IS4 Dec 08 '21 at 15:59
  • I really don't mean to waste your time, but when you visit the link to the SPARQL endpoint and run the query, if as you say $1 is simply a string, shouldn't I then receive a label with $1 in place of the first letter (e.g. `$1nimal` `$1usbandry` `$1ethods`)? Why do I instead see the first letter correctly placed? Why, if I put ucase("$1."), do I see the first letter separated by a dot from the rest of the label string? In other words, why does ucase() have no effect? – Menelao Dec 08 '21 at 22:25
  • @Menelao Well, because the string has a format that is understood by `replace`. The `$` character is perfectly normal in a string as far as SPARQL is concerned, but `replace` treats it differently. – IS4 Dec 08 '21 at 22:34
  • Ok I have rebuilt in my mind how it is working, thanks a lot again! – Menelao Dec 09 '21 at 11:03