Clearing the internal string cache in rdf4j with sparql

Question

To avoid a possible "XY problem", let me explain my real goal: I am trying to change the capitalization of language tags in an rdf4j repo, using sparql. But although rdf4j stores language tags as written when they were defined, it knows enough to treat them as case-insensitive as the standard dictates. So it treats my attempted edit as a no-op:

Set-up:

INSERT DATA { test:a skos:prefLabel "hello"@EN }

Attempt:

DELETE { test:a skos:prefLabel "hello"@EN }
INSERT { test:a skos:prefLabel "hello"@en }
WHERE  { test:a skos:prefLabel "hello"@EN }

Result:
This query does nothing. The language tag is still spelled EN. Interestingly, this also fails if I execute two separate queries:

Query 1:

DELETE DATA { test:a skos:prefLabel "hello"@EN }

Query 2:

INSERT DATA { test:a skos:prefLabel "hello"@en }

Evidently, deleted strings remain in an internal cache, so my INSERT query resurrects "hello"@EN instead of storing the new spelling. A restart will clear the cache, but it's not the best UX...

Now, with some older versions of rdf4j I could clear this internal cache with the magic command CLEAR SILENT GRAPH <urn:uri:cache>. But this does not appear to work with rdf4j 2.3.3, which is what we are stuck with at the moment. Is there still a way to clear the string cache without a restart, or to change the capitalization of language tags in any other way?

PS I found this interesting thread about the handling of case in language tags; but it has brought me no closer to a solution.

Which version of RDF4J did you use this `CLEAR SILENT GRAPH` trick in? I can't remember that ever having been a feature... — Jeen Broekstra, Dec 30 '20 at 04:47
The version this works with is supposed to be 2.3.3 -- the same version that fails to work now. My best guess is, it's actually an earlier version. I'll need to investigate what the differences are. — alexis, Jan 05 '21 at 09:13

score 1 · Answer 1 · answered Dec 30 '20 at 04:50

At first glance this looks like a bug to me, an unintended consequence of a fix we did donkey's years ago for allowing preservation of case in language tags (https://openrdf.atlassian.net/browse/SES-1659).

I'm not sure there are any SPARQL-only workarounds for this, so please feel free to log a bug report/feature request at https://github.com/eclipse/rdf4j/issues.

Having said that, RDF4J does have functionality for normalizing language tags of course. In particular, the RDF parsers can be configured to normalize language tags (see the Rio configuration documentation), and in addition there's a utility method Literals.normalizeLanguageTag which you can use to convert any language tag to a standard canonical form.
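
To illustrate what "canonical form" means for a tag's capitalization, here is a minimal sketch that uses only the JDK's java.util.Locale, so it runs without RDF4J on the classpath. The class name LanguageTagSketch and the lowercase fallback for ill-formed tags are assumptions made for this example; it is not RDF4J's implementation of Literals.normalizeLanguageTag, and it only helps before data is stored, not with tags already in the repository.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of RDF4J.
class LanguageTagSketch {

    // Returns the tag in BCP 47 canonical case (language lowercase,
    // script title case, region uppercase).
    // Assumption of this sketch: ill-formed tags are simply lowercased.
    static String normalize(String tag) {
        try {
            return new Locale.Builder().setLanguageTag(tag).build().toLanguageTag();
        } catch (IllformedLocaleException e) {
            return tag.toLowerCase(Locale.ROOT);
        }
    }

    public static void main(String[] args) {
        System.out.println(normalize("EN"));         // prints "en"
        System.out.println(normalize("en-us"));      // prints "en-US"
        System.out.println(normalize("ZH-hant-tw")); // prints "zh-Hant-TW"
    }
}
```

Note that the JDK may also rewrite a few deprecated or grandfathered tags to their modern equivalents, so the result is not always a pure case change.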

Jeen Broekstra


Thank you! I will. Normalizing language tags before storing would definitely be the way to go, but unfortunately it is no help with earlier or current software versions that use rdf4j storage. So I'm still looking for a solution. – alexis Dec 30 '20 at 17:13
The thing is, clearing the cache did work for a long time after case preservation was introduced. Something else must have been the breaking change... But can you determine whether the cache cannot be cleared from sparql anymore, or is something else getting in the way of my approach? – alexis Dec 30 '20 at 17:30
The thing is I don't remember clearing the cache being a feature. Can you check which version of rdf4j this works for you? Hopefully that way I can reconstruct what happened to it – Jeen Broekstra Dec 30 '20 at 21:06
Thanks! I'll file a ticket. The command works in an earlier version of our product, which supposedly embeds rdf4j 2.3.3 -- the same version that fails to work now. My best guess is that it is actually a few minors earlier. I'll investigate. – alexis Jan 01 '21 at 13:04
Can you clarify if rdf4j has a single configuration point (or a specific few) for storing all language tags with canonical capitalization? I have read that the old all-lowercase behavior is available as an option, but the config docs you pointed me to seem to imply a separate parser for each format and import route. Are we at risk of missing a few the first time? – alexis Jan 01 '21 at 13:38
@alexis we're veering a bit off-topic here I think, but in short: you can configure each parser's behavior via Java system properties (see https://rdf4j.org/documentation/programming/rio/#configuration-via-command-line-switches) - this would automatically be used as the default for every parser used in your application. – Jeen Broekstra Jan 01 '21 at 23:29
I'm not 100% sure that will work in RDF4J 2.3.3 though (which is quite old, we're at 3.5.0 now). You might want to consider bumping to a more recent version. We'd be happy to advise you on any migration issues you run into. – Jeen Broekstra Jan 01 '21 at 23:32
I'll pass it on, thanks. I did notice the current RDF4J version and was appropriately surprised (I'm not in the developer team.) – alexis Jan 05 '21 at 09:08
And thanks, I know that was off-topic ;-) – alexis Jan 05 '21 at 09:10
