I'm trying to test if a string is contained in another in a case-insensitive way. The SPARQL expression

REGEX ( ?str, ?tok, "iq" )

should do that. The q flag is needed in addition to the i flag because ?tok may contain regex special characters such as [ or \. REGEX should behave just like the XQuery fn:matches function. However, including the q flag simply makes the expression always return false. [update: it does not return false; it does not return a value at all]
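
To make the expected semantics concrete, here is a minimal Python sketch of what `REGEX(?str, ?tok, "iq")` is supposed to compute. It is only an emulation under the assumption that `q` means "treat every pattern character literally" and `i` means "ignore case"; the helper name `regex_iq` is hypothetical and has nothing to do with any triple store's implementation.

```python
import re


def regex_iq(text: str, token: str) -> bool:
    """Emulate REGEX(?str, ?tok, "iq"): literal, case-insensitive substring test."""
    # re.escape neutralises metacharacters such as [ or \ (the role of the q flag);
    # re.IGNORECASE plays the role of the i flag.
    return re.search(re.escape(token), text, flags=re.IGNORECASE) is not None


print(regex_iq("Price [USD]", "[usd]"))  # True: brackets are matched literally
print(regex_iq("abcd", ".*"))            # False: ".*" is not a wildcard under q
print(regex_iq("a.*b", ".*"))            # True: the literal text ".*" is present
```

A compliant engine should therefore report no match for the pattern `.*` against `abcd` once `q` is set, which makes that pair a convenient probe.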

Is this an issue in the GraphDB implementation I'm using or have I misinterpreted the standard? Has anyone observed this oddity in other SPARQL implementations?

I can work around this specific case by replacing it with

CONTAINS ( LCASE(?str), LCASE(?tok) )

but other scenarios which may need flags x, s and m do not seem to have an easy replacement. [update: the other flags all work correctly, only q is broken]
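
For orientation, the sketch below shows in Python what the `s`, `m` and `x` flags change and why a plain string function cannot stand in for them. The flag mapping is an approximation I am assuming for illustration: Python's `re` dialect is not the XPath one (for example, `re.VERBOSE` also treats `#` as a comment marker), and `regex_with_flags` is a hypothetical helper.

```python
import re

# Approximate Python counterparts of the XPath/SPARQL flags (an assumption
# made for illustration; the two regex dialects are not identical).
FLAG_MAP = {
    "i": re.IGNORECASE,  # case-insensitive matching
    "s": re.DOTALL,      # "." also matches a newline
    "m": re.MULTILINE,   # "^" and "$" match at line boundaries
    "x": re.VERBOSE,     # whitespace in the pattern is ignored
}


def regex_with_flags(text: str, pattern: str, flags: str = "") -> bool:
    """Search `text` for `pattern` using SPARQL-style flag letters."""
    py_flags = 0
    for letter in flags:
        py_flags |= FLAG_MAP[letter]
    return re.search(pattern, text, py_flags) is not None


print(regex_with_flags("a\nb", "a.b"))        # False
print(regex_with_flags("a\nb", "a.b", "s"))   # True
print(regex_with_flags("a\nb", "^b$"))        # False
print(regex_with_flags("a\nb", "^b$", "m"))   # True
print(regex_with_flags("ab", "a b", "x"))     # True
```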

[note: the workaround with lcase() is suboptimal; regex should be more efficient.]

How do other triple stores behave in this respect?

TallTed
Jaccoud
  • Not sure, I tried with Virtuoso and `q` flag seems to be ignored as well : `select * where { VALUES ?str {"abcd"} VALUES ?pattern {".*"} VALUES ?flags {"i" "iq"} BIND(REGEX ( ?str, ?pattern, ?flags ) as ?matches) }` – UninformedUser May 15 '20 at 11:28
  • In GraphDB, it is not ignored. Flags i, s, m and x are behaving as expected, but flag q — alone or in conjunction with others — makes any pattern utterly unmatchable. In fact, I tried inspecting the regex return value as you did, and it doesn't even return false, it simply fails to execute, returning nothing at all. I suspect it raises an exception, but the main log reveals nothing. – Jaccoud May 15 '20 at 14:14
  • Hm. I managed to try it in Jena TDB/Fuseki and it worked as expected. Looks like it really is a GraphDB issue. I'd appreciate feedback from users of other triple stores. Thx – Jaccoud May 15 '20 at 14:41
  • I had a quick look and there turns out to be a bug in the RDF4J SPARQL engine for regex evaluation, which causes the 'q' flag to not be recognized. See https://github.com/eclipse/rdf4j/issues/2224 . I'm suspecting that GraphDB relies on RDF4J's implementation for regex handling, and therefore has the same problem. – Jeen Broekstra May 15 '20 at 23:54
  • Fix is scheduled for the next RDF4J patch release. – Jeen Broekstra May 16 '20 at 00:24
  • @JeenBroekstra fast support as usual, very impressive. And to be honest, I wasn't even aware of such a flag `q` until now. Never dug into the REGEX specs of XPath, always happy to learn. Cheers – UninformedUser May 16 '20 at 03:29
  • Thanks, guys, for clarifying, and Jeen for the prompt fix. – Jaccoud May 18 '20 at 15:31
  • I didn't know REGEX should support `qismx`, thanks! But why do you think that "regex should be more efficient" than `lcase`? – Vladimir Alexiev Sep 18 '20 at 09:30
  • @UninformedUser — Would you mind [logging the issue to the Virtuoso project on github](https://github.com/openlink/virtuoso-opensource/issues)? If you still have your test rig for this (which I presume is more than the query in the comment above), that would be quite helpful to include! – TallTed Oct 08 '20 at 14:20
  • Alexiev, the solution with contains() must first convert both strings with lcase and then compare the resulting strings, while a well-implemented regex() can convert the characters just as needed (short-circuiting results) or even apply some smarter optimization. To compare Unicode strings you also need to normalise on the fly, and this is not a simple or cheap task. Surely not every regex implementation is well optimized, but they should be, since it affects performance when dealing with long strings. Most C/C++ implementations would use IBM's ICU to do this; Java has it built-in. – Jaccoud Oct 21 '20 at 17:07
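
The point about normalisation in the last comment can be illustrated with a short Python sketch. It is not how any SPARQL engine implements `LCASE` or `REGEX`; it only shows, under the assumption that Unicode canonical caseless matching is the goal, why lower-casing both operands is not always enough. The helper names `fold` and `caseless_contains` are hypothetical.

```python
import unicodedata


def fold(s: str) -> str:
    """Canonical caseless form: NFD, then case folding, then NFD again."""
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())


def caseless_contains(text: str, token: str) -> bool:
    """Substring test on the folded forms of both operands."""
    return fold(token) in fold(text)


# Plain lower-casing misses both of these cases:
print("strasse" in "Straße".lower())           # False
print("cafe\u0301" in "CAF\u00c9".lower())     # False

# Folding plus normalisation finds them:
print(caseless_contains("Straße", "STRASSE"))         # True
print(caseless_contains("CAF\u00c9", "cafe\u0301"))   # True
```

Both operands are fully converted before the comparison here, which is exactly the up-front cost the comment describes; an engine that folds characters lazily while scanning could stop at the first mismatch instead.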

1 Answer

This is a compliance problem in RDF4J, which is fixed in the upcoming version 3.2.2. GraphDB, which uses RDF4J, should incorporate the correction in due course. The Jena/Fuseki implementation is already compliant in this respect. The Virtuoso implementation seems to be broken as well (the flag is ignored). There were no conformance reports for other implementations.

Jaccoud
  • Jaccoud — Would you mind [logging the issue to the Virtuoso project on github](https://github.com/openlink/virtuoso-opensource/issues)? If you still have your test rig for this (which I presume is more than the query in @UninformedUser's comment above), that would be quite helpful to include! – TallTed Oct 08 '20 at 14:22