Is there a way to use an index while performing REGEX_TEST() on a string field to retrieve documents in ArangoDB?
Also, if there is any way to optimize this, please let me know.
There is no index acceleration available for the REGEX_TEST() AQL function, and it is unlikely to come in the future. This is not for lack of interest from users and developers, but because it is not really possible to build an index data structure that would speed up arbitrary regular expression evaluation.
Regular expressions as supported by ArangoDB allow for many different types of expressions, but because they can differ so much, there is almost no chance to have a suitable index. For equality comparisons there are hash indexes, which are probably the fastest kind of index. For range queries there are skiplist indexes, and there are of course quite a few more index types known in computer science, but I'm not aware of a single one that could speed up arbitrary regex.
If your expression allows it, you may be able to add a filter criterion before REGEX_TEST() that can utilize an index. This will mostly be limited to case-sensitive prefix matching, e.g. FILTER REGEX_TEST(doc.str, "^a[a-z]*") could be extended to FILTER doc.str >= "a" AND doc.str < "b" AND REGEX_TEST(doc.str, "^a[a-z]*"), which allows a skiplist index to be used so that the regex is only evaluated on documents where str starts with a. Note the ^ anchor: REGEX_TEST() matches anywhere in the string unless the pattern is anchored. Similarly, a simple anchored regex like ^([fm]oo|bar)$ could be rewritten to a set of equality comparisons: FILTER doc.str IN ["foo","moo","bar"]. Also have a look at ArangoSearch.
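To illustrate why the prefix range pre-filter is safe, here is a minimal Python sketch. It does not talk to ArangoDB; it only simulates the idea on an in-memory list. The helper names prefix_range and range_filter are hypothetical, and the sketch assumes plain code-point string ordering, whereas ArangoDB compares strings according to its configured collation, so the exact bounds may need adjusting there.

```python
import json
import re
from typing import Tuple


def prefix_range(prefix: str) -> Tuple[str, str]:
    """Return (low, high) so that every string starting with `prefix`
    satisfies low <= s < high under code-point ordering (an assumption;
    a database collation may order strings differently)."""
    if not prefix:
        raise ValueError("prefix must be non-empty")
    last = ord(prefix[-1])
    if last >= 0x10FFFF:
        raise ValueError("cannot increment the last character")
    # Upper bound: same prefix with its last character bumped by one.
    return prefix, prefix[:-1] + chr(last + 1)


def range_filter(field: str, prefix: str) -> str:
    """Build the index-friendly part of the FILTER as a string
    (hypothetical helper, for illustration only)."""
    low, high = prefix_range(prefix)
    return f"{field} >= {json.dumps(low)} AND {field} < {json.dumps(high)}"


# Sample values standing in for doc.str in a collection.
docs = ["apple", "avocado", "banana", "a1", "Apple", "a"]

# Anchored pattern; re.search mirrors "match anywhere unless anchored".
pattern = re.compile(r"^a[a-z]*")
low, high = prefix_range("a")

# Regex evaluated on every value (what happens without a pre-filter).
regex_only = [s for s in docs if pattern.search(s)]

# Range check first (the part an index could serve), then the regex.
prefiltered = [s for s in docs if low <= s < high and pattern.search(s)]
```

On this sample both lists contain the same values, since the range condition only discards strings that cannot start with the prefix. The regex still has to run on the remaining candidates, so the benefit depends on how selective the prefix is.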