How to search for a case insensitive substring in an array of objects

Question

In ArangoDB I am playing around with a test collection that is the IMDB dataset downloaded from their site as csv. The movies document is structured as follows:

movies:
{
    _key: 123456,
   name: "Movie title",
   ... ,
   releases: [
      { title: "Local title",
        region: 'US',
        language: 'en',
        ... },
      { title: "Other title",
        region: 'GB',
        language: '??'
        ... }
   ]
}

I have created an index on the movies.releases[*].title field. I am interested in querying that field, not only by equality, but also by using case insensitive and substring matching.

The problem is that the only kind of query that uses the index is when I do something like that:

FOR doc IN movies:
    FILTER 'search' IN doc.releases[*].title

With this I can only match the whole string in a case sensitive way: how can I look for a substring in a case insensitive way?

I cannot use a full-text index, since ArangoDB does not support it in arrays, and I cannot use LOWER() and CONTAINS() since it is an array.

Any ideas?

Thanks!

score 0 · Answer 1 · answered Mar 24 '20 at 08:25

It's possible to nest your search, giving you the power to search within the array without having the constraints applied by using the '[*]' notation.

Here is an example that does a search inside each releases array, looking for a case insensitive match, and then returning if it gets any hits.

The FILTER function there will only return the movie if at least one of the releases has a match.

FOR doc IN movies
    LET matches = (
        FOR release IN doc.releases
            FILTER CONTAINS(LOWER(release.title), LOWER('title'))
            RETURN release
    )
    FILTER LENGTH(matches) > 0
RETURN doc

It's straight forward there to change 'title' to a parameter.

Note: To put less pressure on the query, the goal of the matches variable is to have a LENGTH property greater than 0 if there is a release with your key word in it.

The function above has the line RETURN release which returns possibly a large amount of data when you won't be reading it, so an alternative there is to replace that line with RETURN true as that is all that is needed to force matches to become an array and have a LENGTH greater than 0.

Thanks, that's a nice way to work with array elements. The only thing is that the index is not used, so the query will have to scan the whole collection. Since I created an index on that collection.releases[*].title field, is there any way to load an index and directly query against it? That way you would not need to access the collection, just query directly the index. Maybe a feature for the future? — Milko, Mar 24 '20 at 13:06

How to search for a case insensitive substring in an array of objects

1 Answers1