
I'm having trouble with ArangoSearch.

Here is some dummy data that I have in a collection called things (for simplicity I have removed each of their "_id", "_key" and "_rev" properties):

{"text":"eat a cookie"}
 
{"text":"I like cookies"}
 
{"text":"Timmy how are u"}
 
{"text":"I read a book on elves"}

And I have a view that looks like this (I am calling it practice):

{
  "writebufferIdle": 64,
  "type": "arangosearch",
  "primarySortCompression": "lz4",
  "links": {
    "things": {
      "analyzers": [
        "text_en",
        "identity"
      ],
      "fields": {
        "text": {
          "analyzers": [
            "text_en"
          ]
        }
      },
      "includeAllFields": true,
      "storeValues": "none",
      "trackListPositions": false
    }
  },
  "primarySort": [],
  "writebufferSizeMax": 33554432,
  "consolidationPolicy": {
    "type": "tier",
    "segmentsBytesFloor": 2097152,
    "segmentsBytesMax": 5368709120,
    "segmentsMax": 10,
    "segmentsMin": 1,
    "minScore": 0
  },
  "cleanupIntervalStep": 2,
  "commitIntervalMsec": 1000,
  "storedValues": [],
  "id": "138993",
  "globallyUniqueId": "h23A40B2F96C2/138993",
  "writebufferActive": 0,
  "consolidationIntervalMsec": 1000
}

When I run an AQL query like the following, it correctly returns 4:

FOR docs IN practice COLLECT WITH COUNT INTO num RETURN num

But when I run a search like this, I mostly get empty arrays:

FOR doc IN practice
SEARCH ANALYZER(doc.text == "cookie", "text_en")
RETURN doc

(Weirdly, a word or two does work with the query above, but most don't. For example, "cookie" returns an empty array but "how" returns one match.)

Any idea what I am doing wrong?

Thanks

green

1 Answer


The indexed text field is processed by the text_en analyzer, but you aren't applying that analyzer to the search term.

ANALYZER(doc.text == "cookie", "text_en")

Here, the ANALYZER() function only selects which analyzer's index data is searched. It does not transform the search string "cookie" itself.

Depending on how the analyzer transforms the stored attribute values, there can be a mismatch because of stemming. All of the built-in text analyzers have stemming enabled.

Try RETURN TOKENS("cookie", "text_en") to see what the analyzer does to the word.
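
As a quick check, a query along these lines shows the tokens the analyzer produces for a few sample words. The choice of words is only an illustration taken from the question. If "cookie" and "cookies" come back as the same stem, one that differs from the literal string "cookie", while "how" comes back unchanged, that would explain why only some search terms match.

```aql
// Compare the raw words with the tokens that text_en produces for them.
RETURN {
  cookie:  TOKENS("cookie", "text_en"),
  cookies: TOKENS("cookies", "text_en"),
  how:     TOKENS("how", "text_en")
}
```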

This should match two documents:

ANALYZER(doc.text == TOKENS("cookie", "text_en")[0], "text_en")
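
For reference, here is that expression placed in a complete query against the practice view from the question. This is a sketch rather than the only way to write it. The second query is an assumed alternative that uses IN, so that any token produced by TOKENS() may match instead of only the first one.

```aql
// Match documents whose indexed text contains the stemmed form of "cookie".
FOR doc IN practice
  SEARCH ANALYZER(doc.text == TOKENS("cookie", "text_en")[0], "text_en")
  RETURN doc.text
```

```aql
// Alternative: match if any of the tokens from the search string is present.
FOR doc IN practice
  SEARCH ANALYZER(doc.text IN TOKENS("cookie", "text_en"), "text_en")
  RETURN doc.text
```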

CodeManX
  • What would I do if I wanted to match multiple terms? `TOKENS("cookie", "text_en")[0]` would match one word ('cookie' in this case), but what if I wanted to search a text that had two or more specific words (such as "cookie" and "I")? – green Nov 11 '21 at 15:44
  • `ANALYZER(TOKENS("cookie monster", "text_en") ALL IN doc.text, "text_en")` (or `ALL ==` which is the same as `ALL IN`). Note that the call to TOKENS() comes first. Also see https://www.arangodb.com/docs/stable/arangosearch-fulltext-token-search.html – CodeManX Nov 19 '21 at 23:01
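
Applied to the example from the comments, a complete query might look like the sketch below. It assumes the practice view from the question and an ArangoDB version that supports array comparison operators in SEARCH. The search string "cookie I" is only an illustration; every token that TOKENS() produces from it has to be present in doc.text for a document to match.

```aql
// Require all tokens of the search string to occur in the indexed text.
FOR doc IN practice
  SEARCH ANALYZER(TOKENS("cookie I", "text_en") ALL IN doc.text, "text_en")
  RETURN doc.text
```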