I am querying Vespa to check whether a particular userId is present in an array of userIds:

http://localhost:8080/search/?yql=select * from sources doc where userIds contains 'user1';

Search Definition:

search doc {
    document doc {
        field userIds type array<string> {
            indexing : index | summary
        }
        field doctype type string {
            indexing : summary
        }
    }
}

Sample Response:

{
"children": [{
        "id": "id:doc:doc::0",
        "fields": {
            "userIds": ["user1", "user2", "user3"],
            "doctype": "type1"
        }
    },
    {
        "id": "id:doc:doc::1",
        "fields": {
            "userIds": ["user1", "user3"],
            "doctype": "type2"
        }
    }
]}

When I remove an element ("user1") from the array, the document is still returned as a hit, even though the element has been successfully removed from the array.

Update API:

PUT http://localhost:8080/document/v1/doc/doc/docid/0
{
"update": "id:doc:doc::0",
"fields": {
    "userIds[0]": {
        "remove": 0
    }
}
}

GET http://localhost:8080/document/v1/doc/doc/docid/0
{"fields": {
        "userIds": ["user2", "user3"],
        "doctype": "type1"
    }
}

Even after the above userIds field is updated, the same query

http://localhost:8080/search/?yql=select * from sources doc where userIds contains 'user1';

gives the response,

{"children": [{
    "id": "id:doc:doc::0",
    "fields": {
        "userIds": ["user2", "user3"],
        "doctype": "type1"
    }
},
{
    "id": "id:doc:doc::1",
    "fields": {
        "userIds": ["user1", "user3"],
        "doctype": "type2"
    }
}]}

In the above response, there is no "user1" in the userIds array of "id:doc:doc::0", but the query still returns it as a hit. Please help.

Edit-1: Note that, when I assign a new array with the element removed, it works correctly

PUT http://localhost:8080/document/v1/doc/doc/docid/0
{
"update": "id:doc:doc::0",
"fields": {
    "userIds": {
        "assign": ["user2", "user3"]
    }
}
}

The above update gives the expected hits for the query. However, because I call the Update API from within a Searcher, the response time lags badly: a new array object has to be created and assigned to the userIds field, and the array can grow to about 50000 elements.

Please tell me why the remove option is failing. I really need it to improve query performance.

Edit-2: The following syntax, which names the element to be removed, updates the array correctly. Thanks to @Jo's comment.

PUT http://localhost:8080/document/v1/doc/doc/docid/0
{
"update": "id:doc:doc::0",
"fields": {
    "userIds": {
        "remove": ["user1"]
    }
}
}

Note that the above syntax removes all the occurrences of the element specified.
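
To make the note above concrete, here is a small local model in Python of what remove-by-value does to an array. It only illustrates the behaviour described here and is not Vespa code; `remove_by_value` is a hypothetical helper name.

```python
def remove_by_value(values, to_remove):
    """Return a copy of values without any element listed in to_remove.

    Local illustration only: every occurrence of a listed value is dropped,
    mirroring the behaviour described for the "remove" update above.
    """
    blocked = set(to_remove)
    return [value for value in values if value not in blocked]


# A duplicated "user1" disappears entirely, not just its first occurrence.
user_ids = ["user1", "user2", "user1", "user3"]
print(remove_by_value(user_ids, ["user1"]))
```

If duplicates in userIds are meaningful, this is the difference to keep in mind compared with removing a single position.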

  • There might be a bug here - most applications using structured fields use attribute, not index. I'll follow up and update here next week. – Jon Jan 25 '19 at 15:28
  • @Jon But indexing the array as an attribute won't allow me to run the 'contains' query, which currently suits my use case and works well. Please reply once you know more about this probable bug with the **remove** option. Thanks. – Vikrant Thakur Jan 28 '19 at 07:29
  • Thanks for the detailed description. I'm able to reproduce and created https://github.com/vespa-engine/vespa/issues/8258 to track it. As you can see, there exists a workaround using a different syntax, which might work for you? – Jo Kristian Bergum Jan 28 '19 at 11:59
  • @JoKristianBergum Yes, the second syntax you mentioned is working correctly. Thank you so much for your help. – Vikrant Thakur Jan 28 '19 at 12:41
  • Note that a string field with index has match:text by default, which means tokenization, so you might get matches across different elements if your input text is tokenizable. If you really have user IDs, I would consider adding match:exact or match:word. – Jo Kristian Bergum Jan 28 '19 at 14:20

1 Answer

(Summary of the discussion above to provide an answer for the record)

Removing array elements by index is not supported; use remove by value instead:

{
    "update": "id:doc:doc::0",
    "fields": {
        "userIds": {
            "remove": ["user1"]
        }
    }
}
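
For completeness, a minimal Python sketch of sending that update from a client, assuming the endpoint and document id from the question. `build_remove_request` is a hypothetical helper, not part of any Vespa client library; it only prepares the PUT request, and the actual send is left commented out because it needs a running instance.

```python
import json
import urllib.request


def build_remove_request(base_url, namespace, doc_type, doc_key, field, values):
    """Prepare a PUT request carrying a remove-by-value update for one field."""
    payload = {
        "update": f"id:{namespace}:{doc_type}::{doc_key}",
        "fields": {field: {"remove": list(values)}},
    }
    url = f"{base_url}/document/v1/{namespace}/{doc_type}/docid/{doc_key}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Values taken from the question; adjust them for your own deployment.
request = build_remove_request(
    "http://localhost:8080", "doc", "doc", "0", "userIds", ["user1"]
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To send it against a running instance:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Only the removed values travel over the wire, rather than the full array as with "assign".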
Jon