0

I encountered a weird search behaviour while exploring opensearch's search engine. All the records within my index are as follows:

[{
        "_index": "table1",
        "_id": "AO4AnIYBC-oD5gl3Hm7W",
        "_score": 1,
        "_source": {
            "period": "JAN-23",
            "requestID": "10273376",
            "header": {
                "period": "JAN-23"
            },
            "status": "Complete"
        }
    },
    {
        "_index": "table1",
        "_id": "gavmkoYB7MbgbX172uOM",
        "_score": 1,
        "_source": {
            "period": "JAN-23",
            "requestID": "100138128",
            "header": {
                "period": "JAN-23"
            },
            "status": "Complete"
        }
    },
    {
        "_index": "table1",
        "_id": "g6vnkoYB7MbgbX17POOY",
        "_score": 1,
        "_source": {
            "period": "FEB-23",
            "requestID": "10246457",
            "header": {
                "period": "FEB-23"
            },
            "status": "Complete"
        }
    },
    {
        "_index": "table1",
        "_id": "hKvnkoYB7MbgbX17XeOw",
        "_score": 1,
        "_source": {
            "period": "JAN-23",
            "requestID": "10273941",
            "header": {
                "period": "JAN-23"
            },
            "status": "Complete"
        }
    },
    {
        "_index": "table1",
        "_id": "_-7nkoYBC-oD5gl3TW1Z",
        "_score": 1,
        "_source": {
            "period": "FEB-23",
            "requestID": "10254951",
            "header": {
                "period": "FEB-23"
            },
            "status": "Complete"
        }
    },
    {
        "_index": "table1",
        "_id": "gqvnkoYB7MbgbX17JONH",
        "_score": 1,
        "_source": {
            "period": "JAN-23",
            "requestID": "10273376",
            "header": {
                "period": "JAN-23"
            },
            "status": "Complete"
        }
    }
]


Here are some of the results that I am getting When querying this data

Query 1: Returns Correct Info

GET /table1/_search
{
  "query": {
    "match": {
      "status": "Complete"
    }
  }
}

Query 2: Returns all records, which is wrong ideally it should only return 4 records

GET /table1/_search
{
  "query": {
    "match": {
      "period": "JAN-23"
    }
  }
}

Query 3: Returns 4 records with period : JAN-23 which is again wrong as it should now return 0 records

GET /table1/_search
{
  "query": {
    "match": {
      "period": "JAN-22"
    }
  }
}

Query 4: Returns all records, which is again wrong, as it should return 0

GET /table1/_search
{
  "query": {
    "match": {
      "period": "DEC-23"
    }
  }
}

It would be really helpful if anyone can help me understand why is it so?

Thanks

Diksha Goyal
  • 260
  • 6
  • 19
  • Can you add the query results? I suspect you are getting the "wrong" results back, albeit with a low score. did you try using a boolean query instead of the "match"? – A.Stern Mar 07 '23 at 14:04
  • Yes, I tried the boolean query but got no luck. In order to avoid duplicity I didn't provide query results instead added a description to each – Diksha Goyal Mar 07 '23 at 14:56

1 Answers1

1

The field period in the index table1 is a text due to which the value of this filed will be analysed and converted into multiple tokens, like JAN-23 will be converted to jan and 23, so while querying "period": "JAN-23", all the documents whose period field contains jan or 23 are returned. To search a text exactly, we can use term query to search.

Mapping

"period": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }

Term Query which returned accurate results

GET /table1/_search
{
   "query": {
    "term": {
      "period.keyword": {
        "value": "DEC-22"
      }
    }
   }
}

Diksha Goyal
  • 260
  • 6
  • 19