
I think I have gone blind from staring at this error over and over, and could really use some input. I have a time-series set of documents and want to find the five documents that follow a specific id. I start by fetching that single document, then fetch the next five documents, excluding that id:

var documents = client.Search<Document>(s => s
    .Query(q => q
        .ConstantScore(cs => cs
            .Filter(f => f
                .Bool(b => b
                    .Must(must => must
                        .DateRange(dr => dr.Field(field => field.Time).GreaterThanOrEquals(startDoc.Time)))
                    .MustNot(mustNot => mustNot
                        .Term(term => term.Id, startDoc.Id))
                    ))))
    .Take(5)
    .Sort(sort => sort.Ascending(asc => asc.Time))).Documents;

My problem is that while 5 documents are returned and sorted correctly, the start document is among them. I'm trying to exclude it with the must_not filter, but it doesn't seem to be working. I'm pretty sure I have done this in other places, so it might be a small issue that I simply cannot see :)

Here's the query generated by NEST:

{
   "query":{
      "constant_score":{
         "filter":{
            "bool":{
               "must":[
                  {
                     "range":{
                        "time":{
                           "gte":"2020-08-31T10:47:12.2472849Z"
                        }
                     }
                  }
               ],
               "must_not":[
                  {
                     "term":{
                        "id":{
                           "value":"982DBC1BE9A24F0E"
                        }
                     }
                  }
               ]
            }
         }
      }
   },
   "size":5,
   "sort":[
      {
         "time":{
            "order":"asc"
         }
      }
   ]
}
ThomasArdal
  • Unless I'm missing something here, why not just use `GreaterThan` rather than `GreaterThanOrEquals` and then omitting the start doc? - https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/date-range-query-usage.html#_object_initializer_syntax_example_40 – Sai Gummaluri Aug 31 '20 at 18:26
  • That's a good point. But there could be other documents with the same date and time. – ThomasArdal Aug 31 '20 at 19:40
  • Understood. I am in the process of setting up a sample to reproduce your scenario. In the meantime, can you share the query that the `NEST` client is generating in your case? Have you already taken a look at it? – Sai Gummaluri Aug 31 '20 at 20:45
  • Also, with respect to the mapping, is the id field analyzed by any chance? – Sai Gummaluri Sep 01 '20 at 05:19
  • You nailed it! The id field is of type `text` and I do now remember having a similar problem in the past. I even created a subfield named `id.keyword` of type `keyword` (getting old) and after switching to that, everything works as expected. Thank you so much. Feel free to copy your comment as an answer and I will mark it as resolved – ThomasArdal Sep 01 '20 at 05:42
  • I'm glad it helped :) I've added the same as an answer for the reference of other searchers who might end up here with the same or a similar issue. – Sai Gummaluri Sep 01 '20 at 06:56

1 Answer


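This is probably happening because the id field is an analyzed (`text`) field. Analyzed fields are tokenized at index time, and with the default standard analyzer the tokens are also lowercased, while a term query looks up the exact value it is given without analyzing it. The indexed token for the start document therefore may not equal "982DBC1BE9A24F0E", so the must_not clause matches nothing and excludes nothing. Filtering on a non-analyzed version of the field for exact matching (as you mentioned in the comments, you already have one) will fix the difference you are seeing.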
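As a rough sketch of that fix, here is the request body from the question with only the must_not clause changed to target the `id.keyword` sub-field mentioned in the comments. The sub-field name comes from the asker's mapping; if your index names its keyword sub-field differently, substitute that name.

```json
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            { "range": { "time": { "gte": "2020-08-31T10:47:12.2472849Z" } } }
          ],
          "must_not": [
            { "term": { "id.keyword": { "value": "982DBC1BE9A24F0E" } } }
          ]
        }
      }
    }
  },
  "size": 5,
  "sort": [
    { "time": { "order": "asc" } }
  ]
}
```

On the NEST side, the same field should be reachable from the existing lambda with something like `term => term.Id.Suffix("keyword")`, assuming the `Document` type and mapping shown in the question.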

More about analyzed vs non-analyzed fields here

Sai Gummaluri