
I'm seeing some curious effects when running fuzzy queries inside bool queries in filter vs. query context. I'm on Elasticsearch 6.0.0.

I have an index whose documents have a field firstName. If I run the following, for example:

{
  "query": {
    "fuzzy": {
      "firstName": {
        "value": "yvonne",
        "fuzziness": 1
      }
    }
  }
}

I get 5596 hits. Now if I stick the fuzzy term inside a bool must clause:

{
  "query": {
    "bool": {
      "must": [
        {
          "fuzzy": {
            "firstName": {
              "value": "yvonne",
              "fuzziness": 1
            }
          }
        }
      ]
    }
  }
}

I still get 5596. And if I change the must to a filter clause:

{
  "query": {
    "bool": {
      "filter": [
        {
          "fuzzy": {
            "firstName": {
              "value": "yvonne",
              "fuzziness": 1
            }
          }
        }
      ]
    }
  }
}

Same, 5596 again. Unsurprising, right?

Let's change fuzziness to 2 instead of 1. Running the simple fuzzy term query again:

{
  "query": {
    "fuzzy": {
      "firstName": {
        "value": "yvonne",
        "fuzziness": 2
      }
    }
  }
}

Now I get 6079 hits. A larger edit distance should match more documents, so that seems reasonable. Now I'll stick that inside a bool query as a must clause again:

{
  "query": {
    "bool": {
      "must": [
        {
          "fuzzy": {
            "firstName": {
              "value": "yvonne",
              "fuzziness": 2
            }
          }
        }
      ]
    }
  }
}

Still 6079. Now change the must clause to a filter:

{
  "query": {
    "bool": {
      "filter": [
        {
          "fuzzy": {
            "firstName": {
              "value": "yvonne",
              "fuzziness": 2
            }
          }
        }
      ]
    }
  }
}

This returns 7980 hits.

As I understand it, the only difference between must and filter clauses in a bool query is whether hits are scored. That doesn't seem to hold here: with fuzziness 2, running the fuzzy query in filter context makes it less selective (7980 hits instead of 6079). What am I missing? What could be causing this?
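For anyone who wants to reproduce the comparison, here is a minimal sketch that builds the three request bodies used above for a given fuzziness. The helper names (`fuzzy_clause`, `build_bodies`, `extra_ids`) are hypothetical and only for illustration. Sending the bodies to the cluster and collecting document IDs is left to whatever client you use and is not shown.

```python
import json


def fuzzy_clause(field, value, fuzziness):
    """Return the fuzzy clause shared by every request in this question."""
    return {"fuzzy": {field: {"value": value, "fuzziness": fuzziness}}}


def build_bodies(field, value, fuzziness):
    """Build the plain, bool/must and bool/filter request bodies."""
    clause = fuzzy_clause(field, value, fuzziness)
    return {
        "plain": {"query": clause},
        "must": {"query": {"bool": {"must": [clause]}}},
        "filter": {"query": {"bool": {"filter": [clause]}}},
    }


def extra_ids(filter_ids, must_ids):
    """IDs that matched in filter context but not in query context."""
    return sorted(set(filter_ids) - set(must_ids))


if __name__ == "__main__":
    # Print the three bodies for fuzziness 2, the case where the counts diverge.
    for name, body in build_bodies("firstName", "yvonne", 2).items():
        print(name)
        print(json.dumps(body, indent=2))
```

Passing the IDs returned by the filter and must variants to `extra_ids` would show which documents appear only in filter context, and looking at their firstName values might hint at what is different.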
