I've spent a whole week on this with no hope of solving it. I am following this (quite old) article on e-commerce search and faceted filtering, and it's working well so far: the search results are great, and the aggregations work great when the filters are applied IN the query. I am using Elasticsearch 6.1.1.

But because I want to allow my users to perform multiple selections on the facets, I've moved the filters into the post_filter section. This still works well: it filters the results correctly and shows accurate aggregation counts for the whole document set.

After reading this question on StackOverflow, I've realised that I have to perform some crazy acrobatics with 'filtered' aggregations alongside 'special' aggregations to reciprocally prune the aggregations in order to show correct counts AND allow multiple filters with them at the same time. I've asked for some clarification on that question but no response yet (it is an old question).

The problem I've been struggling with for so long is to get a set of filtered aggregations on nested fields where ALL the facets are filtered with all the filters.

My plan is to use the general aggregations (unfiltered) and keep the selected facet aggregations unfiltered (so that I can select multiple entries) but filter all the OTHER aggregations with the currently selected facets, so that I can only display the filters I can still apply.
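The plan above could be sketched roughly as follows. This is only a sketch in Python that builds the request body as plain dicts; the function names and the `agg_selected_<i>` aggregation names are made up for illustration, and it assumes the `string_facets` mapping used in this question.

```python
from typing import Dict, List, Optional

FACET_PATH = "string_facets"  # nested path from the question's mapping


def nested_facet_filter(name: str, values: List[str]) -> dict:
    """Nested clause: one nested object has this facet name and any of the values."""
    return {
        "nested": {
            "path": FACET_PATH,
            "query": {
                "bool": {
                    "filter": [
                        {"term": {f"{FACET_PATH}.facet_name": name}},
                        {"terms": {f"{FACET_PATH}.facet_value": values}},
                    ]
                }
            },
        }
    }


def facet_terms_agg() -> dict:
    """The unfiltered nested facet_name -> facet_value terms aggregation."""
    return {
        "nested": {"path": FACET_PATH},
        "aggregations": {
            "facet_name": {
                "terms": {"field": f"{FACET_PATH}.facet_name"},
                "aggregations": {
                    "facet_value": {"terms": {"field": f"{FACET_PATH}.facet_value"}}
                },
            }
        },
    }


def build_facet_aggs(selected: Dict[str, List[str]]) -> dict:
    """One aggregation filtered by every selection (for the unselected facets),
    plus one per selected facet that skips that facet's own filter."""

    def filtered(excluding: Optional[str] = None) -> dict:
        clauses = [
            nested_facet_filter(name, values)
            for name, values in selected.items()
            if name != excluding
        ]
        # With no remaining clauses, fall back to an explicit match_all.
        flt = {"bool": {"must": clauses}} if clauses else {"match_all": {}}
        return {"filter": flt, "aggs": {"facets": facet_terms_agg()}}

    aggs = {"agg_all_facets_filtered": filtered()}
    for i, name in enumerate(selected):
        aggs[f"agg_selected_{i}"] = filtered(excluding=name)  # hypothetical name
    return aggs


aggs = build_facet_aggs({"Cover colour": ["Green"], "Binding": ["Leather"]})
```

Each `agg_selected_<i>` response would still list every facet name; the caller would read only the bucket for the facet that was left out of that filter, and take all other facets from `agg_all_facets_filtered`.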

However, if I take THE SAME filter that I use on the documents (which works fine) and put it in the filtered aggregations, they do not work as desired: the counts are all wrong. I am aware that aggregations are computed before post_filter is applied, which is why I am replicating the filter in the aggregations that need it.

Here is my query:

  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "fields": [
              "search_data.full_text_boosted^7",
              "search_data.full_text^2"
            ],
            "type": "cross_fields",
            "analyzer": "full_text_search_analyzer",
            "query": "some book"
          }
        }
      ]
    }
  }

Nothing special here, it works great and returns relevant results.

And here is my filter (in post_filter):

"post_filter": {
  "bool": {
    "must": [
      {
        "nested": {
          "path": "string_facets",
          "query": {
            "bool": {
              "filter": [
                { "term": { "string_facets.facet_name": "Cover colour" } },
                { "terms": { "string_facets.facet_value": [ "Green" ] } }
              ]
            }
          }
        }
      }
    ]
  }
}

Let me stress: this works FINE. I am seeing the correct results (in this case 13 results are shown, all matching 'Cover colour' = 'Green').

Here are my general (unfiltered) aggregations, which return all of the facets with correct counts for all the products:

"agg_string_facets": {
  "nested": {
    "path": "string_facets"
  },
  "aggregations": {
    "facet_name": {
      "terms": {
        "field": "string_facets.facet_name"
      },
      "aggregations": {
        "facet_value": {
          "terms": {
            "field": "string_facets.facet_value"
          }
        }
      }
    }
  }
}

This too works perfectly! I am seeing all the aggregations with accurate facet counts for all documents matching my query.

Now, check this out: I am creating an aggregation for the same nested fields but filtered so that I can get the aggregations + facets that 'survive' my filter:

"agg_all_facets_filtered": {
  "filter": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "string_facets",
            "query": {
              "bool": {
                "filter": [
                  { "term": { "string_facets.facet_name": "Cover colour" } },
                  { "terms": { "string_facets.facet_value": [ "Green" ] } }
                ]
              }
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "agg_all_facets_filtered": {
      "nested": { "path": "string_facets" },
      "aggregations": {
        "facet_name": {
          "terms": { "field": "string_facets.facet_name" },
          "aggregations": {
            "facet_value": {
              "terms": { "field": "string_facets.facet_value" }
            }
          }
        }
      }
    }
  }
}

Please note that the filter I am using on this aggregation is the same as the one that filters my results in the first place (in post_filter).

But for some reason, the aggregations returned are all wrong, namely the facet counts. For example, in my search here, I am getting 13 results, but the aggregation returned from 'agg_all_facets_filtered' only has a count of: 'Cover colour' = 4.

{
  "key": "Cover colour",
  "doc_count": 4,
  "facet_value": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
        {
          "key": "Green",
          "doc_count": 4
        }
    ]
  }
}

After checking why 4, I noticed that 3 of the documents contain the facet 'Cover colour' twice: once for 'Green' and once for 'Some other colours'... so it seems my aggregations are only counting the entries that have that facet name TWICE - or have it in common with other documents. This is why I think my filter on the aggregation is wrong. I've done a lot of reading on the AND vs OR of the matching/filters, I tried with 'Filter', 'Should', etc. Nothing fixes this.

I am sorry this was such a long question, but:

HOW can I write the aggregation filter so that the facets returned have the correct counts, given the fact that my filter works perfectly on its own?

Thanks all very much.

UPDATE: Following request for example, here is my full query (please note the filters in post_filter as well as the same filter in the filtered aggregations):

{
  "size" : 0,
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "fields": [
              "search_data.full_text_boosted^7",
              "search_data.full_text^2"
            ],
            "type": "cross_fields",
            "analyzer": "full_text_search_analyzer",
            "query": "bible"
          }
        }
      ]
    }
  },

  "post_filter": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "string_facets",
            "query": {
              "bool": {
                "filter": [
                  { "term": { "string_facets.facet_name": "Cover colour" } },
                  { "terms": { "string_facets.facet_value": [ "Green" ] } }
                ]
              }
            }
          }
        }
      ]
    }
  },

  "aggregations": {
    "agg_string_facets": {
      "nested": {
        "path": "string_facets"
      },
      "aggregations": {
        "facet_name": {
          "terms": {
            "field": "string_facets.facet_name"
          },
          "aggregations": {
            "facet_value": {
              "terms": {
                "field": "string_facets.facet_value"
              }
            }
          }
        }
      }
    },

    "agg_all_facets_filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "nested": {
                "path": "string_facets",
                "query": {
                  "bool": {
                    "filter": [
                      { "term": { "string_facets.facet_name": "Cover colour" } },
                      { "terms": { "string_facets.facet_value": [ "Green" ] } }
                    ]
                  }
                }
              }
            }
          ]
        }
      },
      "aggs": {
        "agg_all_facets_filtered": {
          "nested": { "path": "string_facets" },
          "aggregations": {
            "facet_name": {
              "terms": { "field": "string_facets.facet_name" },
              "aggregations": {
                "facet_value": {
                  "terms": { "field": "string_facets.facet_value" }
                }
              }
            }
          }
        }
      }
    }
  }
}

The returned results are correct (as far as documents go) and here is the aggregation (unfiltered, from the results, for 'agg_string_facets' - notice 'Green' shows 13 documents - which is correct):

{
            "key": "Cover colour",
            "doc_count": 483,
            "facet_value": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 111,
              "buckets": [
                {
                  "key": "Black",
                  "doc_count": 87
                },
                {
                  "key": "Brown",
                  "doc_count": 75
                },
                {
                  "key": "Blue",
                  "doc_count": 45
                },
                {
                  "key": "Burgundy",
                  "doc_count": 43
                },
                {
                  "key": "Pink",
                  "doc_count": 30
                },
                {
                  "key": "Teal",
                  "doc_count": 27
                },
                {
                  "key": "Tan",
                  "doc_count": 20
                },
                {
                  "key": "White",
                  "doc_count": 18
                },
                {
                  "key": "Chocolate",
                  "doc_count": 14
                },
                {
                  "key": "Green",
                  "doc_count": 13
                }
              ]
            }
          }

And here is the aggregation (filtered with the same filter, at the same time from 'agg_all_facets_filtered'), showing only 4 for 'Green':

{
              "key": "Cover colour",
              "doc_count": 4,
              "facet_value": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": "Green",
                    "doc_count": 4
                  }
                ]
              }
            }

UPDATE 2: Here are some sample documents returned by the query:

"hits": {
    "total": 13,
    "max_score": 17.478987,
    "hits": [
      {
        "_index": "redacted",
        "_type": "product",
        "_id": "33107",
        "_score": 17.478987,
        "_source": {
          "type": "product",
          "document_id": 33107,
          "search_data": {
            "full_text": "hcsb compact ultrathin bible mint green leathertouch  holman bible staff leather binding 9781433617751 ",
            "full_text_boosted": "HCSB Compact Ultrathin Bible Mint Green Leathertouch Holman Bible Staff "
          },
          "search_result_data": {
            "name": "HCSB Compact Ultrathin Bible, Mint Green Leathertouch (Leather)",
            "preview_image": "/images/products/medium/0.jpg",
            "url": "/Products/ViewOne.aspx?ProductId=33107"
          },
          "string_facets": [
            {
              "facet_name": "Binding",
              "facet_value": "Leather"
            },
            {
              "facet_name": "Bible size",
              "facet_value": "Compact"
            },
            {
              "facet_name": "Bible size",
              "facet_value": "Ultrathin"
            },
            {
              "facet_name": "Bible version",
              "facet_value": "HCSB"
            },
            {
              "facet_name": "Cover colour",
              "facet_value": "Green"
            }
          ]
        }
      },
      {
        "_index": "redacted",
        "_type": "product",
        "_id": "17240",
        "_score": 17.416323,
        "_source": {
          "type": "product",
          "document_id": 17240,
          "search_data": {
            "full_text": "kjv thinline bible compact  leather binding 9780310439189 ",
            "full_text_boosted": "KJV Thinline Bible Compact "
          },
          "search_result_data": {
            "name": "KJV Thinline Bible, Compact (Leather)",
            "preview_image": "/images/products/medium/17240.jpg",
            "url": "/Products/ViewOne.aspx?ProductId=17240"
          },
          "string_facets": [
            {
              "facet_name": "Binding",
              "facet_value": "Leather"
            },
            {
              "facet_name": "Bible size",
              "facet_value": "Compact"
            },
            {
              "facet_name": "Bible size",
              "facet_value": "Thinline"
            },
            {
              "facet_name": "Bible version",
              "facet_value": "KJV"
            },
            {
              "facet_name": "Cover colour",
              "facet_value": "Green"
            }
          ]
        }
      },
      {
        "_index": "redacted",
        "_type": "product",
        "_id": "17243",
        "_score": 17.416323,
        "_source": {
          "type": "product",
          "document_id": 17243,
          "search_data": {
            "full_text": "kjv busy mom's bible  leather binding 9780310439134 ",
            "full_text_boosted": "KJV Busy Mom'S Bible "
          },
          "search_result_data": {
            "name": "KJV Busy Mom's Bible (Leather)",
            "preview_image": "/images/products/medium/17243.jpg",
            "url": "/Products/ViewOne.aspx?ProductId=17243"
          },
          "string_facets": [
            {
              "facet_name": "Binding",
              "facet_value": "Leather"
            },
            {
              "facet_name": "Bible size",
              "facet_value": "Pocket"
            },
            {
              "facet_name": "Bible size",
              "facet_value": "Thinline"
            },
            {
              "facet_name": "Bible version",
              "facet_value": "KJV"
            },
            {
              "facet_name": "Cover colour",
              "facet_value": "Pink"
            },
            {
              "facet_name": "Cover colour",
              "facet_value": "Green"
            }
          ]
        }
      },
      {
        "_index": "redacted",
        "_type": "product",
        "_id": "33030",
        "_score": 15.674053,
        "_source": {
          "type": "product",
          "document_id": 33030,
          "search_data": {
            "full_text": "apologetics study bible for students grass green leathertou  mcdowell sean; holman bible s leather binding 9781433617720 ",
            "full_text_boosted": "Apologetics Study Bible For Students Grass Green Leathertou Mcdowell Sean; Holman Bible S"
          },
          "search_result_data": {
            "name": "Apologetics Study Bible For Students, Grass Green Leathertou (Leather)",
            "preview_image": "/images/products/medium/33030.jpg",
            "url": "/Products/ViewOne.aspx?ProductId=33030"
          },
          "string_facets": [
            {
              "facet_name": "Binding",
              "facet_value": "Leather"
            },
            {
              "facet_name": "Bible designation",
              "facet_value": "Study Bible"
            },
            {
              "facet_name": "Bible designation",
              "facet_value": "Students"
            },
            {
              "facet_name": "Bible feature",
              "facet_value": "Indexed"
            },
            {
              "facet_name": "Cover colour",
              "facet_value": "Green"
            }
          ]
        }
      },
      {
        "_index": "redacted",
        "_type": "product",
        "_id": "33497",
        "_score": 15.674053,
        "_source": {
          "type": "product",
          "document_id": 33497,
          "search_data": {
            "full_text": "hcsb life essentials study bible brown / green  getz gene a.; holman bible st imitation leather 9781586400446 ",
            "full_text_boosted": "HCSB Life Essentials Study Bible Brown  Green Getz Gene A ; Holman Bible St"
          },
          "search_result_data": {
            "name": "HCSB Life Essentials Study Bible Brown / Green (Imitation Leather)",
            "preview_image": "/images/products/medium/33497.jpg",
            "url": "/Products/ViewOne.aspx?ProductId=33497"
          },
          "string_facets": [
            {
              "facet_name": "Binding",
              "facet_value": "Imitation Leather"
            },
            {
              "facet_name": "Bible designation",
              "facet_value": "Study Bible"
            },
            {
              "facet_name": "Bible version",
              "facet_value": "HCSB"
            },
            {
              "facet_name": "Binding",
              "facet_value": "Imitation leather"
            },
            {
              "facet_name": "Cover colour",
              "facet_value": "Brown"
            },
            {
              "facet_name": "Cover colour",
              "facet_value": "Green"
            }
          ]
        }
      }
    ]
}
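As a sanity check on what the filtered aggregation ought to return, here is a plain-Python sketch that mimics "filter the documents, then count nested objects per facet" over the five sample documents above. It does not talk to Elasticsearch at all; only the `string_facets` of each hit are reproduced, and the function names are made up for illustration.

```python
from collections import Counter

# string_facets of the five sample hits, as (facet_name, facet_value) pairs
DOCS = {
    "33107": [("Binding", "Leather"), ("Bible size", "Compact"),
              ("Bible size", "Ultrathin"), ("Bible version", "HCSB"),
              ("Cover colour", "Green")],
    "17240": [("Binding", "Leather"), ("Bible size", "Compact"),
              ("Bible size", "Thinline"), ("Bible version", "KJV"),
              ("Cover colour", "Green")],
    "17243": [("Binding", "Leather"), ("Bible size", "Pocket"),
              ("Bible size", "Thinline"), ("Bible version", "KJV"),
              ("Cover colour", "Pink"), ("Cover colour", "Green")],
    "33030": [("Binding", "Leather"), ("Bible designation", "Study Bible"),
              ("Bible designation", "Students"), ("Bible feature", "Indexed"),
              ("Cover colour", "Green")],
    "33497": [("Binding", "Imitation Leather"), ("Bible designation", "Study Bible"),
              ("Bible version", "HCSB"), ("Binding", "Imitation leather"),
              ("Cover colour", "Brown"), ("Cover colour", "Green")],
}


def matches(facets, name, values):
    """Nested filter: a single nested object has this name AND one of the values."""
    return any(n == name and v in values for n, v in facets)


def nested_facet_counts(docs, name, values):
    """Count nested objects per (facet_name, facet_value) among matching documents."""
    counts = Counter()
    for facets in docs.values():
        if matches(facets, name, values):
            counts.update(facets)  # every nested object is counted once
    return counts


counts = nested_facet_counts(DOCS, "Cover colour", ["Green"])
cover = {v: c for (n, v), c in counts.items() if n == "Cover colour"}
print(cover)                # {'Green': 5, 'Pink': 1, 'Brown': 1}
print(sum(cover.values()))  # 7 nested "Cover colour" objects
```

For these five documents the expected 'Cover colour' bucket is therefore 7 nested objects, with 'Green' counted once per matching document, which is the shape of result reported in the comments below for the same sample data.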
Cristian Cotovan
  • Can you explain more about the current result vs. the expected result, using example documents? – Nishant Dec 31 '18 at 16:30
  • Hi, I've updated the question with examples. Thanks! – Cristian Cotovan Jan 02 '19 at 09:04
  • What I was expecting you to add was a few documents of those 13 documents. – Nishant Jan 02 '19 at 09:32
  • Apologies, I thought you wanted to see samples of aggregations. I've added a few documents returned. – Cristian Cotovan Jan 02 '19 at 14:22
  • I created sample data using the five documents you added to your question, plus one more document which does not match the post filter. In the nested aggregation I got the correct result, which is `{"key":"Cover colour","doc_count":7,"facet_value":{"doc_count_error_upper_bound":0,"sum_other_doc_count":0,"buckets":[{"key":"Green","doc_count":5},{"key":"Brown","doc_count":1},{"key":"Pink","doc_count":1}]}}` – Nishant Jan 03 '19 at 04:10
  • Hmm, and you used my query? Could it be that there's a bug in the version of ES I am using? – Cristian Cotovan Jan 03 '19 at 09:10
  • Yes I used your query. Regarding bug, I doubt that in a stable release there is a bug for this. I tried this on 6.4.0 – Nishant Jan 03 '19 at 10:12
  • Thanks @NishantSaini, I've decided to install ES 6.5 and reindex the data - it seems to work correctly now! I can't believe I've wasted 3 weeks almost, on a bug! Thanks for your help! – Cristian Cotovan Jan 03 '19 at 10:25
  • At the end issue solved :) – Nishant Jan 03 '19 at 10:31

1 Answer

Mystery solved! Thanks for your input. It turns out that the version I was using (6.1.1) has a bug. I don't know exactly what the bug is, but I've installed Elasticsearch 6.5 and reindexed my data, and with no changes to the queries or mappings it all works as it should!

Now, I don't know if I should submit a bug report to ES, or just leave it, seeing as it's an older version and they've moved on.

Cristian Cotovan
  • I am working on the same problem. I went through the Stack Overflow question you mentioned, the Medium and other articles linked from it, and your comments there. Glad to see you found the answer to this problem; it showed me a way to start work on mine. – Satyaaditya Oct 15 '20 at 13:19
  • How did you receive information about the currently selected facet? I inspected the requests of some e-commerce sites, and no request marks the currently selected facet; they just send all the selected facets. How can we identify which facet needs to be persisted without filters? – Satyaaditya Oct 15 '20 at 13:22
  • Not sure if I understand your question. You run the filters in post_filter -- that still restricts the results to your selected facets but also gives you BACK all the facets. You need to persist your selected ones yourself. – Cristian Cotovan Oct 16 '20 at 14:06
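
One way to read "persist your selected ones yourself" is sketched below. It assumes the selections travel in the URL query string under a hypothetical `facet.<name>` parameter convention (nothing in this thread prescribes that) and rebuilds the post_filter from them on every request, using the `string_facets` fields from the question.

```python
from urllib.parse import parse_qs, urlencode

FACET_PREFIX = "facet."  # hypothetical naming convention for facet parameters


def selected_facets(query_string: str) -> dict:
    """Collect {facet_name: [values]} from parameters starting with FACET_PREFIX."""
    params = parse_qs(query_string)
    return {
        key[len(FACET_PREFIX):]: values
        for key, values in params.items()
        if key.startswith(FACET_PREFIX)
    }


def build_post_filter(selected: dict) -> dict:
    """AND across facet names, OR across the values chosen within one facet."""
    return {
        "bool": {
            "must": [
                {
                    "nested": {
                        "path": "string_facets",
                        "query": {"bool": {"filter": [
                            {"term": {"string_facets.facet_name": name}},
                            {"terms": {"string_facets.facet_value": values}},
                        ]}},
                    }
                }
                for name, values in selected.items()
            ]
        }
    }


# Example request: search for "bible" with two cover colours ticked.
qs = urlencode([("q", "bible"),
                ("facet.Cover colour", "Green"),
                ("facet.Cover colour", "Pink")])
selected = selected_facets(qs)
print(selected)  # {'Cover colour': ['Green', 'Pink']}
post_filter = build_post_filter(selected)
```

Because the full set of selections arrives with every request, the server does not need to know which facet was clicked last; it can tell from the keys of `selected` which facets are currently active.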