
I have the following documents indexed in Elasticsearch:

{
                "_index": "ecommerce",
                "_type": "products",
                "_id": "12895",
                "_score": 1,
                "_source": {
                    "title": "Blue Armani Jeans",
                    "slug": "blue-armani-jeans",
                    "price": 200,
                    "sale_price": 0,
                    "vendor_id": 62,
                    "featured": 0,
                    "viewed": 0,
                    "stock": 1,
                    "sku": "arm-jeans",
                    "brand": "",
                    "rating": 0,
                    "active": 0,
                    "vendor_name": "Armani",
                    "category": [
                        "Men Fashion",
                        "Casual Wear"
                    ],
                    "image": "armani-jeans.jpg",
                    "variations": [
                        {
                            "variation_id": "32",
                            "stock": 10,
                            "price": 199,
                            "variation_image": "",
                            "sku": "arm-jeans-11",
                            "Size": "38",
                            "Color": "Blue"
                        },
                        {
                            "variation_id": "33",
                            "stock": 10,
                            "price": 199,
                            "variation_image": "",
                            "sku": "arm-jeans-12",
                            "Size": "40",
                            "Color": "Blue"
                        }
                    ]
                }
            },

I am using a query with aggregations to fetch all the variation values that should be shown as filters.

Query:

{
    "size": 0,
    "aggs": {
        "variations": {
            "nested": {
                "path": "variations"
            },
            "aggs": {
                "size": {
                    "terms": {
                        "field": "variations.Size"
                    }
                },
                "color": {
                    "terms": {
                        "field": "variations.Color"
                    }
                },
                "brand": {
                    "reverse_nested": {},
                    "aggs": {
                        "brand": {
                            "value_count": {
                                "field": "brand"
                            }
                        }
                    }
                }
            }
        }
    }
}

Output:

"color": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 543,
                "buckets": [
                    {
                        "key": "black",
                        "doc_count": 298
                    },
                    {
                        "key": "blue",
                        "doc_count": 227
                    },
                    {
                        "key": "brown",
                        "doc_count": 170
                    },
                    {
                        "key": "white",
                        "doc_count": 153
                    },
                    {
                        "key": "pink",
                        "doc_count": 127
                    },
                    {
                        "key": "grey",
                        "doc_count": 120
                    },
                    {
                        "key": "multi",
                        "doc_count": 99
                    },
                    {
                        "key": "red",
                        "doc_count": 89
                    },
                    {
                        "key": "color",
                        "doc_count": 81
                    },
                    {
                        "key": "green",
                        "doc_count": 76
                    }
                ]
            },
            "brand": {
                "doc_count": 621,
                "brand": {
                    "value": 6
                }
            },
            "size": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 517,
                "buckets": [
                    {
                        "key": "size",
                        "doc_count": 195
                    },
                    {
                        "key": "s",
                        "doc_count": 158
                    },
                    {
                        "key": "free",
                        "doc_count": 156
                    },
                    {
                        "key": "m",
                        "doc_count": 140
                    },
                    {
                        "key": "l",
                        "doc_count": 134
                    },
                    {
                        "key": "xl",
                        "doc_count": 102
                    },
                    {
                        "key": "9",
                        "doc_count": 69
                    },
                    {
                        "key": "8",
                        "doc_count": 68
                    },
                    {
                        "key": "10",
                        "doc_count": 67
                    },
                    {
                        "key": "11",
                        "doc_count": 61
                    }
                ]
            }

The buckets are fine for values that don't contain spaces, but a variation value like "free size" gets split into "free" and "size".

What can I do to treat such a value as a single variation parameter? Or is there a specialized query for this kind of situation?

Omer Farooq
  • This is most likely a mapping issue. You need to map the field as `keyword` (or if you also need an analyzed field, add another field mapped keyword). Can you add your mapping to the question? – dshockley Aug 26 '17 at 08:02
  • If you look at the size results you will see two values: 1. free 2. size. This was actually a single value, "free size", but now due to – Omer Farooq Aug 26 '17 at 08:57
  • Can you please show your mapping and the document containing "free size" as it was indexed? – dshockley Aug 26 '17 at 10:42
  • Did you explicitly set a mapping when you created the index? – dshockley Aug 26 '17 at 10:47
  • Sorry, obviously you did, or you wouldn't have any nested field. Check what the type is for variations.Size. – dshockley Aug 26 '17 at 11:04

1 Answer

The problem is that your mapping is most likely something like this:

...
"variations": {
  "properties": {
    "Size": {
      "type": "text",
      "analyzer": "standard"
    ...

This is a bit oversimplified, but when Elasticsearch indexes a document, it first analyzes each text field: it splits the text into tokens, modifies the tokens so that they work well for search, and then stores in the index how often each token appears in each document. For example, if you have a text that says "Dogs are awesome," and someone searches "dog", you want to match that text, because it's about dogs. Elasticsearch is powerful and useful for a variety of purposes, but its first purpose is natural-language text search, so that is what it prepares for by default. If you want some other behavior, you need to tell it so explicitly in your mapping.

When you run a terms aggregation, it would be incredibly inefficient to go through every document's raw text, so Elasticsearch uses the index it has already built, which conveniently holds the terms for each document. For text analyzed with the standard analyzer, "terms" in this case means "free" and "size", not "free size". If you want to index the whole field value as a single term, use the "keyword" type instead of the "text" type:

...
"variations": {
  "properties": {
    "Size": {
      "type": "keyword"
    ...
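
To make the difference concrete, here is a rough Python simulation of what a terms aggregation counts under each mapping. It is only a sketch: `standard_like_tokens` is a hypothetical stand-in that splits on non-alphanumeric characters and lowercases, which approximates but does not reproduce the real standard analyzer, and the sample values are made up.

```python
import re
from collections import Counter


def standard_like_tokens(value):
    # Hypothetical approximation of the standard analyzer:
    # split on non-alphanumeric characters and lowercase each token.
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", value)]


def keyword_tokens(value):
    # A keyword field indexes the whole value as one term, unchanged.
    return [value]


def terms_buckets(values, analyze):
    # Count, per term, how many values (documents) contain it,
    # which is roughly what doc_count reports in a terms aggregation.
    counts = Counter()
    for value in values:
        counts.update(set(analyze(value)))
    return dict(counts)


sizes = ["Free Size", "Free Size", "S", "38"]  # made-up sample data
text_buckets = terms_buckets(sizes, standard_like_tokens)
keyword_buckets = terms_buckets(sizes, keyword_tokens)
print(text_buckets)
print(keyword_buckets)
```

Note that a keyword field stores values verbatim, so "Blue" and "blue" would land in separate buckets, whereas the analyzed field lowercases both.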

If you haven't set a mapping for this field explicitly, the default dynamic mapping in ES 5 already includes a keyword-mapped sub-field:

...
"variations": {
  "properties": {
    "Size": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
    ...

What this means is that, as long as none of your size values is longer than 256 characters (longer values are skipped by the keyword sub-field because of ignore_above), you can simply update your aggregation to look like this:

        ...
        "aggs": {
            "size": {
                "terms": {
                    "field": "variations.Size.keyword"
                }
            },
        ...
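
As a fuller sketch of the same change, the Python snippet below builds the complete nested aggregation body with both Size and Color pointed at their keyword sub-fields. The helper `variation_terms_aggs` is hypothetical, and it assumes the `.keyword` sub-field actually exists in your mapping, which is only the case if the default dynamic mapping was used for these fields.

```python
import json


def variation_terms_aggs(attributes, subfield="keyword"):
    # Build one terms aggregation per variation attribute,
    # targeting the un-analyzed sub-field (assumed to exist in the mapping).
    aggs = {}
    for name in attributes:
        aggs[name.lower()] = {
            "terms": {"field": "variations.{}.{}".format(name, subfield)}
        }
    return {
        "size": 0,
        "aggs": {
            "variations": {
                "nested": {"path": "variations"},
                "aggs": aggs,
            }
        },
    }


body = variation_terms_aggs(["Size", "Color"])
# This JSON is what would be sent as the search request body.
print(json.dumps(body, indent=2))
```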

However, unless you are actually using the analyzed field, I would suggest re-indexing your documents with the Size field mapped as keyword.
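
A re-index along those lines could be sketched as follows. This only builds the two request bodies in Python; it does not send anything. The target index name `ecommerce_v2` is an assumption for illustration, only the Size and Color fields are mapped here, and the `products` level assumes an ES 5.x index with mapping types. The first body would go to `PUT /ecommerce_v2` and the second to `POST /_reindex`.

```python
import json

SOURCE_INDEX = "ecommerce"     # index name from the question
TARGET_INDEX = "ecommerce_v2"  # hypothetical name for the new index

# Body for creating the new index: variations stays nested,
# but Size and Color are indexed as whole, un-analyzed terms.
create_index_body = {
    "mappings": {
        "products": {
            "properties": {
                "variations": {
                    "type": "nested",
                    "properties": {
                        "Size": {"type": "keyword"},
                        "Color": {"type": "keyword"},
                    },
                }
            }
        }
    }
}

# Body for the reindex API: copy every document into the new index.
reindex_body = {
    "source": {"index": SOURCE_INDEX},
    "dest": {"index": TARGET_INDEX},
}

print(json.dumps(create_index_body, indent=2))
print(json.dumps(reindex_body, indent=2))
```

After re-indexing, the aggregation can target "variations.Size" directly, with no ".keyword" suffix.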

dshockley