I'm working on an application similar to a shopping cart, where we store each product and its metadata as JSON, and we need fast search results. (Search results should include every document that contains the search string anywhere in the product JSON doc.)

We chose Elasticsearch (the AWS service) to store the complete product JSON documents, thinking it would give us faster search results.

But when I tested my search endpoint, a single request took more than 2 seconds, and the time kept increasing, up to 30 seconds, when I made 100 parallel requests using JMeter. (These query times are from the application logs, not from the JMeter responses.)

Here are a sample product JSON document that I'm storing in Elasticsearch and a sample search string.

I believe we are using ES the wrong way; please help us implement it the right way.

Product JSON:

{
  "dealerId": "D320",
  "modified": 1562827907,
  "store": "S1000",
  "productId": "12345689",
  "Items": [
    {
      "Manufacturer": "ABC",
      "CODE": "V22222",
      "category": "Electronics",
      "itemKey": "b40a0e332190ec470",
      "created": 1562828756,
      "createdBy": "admin",
      "metadata": {
        "mfdDate": 1552828756,
        "expiry": 1572828756,
        "description": "any description goes here.. ",
        "dealerName": "KrishnaKanth Sing, Bhopal"
      }
    }
  ]
}

Search String:

krishna

UPDATE: We receive daily stock with multiple products (separate JSON documents with different productIds), and we store them in date-wise indices (e.g. products_20190715).

When searching, we search across the products_* indices.

We use the JestClient library to communicate with ES from our Spring Boot application.

Sample Search query:

{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "must": [
              {
                "simple_query_string": {
                  "query": "krishna*",
                  "flags": -1,
                  "default_operator": "or",
                  "lenient": true,
                  "analyze_wildcard": false,
                  "all_fields": true,
                  "boost": 1
                }
              }
            ],
            "disable_coord": false,
            "adjust_pure_negative": true,
            "boost": 1
          }
        }
      ],
      "filter": [
        {
          "bool": {
            "must": [
              {
                "bool": {
                  "should": [
                    {
                      "match_phrase": {
                        "category": {
                          "query": "Electronics",
                          "slop": 0,
                          "boost": 1
                        }
                      }
                    },
                    {
                      "match_phrase": {
                        "category": {
                          "query": "Furniture",
                          "slop": 0,
                          "boost": 1
                        }
                      }
                    },
                    {
                      "match_phrase": {
                        "category": {
                          "query": "Sports",
                          "slop": 0,
                          "boost": 1
                        }
                      }
                    }
                  ],
                  "disable_coord": false,
                  "adjust_pure_negative": true,
                  "boost": 1
                }
              }
            ],
            "disable_coord": false,
            "adjust_pure_negative": true,
            "boost": 1
          }
        },
        {
          "bool": {
            "disable_coord": false,
            "adjust_pure_negative": true,
            "boost": 1
          }
        }
      ],
      "disable_coord": false,
      "adjust_pure_negative": true,
      "boost": 1
    }
  },
  "sort": [
    {
      "modified": {
        "order": "desc"
      }
    }
  ]
}
Venkat Papana

2 Answers

2

There are several issues with your Elasticsearch setup and query.

  1. Storing each day's products in a separate index is your design choice, and I don't know the reasons behind it. But if the list of products is small, it doesn't make sense and can cause performance issues: the products end up spread across many small shards, and searching all of them takes longer than searching a single shard. Of course, if the data is very large, a single shard will also hurt performance. That is an analysis you need to do so you can design your system accordingly, and we can help you with it.

  2. Now for your query. First, you are using a wildcard query, which is slow in any case. Please read this post, where the founder of Elasticsearch himself commented :-) A solution is also provided there: use n-gram tokens instead of a wildcard query, which is what we use in production to search for partial terms.

  3. The third issue is that you are using "all_fields": true in your search query, which includes every field in your index in the search. That is quite costly; you should include only the relevant fields in your search.
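To make the n-gram suggestion from point 2 more concrete, here is a minimal sketch in Python that only builds the JSON fragments involved; it does not talk to a cluster. The names `partial_tokenizer` and `partial_analyzer` and the gram sizes (3 to 10) are assumptions chosen for illustration, and the exact settings layout depends on your Elasticsearch version (newer versions also limit the gap between `min_gram` and `max_gram` unless `index.max_ngram_diff` is raised). The `ngrams` helper is not an Elasticsearch API; it just mimics what the analyzer would emit for one word.

```python
import json


def ngrams(token, min_gram, max_gram):
    """Illustration only: mimic the substrings the n-gram analyzer emits for one word."""
    token = token.lower()  # stands in for the "lowercase" token filter
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(token):
                grams.append(token[start:start + size])
    return grams


def build_ngram_analysis(min_gram=3, max_gram=10):
    """Build the "analysis" part of the index settings (names are hypothetical)."""
    return {
        "analysis": {
            "tokenizer": {
                "partial_tokenizer": {
                    "type": "ngram",
                    "min_gram": min_gram,
                    "max_gram": max_gram,
                    "token_chars": ["letter", "digit"],
                }
            },
            "analyzer": {
                "partial_analyzer": {
                    "type": "custom",
                    "tokenizer": "partial_tokenizer",
                    "filter": ["lowercase"],
                }
            },
        }
    }


def build_partial_text_field():
    """Field mapping: n-grams at index time, whole words at search time."""
    return {
        "type": "text",
        "analyzer": "partial_analyzer",
        "search_analyzer": "standard",
    }


if __name__ == "__main__":
    print(json.dumps(build_ngram_analysis(), indent=2))
    print(json.dumps(build_partial_text_field(), indent=2))
    # "krishna" is one of the grams indexed for "KrishnaKanth",
    # so a plain match query for "krishna" can find it without a wildcard.
    print("krishna" in ngrams("KrishnaKanth", 3, 10))
```

With this kind of mapping the expensive work moves to index time: the index gets larger, but a search for `krishna` becomes an ordinary term lookup. Search terms longer than `max_gram` would not match, so the gram sizes need to fit the strings you expect users to type.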

I am sure that even if you don't change the first one (the design change), incorporating the other two changes in your query will improve its performance a lot.
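As a rough illustration of the two query changes (no wildcard, explicit fields instead of "all_fields"), the sketch below builds a slimmer request body in Python. The field names are taken from the sample product JSON, but which of them are actually relevant is an assumption, as is `Items` being mapped as a plain object rather than a nested type. The category filter and the sort are carried over from the original query.

```python
import json


def build_search_body(text, fields, categories):
    """Build a search body that names its fields and avoids wildcards."""
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "simple_query_string": {
                            "query": text,  # plain term, no trailing "*"
                            "fields": fields,  # replaces "all_fields": true
                            "default_operator": "or",
                        }
                    }
                ],
                "filter": [
                    {
                        "bool": {
                            # one match_phrase clause per allowed category
                            "should": [
                                {"match_phrase": {"category": c}}
                                for c in categories
                            ],
                            "minimum_should_match": 1,
                        }
                    }
                ],
            }
        },
        "sort": [{"modified": {"order": "desc"}}],
    }


if __name__ == "__main__":
    body = build_search_body(
        "krishna",
        # Assumed to be the relevant text fields; adjust to your mapping.
        ["Items.metadata.dealerName", "Items.metadata.description", "Items.Manufacturer"],
        ["Electronics", "Furniture", "Sports"],
    )
    print(json.dumps(body, indent=2))
```

Without the wildcard, `krishna` only matches whole analyzed words, so partial matches such as `KrishnaKanth` would need n-gram fields or a similar index-time technique.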

Happy debugging and learning.

Amit
  • Thanks Amit. 1) Reason for choosing a daily index: you are correct, today we have only a few hundred products for our store, but we want to use the same instance for multiple stores in the future. 2) I will try n-grams. 3) My client expects all results that have the search string anywhere in the product JSON doc. – Venkat Papana Jul 17 '19 at 05:52
  • @VenkatPapana Thanks for the clarification. The search string is fine (full-text search generally works like this), but what you are trying to implement here is substring search, which is very costly. Does your customer want `substring` search across all the words in your product documents? Please confirm, as it doesn't make sense to do substring search on all fields, such as id fields, date fields, etc. It is fine for fields like brand name, title, etc. – Amit Jul 17 '19 at 06:36
  • You are correct, Amit. Substring search is required for specific fields like product code and id (they maintain internal custom codes, used in communication between dealers and stores), and for a few other fields like brand, as you said. I just removed the `*` wildcard character from my search string, and I'm seeing a big improvement (30-50%) in query runtime. – Venkat Papana Jul 17 '19 at 08:41
  • Sure @Amit, can you let me know how to specify substring search for a few fields and regular word search for the remaining fields? – Venkat Papana Jul 17 '19 at 09:20
  • @VenkatPapana, you can specify the fields individually in the query string, as shown in the example at https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html, and then use a boolean query to combine the multiple subqueries. – Amit Jul 17 '19 at 09:22
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/196557/discussion-between-amit-khandelwal-and-venkat-papana). – Amit Jul 17 '19 at 09:23
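
One way the combination described in the comments above could look is sketched here: a boolean query with one clause for the fields that need substring matching and one for the fields that only need regular word matching. The `.ngram` subfield names are hypothetical and assume those fields are additionally indexed with an n-gram analyzer; none of this comes from the original thread, and the code only builds the request body.

```python
import json


def build_combined_query(text, substring_fields, word_fields):
    """Combine substring-style and whole-word matching in one bool query."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Fields assumed to be indexed with an n-gram analyzer.
                    {"multi_match": {"query": text, "fields": substring_fields}},
                    # Fields searched as regular analyzed words.
                    {
                        "simple_query_string": {
                            "query": text,
                            "fields": word_fields,
                            "default_operator": "or",
                        }
                    },
                ],
                # A document matches if either clause matches.
                "minimum_should_match": 1,
            }
        }
    }


if __name__ == "__main__":
    body = build_combined_query(
        "V222",
        ["Items.CODE.ngram", "productId.ngram"],  # hypothetical subfields
        ["Items.metadata.description", "Items.metadata.dealerName"],
    )
    print(json.dumps(body, indent=2))
```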
-1

Use the JSON Extractor post-processor to fetch the pattern of data you need to input as the search string.

Give the JSON expression and a match number: 0 picks a match at random, 1 takes the first match, 2 the second, and so on. This makes the search string dynamic, which replicates the real scenario, since users will not all be searching for the same string.

When you put more sequential or concurrent users on the server, it is normal for the response time of each request to increase gradually. What you need to be concerned about are the failures from the server and the average time taken by the requests in the Summary Report.

In general, as a standard, requests should not take more than 10 seconds to respond (this depends on the company and the type of product). Please note that the default timeout of JMeter is around 21 seconds. If a request takes longer than this, it automatically fails (if "Delay thread creation until needed" is disabled in the thread group). But you can set the expected value in the Advanced tab of each request in JMeter.

Arjun Dev
  • Thanks for the response, Arjun, but I'm not looking to write JMeter tests and calculate response metrics; I want to understand why my ES search queries are taking so long and how to optimize them. – Venkat Papana Jul 16 '19 at 10:04
  • Please make your question clearer. I think you do not want an answer on how to load test it, but rather on how to fix or optimize it. – Arjun Dev Jul 16 '19 at 10:16
  • Sure. Thanks Arjun – Venkat Papana Jul 16 '19 at 10:24