I have an Elasticsearch 5.2 cluster with 16 nodes (13 data nodes, 3 master nodes; 24 GB RAM, 12 GB heap). I am performance testing a search query by sending it to the cluster 50 times per second. The query looks like this:
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "cust_id": "AC-90-464690064500"
          }
        },
        {
          "range": {
            "yy_mo_no": {
              "gt": 201701,
              "lte": 201710
            }
          }
        }
      ]
    }
  }
}
My index mapping looks like this:
cust_id      Keyword
smry_amt     Long
yy_mo_no     Integer   // doc_values enabled
mkt_id       Keyword
. . .
. . .
currency_cd  Keyword   // 10 fields in total, 8 of them of Keyword type
The index contains 200 million records, and each cust_id may have hundreds of records. The index has 2 replicas. Each record is under 100 bytes.
When I run the performance test for 10 minutes, query responses are very slow. Looking in more detail at the Kibana monitoring tab, there appears to be a lot of garbage collection activity (please see the image below):
I have several questions. I did some research on range queries but didn't find much on what can cause GC activity in scenarios similar to mine. I also researched memory usage and GC activity, but most of the Elastic documentation says that young-generation GC is normal during indexing, while search activity mostly uses the file system cache that the OS maintains. That would explain why, in the chart above, the heap is not used much: the search was using the file system cache.
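To cross-check the Kibana chart, the same GC counters can be read per node from the node stats API (GET _nodes/stats/jvm). Below is a minimal Python sketch of how I would summarize that response. The helper name summarize_gc and the sample payload are made up for illustration; the numbers are not from my cluster.

```python
def summarize_gc(node_stats):
    """Map node name -> {collector: (collection_count, total_millis)}.

    Expects the parsed JSON body of GET _nodes/stats/jvm.
    """
    summary = {}
    for node in node_stats.get("nodes", {}).values():
        collectors = node["jvm"]["gc"]["collectors"]
        summary[node["name"]] = {
            name: (c["collection_count"], c["collection_time_in_millis"])
            for name, c in collectors.items()
        }
    return summary


# Made-up sample shaped like the API response (illustrative values only).
sample = {
    "nodes": {
        "node-id-1": {
            "name": "data-1",
            "jvm": {"gc": {"collectors": {
                "young": {"collection_count": 120, "collection_time_in_millis": 3400},
                "old": {"collection_count": 2, "collection_time_in_millis": 150},
            }}},
        }
    }
}

print(summarize_gc(sample))
```

Taking this snapshot before and after the 10-minute run and subtracting the counts would show whether the collections are young-generation or old-generation.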
So:
- What might be causing the garbage collection here?
- The chart shows that heap is still available to Elasticsearch, and used heap is still very low compared to what is available. So what is triggering GC?
- Is the query type causing some internal data structure to be created and then disposed of, causing GC?
- The CPU spike may be due to GC activity.
- Is there a more efficient way of running the range query in Elasticsearch versions before 5.5?
- Profiling the query shows that Elasticsearch runs a TermQuery and a BooleanQuery, with the latter costing the most.
Any idea what's going on here?
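Regarding the range-query question, one variant I am considering is moving both clauses from "must" into "filter". Clauses in filter context are not scored and their results can be cached, and I don't need relevance scores here. Below is a minimal Python sketch that only builds the request body. The helper name build_filter_query is made up for illustration, and I have not verified that this changes the GC behaviour.

```python
import json


def build_filter_query(cust_id, month_gt, month_lte):
    """Build the same term + range search, but in filter context (no scoring)."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"cust_id": cust_id}},
                    {"range": {"yy_mo_no": {"gt": month_gt, "lte": month_lte}}},
                ]
            }
        }
    }


# Same values as in the query shown earlier in this post.
body = build_filter_query("AC-90-464690064500", 201701, 201710)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the same _search request used in the test.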
Thanks in Advance,
- SGSI.