I am trying to test geomesa cassandra backend.
I have ingested a ~2M points from OSM and send DWITHIN
and BBOX
queries to cassandra using geomesa with geotools ecql.
Then I've done some performance tests, the results do not look reasonable for me.
Cassandra is installed to linux machine with 16 cores xeon, 32GB RAM and 1 SSD drive. I got ~150
queries per second.
I started to investigate geomesa execution plan for my queries.
Trace logs coming from org.locationtech.geomesa.index.utils.Explainer
were really helpful, they do a great job explaining what is going on.
What looks confusing to me is the number of range scans that go though cassandra.
For example, I see the following in my logs:
Table: osm_poi_a7_c_osm_5fpoi_5fa7_attr_v2
Ranges (49): SELECT * FROM ..
The number 49
means the actual number of range scans sent to cassandra.
Different queries give me different results, they vary approximately from ~10 to ~130.
10
looks quite reasonable to me but 130
looks enormous.
Could you please explain what causing geomesa to send so huge amount of range scans?
Is there any way to decrease the number of range scans?
Maybe there are some configuration options?
Are there other options? like descreasing the presicion of z-index to improve such queries?
Thanks anyway!