I am looking for a NoSQL technology that can handle geospatial as well as time-based queries at large scale with decent performance. I want to batch-process several hundred GBs up to TBs of data with the proposed NoSQL technology together with Spark. This will obviously run on a cluster with several nodes.
Types of queries I want to run:
- "normal" queries for attributes like "field <= value"
- Basic geospatial queries, like querying all data that lies within a bounding box (bbox)
- Time queries like "date <= 01.01.2011" or "time >= 11:00 and time <= 14:00"
- A combination of all three query types (something like "query all data where the location is within a bbox, the date is 01.01.2011, time <= 14:00, and field_x <= 100"; see the sketch below)
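To make the combined case concrete, here is a minimal PySpark sketch of what I have in mind, assuming a hypothetical dataset with columns `lon`, `lat`, `date`, `time`, and `field_x` (all names, values, and the source path are illustrative, and the bbox is expressed as plain range predicates without any spatial index):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("combined-query-sketch").getOrCreate()

# Hypothetical dataset with lon/lat coordinates, a "YYYY-MM-DD" date string,
# a zero-padded "HH:MM" time string, and a numeric attribute field_x.
df = spark.read.parquet("hdfs:///data/records")

min_lon, max_lon = 8.0, 9.0    # bbox edges (illustrative values)
min_lat, max_lat = 47.0, 48.0

# Combine the bbox, date, time, and attribute filters into one predicate.
result = df.where(
    F.col("lon").between(min_lon, max_lon)
    & F.col("lat").between(min_lat, max_lat)
    & (F.col("date") == "2011-01-01")
    & (F.col("time") <= "14:00")   # lexicographic compare works for zero-padded HH:MM
    & (F.col("field_x") <= 100)
)
result.show()
```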
I am currently evaluating which technologies fit my use case, but I'm overwhelmed by the sheer number of technologies available. I have thought about popular technologies like MongoDB and Cassandra. Both seem applicable for my use case (Cassandra only with Stratio's Lucene index), but there might be a different technology that works even better.
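For instance, MongoDB can at least express the geospatial part natively; a minimal pymongo sketch, assuming a hypothetical `events` collection with a `loc` field holding `[lon, lat]` pairs (the connection string and all field names are illustrative):

```python
from pymongo import MongoClient, GEO2D

client = MongoClient("mongodb://localhost:27017")  # hypothetical connection string
events = client.mydb.events                        # hypothetical database/collection

# A 2d index is not strictly required for $geoWithin, but speeds it up.
events.create_index([("loc", GEO2D)])

# bbox via $geoWithin/$box, combined with date, time, and attribute filters.
cursor = events.find({
    "loc": {"$geoWithin": {"$box": [[8.0, 47.0], [9.0, 48.0]]}},  # [lon, lat] corners
    "date": "2011-01-01",
    "time": {"$lte": "14:00"},
    "field_x": {"$lte": 100},
})
```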
Is there any technology that will heavily outperform the others for these requirements?