I am doing some research into distributed databases. I have concluded that the hardest part is distributed queries, over multiple records, where the query is quite generic e.g. imagine a transaction table with a "DateTransaction" field. This query (pseudo-SQL) would be quite tricky:
SELECT * FROM Transactions
WHERE DateTransaction>= '2016-01-01' and DateTransaction<= '2017-09-01' --between jan 2016 and before sep 2017
ORDER BY DateTransaction DESC
OFFSET 0
FETCH 100 -- take records between 0 and 99 inclusive
Hand rolling a somewhat efficient solution for a query like this; where the data is split over N nodes; isn't difficult. It's be a bit more difficult with a large OFFSET, but likely pagination would be done by adjusting the date range anyways.
But anyways - I want to avoid hand rolling anything! I am looking for a third-party solution (ideally free) that would let me efficiently do queries like the one in that example over several nodes/machines. Atomicity isn't necessary. Anyone know where to get started?