Before I ask my question, a little background about our problem.
We are developing a government revenue projection application. In this application, we collect data about previous revenue and apply several econometric and political projection models (through several phases), persisting the data to the database as we go.
In short, this is a simplified model for the projection system:
Several rows (initially about 6,000) with this schema represent past revenue; together they are called a scenario:
+------+------+------+------+------+-------+---------+
| Cat1 | Cat2 | Cat3 | Cat4 | Year | Month | Revenue |
+------+------+------+------+------+-------+---------+
Throughout the projection system, the data is transformed in several ways (moving categories around, changing revenue values, correcting atypical values, etc.). Each transformation is done in a phase:
Initial Scenario (S0) ---(1st transformation phase)--> Transformed Scenario (S1)
S1 ---(2nd t.p.)--> S2 ---> S3 ---...---> SN
Each phase transforms part of the scenario, anywhere from 2% to 100% of the data, and every partial scenario state must be persisted until the final state (SN) is reached. A persisted partial state can contain either only the transformed rows or the entire partial scenario, as long as partial states can still be compared.
Also, the user can go back in the process (say, back to phase 2) and restart the projection from there, discarding the work that was done after that point.
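To make the storage requirement concrete, here is a minimal in-memory Python sketch of the "only the transformed rows" option, under stated assumptions. It is not our actual code: the names `ScenarioHistory`, `apply_phase`, `state` and `rollback` are hypothetical, a row is assumed to be identified by its four categories plus year and month, and a deleted row (for example, one moved to another category) is assumed to be marked with `None`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Assumed row key: (cat1, cat2, cat3, cat4, year, month); the value is the revenue.
Key = Tuple[str, str, str, str, int, int]


@dataclass
class ScenarioHistory:
    """Keeps S0 in full plus one delta (changed rows only) per phase."""
    initial: Dict[Key, float]
    deltas: List[Dict[Key, Optional[float]]] = field(default_factory=list)

    def apply_phase(self, changes: Dict[Key, Optional[float]]) -> None:
        """Record the rows one phase changed; None marks a removed row."""
        self.deltas.append(dict(changes))

    def state(self, phase: Optional[int] = None) -> Dict[Key, float]:
        """Rebuild S<phase> by replaying the first `phase` deltas over S0.

        With phase=None, all recorded deltas are replayed (the latest state).
        """
        count = len(self.deltas) if phase is None else phase
        rows = dict(self.initial)
        for delta in self.deltas[:count]:
            for key, revenue in delta.items():
                if revenue is None:
                    rows.pop(key, None)
                else:
                    rows[key] = revenue
        return rows

    def rollback(self, phase: int) -> None:
        """Go back to S<phase>, discarding the deltas of every later phase."""
        del self.deltas[phase:]


# Usage with made-up data: change a value, then move a row to another category.
old_key = ("A", "B", "C", "D", 2010, 1)
new_key = ("A", "B", "C", "E", 2010, 1)
history = ScenarioHistory({old_key: 100.0})
history.apply_phase({old_key: 120.0})                 # phase 1
history.apply_phase({old_key: None, new_key: 120.0})  # phase 2
history.rollback(1)                                   # restart from S1
```

Any two partial states can be compared by calling `state(i)` and `state(j)`. Whatever store we pick would need to support roughly these three operations efficiently: bulk-writing a delta, reading a state back, and dropping everything after a given phase.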
With this use case we ran into a problem with RDBMSs: they are really slow for write operations (a single scenario projection takes as long as half an hour).
After reading about NoSQL DBMSs, we arrived at several options, but, as I'm still on the learning curve, I'd like to ask: what's the best choice for this use case: VoltDB, Redis, Riak, Cassandra, MongoDB or HBase?
Thanks in advance.