I need to build a customer 360-degree database, which has these requirements:
- A wide-column table in which each customer is one row, with lots of columns (say > 1000).
- We have ~20 batch analytics jobs running daily. Each job queries and updates a small set of columns across all rows, e.g. aggregating data for reporting and loading/saving data for machine learning algorithms.
- We also update customer info in several columns, for up to 1 million rows per day. This update workload is spread out across working hours. The table has more than 200 million rows.
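To put these numbers in perspective, here is a rough back-of-envelope calculation. The constants are the bounds from the list above; the 8-hour working day is my own assumption, not a measured figure.

```python
# Back-of-envelope sizing of the workload described in the requirements.
ROWS = 200_000_000              # "more than 200 million rows" (lower bound)
COLUMNS = 1_000                 # "> 1000 columns" (lower bound)
DAILY_UPDATED_ROWS = 1_000_000  # "<= 1 million rows per day" (upper bound)
WORKING_HOURS = 8               # assumption: length of the working day


def workload_summary(rows, columns, daily_updates, hours):
    """Return total cell count, daily update fraction and average update rate."""
    cells = rows * columns
    daily_fraction = daily_updates / rows
    updates_per_second = daily_updates / (hours * 3600)
    return {
        "cells": cells,
        "daily_fraction": daily_fraction,
        "updates_per_second": updates_per_second,
    }


summary = workload_summary(ROWS, COLUMNS, DAILY_UPDATED_ROWS, WORKING_HOURS)
print(f"cells: {summary['cells']:.1e}")                                   # 2.0e+11
print(f"rows updated per day: {summary['daily_fraction']:.2%} of table")  # 0.50%
print(f"average update rate: {summary['updates_per_second']:.1f} rows/s") # 34.7
```

So the updates touch only about 0.5% of rows per day at a modest average rate, while the batch jobs scan a few columns over roughly 2 x 10^11 cells.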
For these requirements, I think a mutable (updatable) columnar DB would be a perfect fit: it can be queried and aggregated by column, which is optimal for analytics, and it can absorb several million changes spread throughout the day. The closest match I have found is Apache Kudu, but its limit of 300 columns is a big turn-off, since we have more than 1000.
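The access pattern I have in mind can be sketched with a toy in-memory column store. `ToyColumnTable` is a hypothetical illustration only; real columnar engines (Kudu included) store and update data very differently, e.g. with compression and on-disk files.

```python
from typing import Dict, List, Sequence


class ToyColumnTable:
    """Hypothetical in-memory column store: one list per column, all the same length."""

    def __init__(self, column_names: Sequence[str], num_rows: int) -> None:
        self.columns: Dict[str, List[float]] = {
            name: [0.0] * num_rows for name in column_names
        }

    def update_row(self, row: int, values: Dict[str, float]) -> None:
        # A point update writes one slot in each touched column and leaves the rest alone.
        for name, value in values.items():
            self.columns[name][row] = value

    def column_sum(self, name: str) -> float:
        # An aggregate scans a single column instead of reading whole rows.
        return sum(self.columns[name])

    def read_columns(self, names: Sequence[str]) -> Dict[str, List[float]]:
        # A batch job that needs a few columns reads only those lists.
        return {name: self.columns[name] for name in names}


# Usage: 1000 columns, 5 customers; update two rows, then aggregate one column.
table = ToyColumnTable([f"c{i}" for i in range(1000)], num_rows=5)
table.update_row(2, {"c0": 10.0, "c7": 3.5})
table.update_row(4, {"c0": 5.0})
print(table.column_sum("c0"))  # 15.0
```

The point is that both workloads touch only the columns they name, regardless of how wide the table is.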
We would also prefer an open-source project.
Any suggestions?