I'm planning a few ETLs that will eventually "fill" the same row in Cassandra. For example, if a table is defined as:
CREATE TABLE MyTable (
    key text,
    column1 text,
    column2 text,
    column3 text,
    column4 text,
    PRIMARY KEY (key)
)
then a few ETLs will fill in the appropriate values in columns 1-4 at different times.
How well does Cassandra handle such operations? Should I read the row first, update it in code, and write it back, or will a simple UPDATE do the trick?
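To illustrate what I mean by a simple UPDATE, using the table above (as I understand it, each UPDATE in Cassandra is an upsert that only touches the named cells, with no read beforehand):

```sql
-- ETL A writes only its own column for a given key:
UPDATE MyTable SET column4 = 'etl-a-value' WHERE key = 'some-key';

-- 20 minutes later, ETL B fills a different column of the same row:
UPDATE MyTable SET column2 = 'etl-b-value' WHERE key = 'some-key';
```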
I know that Cassandra is highly optimized for write throughput in that it never modifies data on disk; it only appends to existing files or creates new ones. Knowing that, and without diving deeper into the implementation, it worries me that if one ETL writes column4 and, 20 minutes later, a different ETL writes column2, I will lose a lot of performance compared to waiting for all the ETLs to finish and then saving all the data in bulk (which is not an easy implementation in itself).
Ideas?