I have a 40 column RDBMS table which I am porting to Cassandra.
Using the estimator at http://docs.datastax.com/en/cassandra/2.1/cassandra/planning/architecturePlanningUserData_t.html
I created a excel sheet with column names, data types, size of each column etc. The Cassandra specific overhead for each RDBMS row is a whopping 1KB when the actual data is only 192 bytes.
Since the overheads are proportional to number of columns, I thought it would be much better if I just create a UDT for the fields that are not part of the primary key. That way, I would incur the column overhead only once.
Also, I don't intend to run queries on inner fields of the UDT. Even if I did want that, Cassandra has very limited querying features that work on non PK fields.
Is this a good strategy to adopt? Are there any pitfalls? Are all these overheads easily eliminated by compression or some other internal operation?