Should I create a new column for a field I need to read often? Cassandra

Question

I have a Cassandra database with id and jsonData fields. I often fetch the value of one particular field inside jsonData.

Would creating a dedicated column for that field improve read performance? If so, what is the difference between the two approaches?

Thank you!

score 2 · Answer 1 · answered Jul 26 '19 at 04:07

Assuming your structure is

CREATE TABLE abc.test ( id UUID PRIMARY KEY,  json map<int,text> );

You have a field called new_column in the json that is read again and again, and you want to change the table to

CREATE TABLE abc.test ( id UUID PRIMARY KEY, new_column int, json map<int,text> );

There are both advantages and disadvantages to the approach.

Advantages:

Collections have a lot of limitations, which you can avoid by using columns directly. Some of them are answered here.
I am assuming you will not need all the data in the map every time. Reading the field from the map is inefficient because Cassandra retrieves the collection as a whole, so you get all of the data even when you only need one value.
You can also use new_column as a clustering key so that you can filter on it; I am not sure whether that is required in your case. You can still leave new_column out of the query and retrieve all data for the id.
A clearly defined schema, which makes the system easier to understand.
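As a rough sketch of the clustering key point above: because a clustering key is part of the primary key, it cannot be added to an existing table with ALTER TABLE, so this assumes a separate table. The table name test_by_column, the UUID, and the filter value are made up for illustration.

```cql
-- Hypothetical table: new_column is a clustering key, so it is part of the primary key.
CREATE TABLE abc.test_by_column (
    id UUID,
    new_column int,
    json map<int, text>,
    PRIMARY KEY (id, new_column)
);

-- Filter on the clustering key within one partition (placeholder id and value).
SELECT * FROM abc.test_by_column
WHERE id = 123e4567-e89b-12d3-a456-426614174000 AND new_column > 10;

-- Omit new_column to retrieve every row stored for the id.
SELECT * FROM abc.test_by_column
WHERE id = 123e4567-e89b-12d3-a456-426614174000;
```

Note that a clustering key must have a value in every row, so this variant only fits if new_column is always present.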

Disadvantages:

new_column may end up being a sparse column, but that should be fine, as most Big Data systems were designed to handle this sparse data problem.
Data migration.
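A minimal sketch of what the migration could look like if new_column is added as a regular (non-key) column. The UUID and the value 42 are placeholders; in practice the value would be extracted from the existing json data by your application, one row at a time.

```cql
-- Add the dedicated column; existing rows simply have no value for it yet.
ALTER TABLE abc.test ADD new_column int;

-- Backfill one row with the value taken from its json data (placeholder id and value).
UPDATE abc.test SET new_column = 42
WHERE id = 123e4567-e89b-12d3-a456-426614174000;

-- Afterwards the frequently read field can be fetched without touching the map.
SELECT new_column FROM abc.test
WHERE id = 123e4567-e89b-12d3-a456-426614174000;
```

Rows that are never backfilled just leave new_column unset, which is the sparse column case mentioned above.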

I highly recommend adding new_column as a separate column.
