I have a Cassandra database with id and jsonData fields. I often fetch the value of one particular field inside jsonData.

In terms of read performance, would it be better to create a new column in the database for that field? If so, what is the difference between the two methods?

Thank you!

IceTea

1 Answer

Assuming your structure is

CREATE TABLE abc.test ( id UUID PRIMARY KEY,  json map<int,text> );

and the json map contains a field, call it new_column, that you read again and again, you want to change the table to

CREATE TABLE abc.test ( id UUID PRIMARY KEY, new_column int, json map<int,text> );

There are both advantages and disadvantages to the approach.

Advantages:

  • Collections come with a lot of limitations, which you can avoid by using columns directly. Some of them are answered here.

  • I am assuming you do not need every entry in the map on each read. In that case the map is inefficient, because Cassandra retrieves the collection as a whole, so you get all of its data even when you need only one field.

  • You can also use new_column as a clustering key so that you can filter on it; I am not sure whether that is required in your case. You can always leave new_column out of the query and retrieve all data for the id.
  • Clearly defined schema which makes it easier to understand the system.
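
To make the read-path and clustering-key points concrete, here is a minimal CQL sketch. The table name test_by_new_column and the UUID literal are placeholders invented for illustration, and whether a clustering key suits your data model is an assumption. Note that a clustering key becomes part of the primary key, so it changes what identifies a row.

```cql
-- With the dedicated column from the second schema above,
-- a read can name just the field it needs:
SELECT new_column FROM abc.test
WHERE id = 123e4567-e89b-12d3-a456-426614174000;

-- Hypothetical variant with new_column as a clustering key.
CREATE TABLE abc.test_by_new_column (
    id UUID,
    new_column int,
    json map<int,text>,
    PRIMARY KEY (id, new_column)
);

-- Filter on the clustering key within one partition:
SELECT * FROM abc.test_by_new_column
WHERE id = 123e4567-e89b-12d3-a456-426614174000
  AND new_column >= 10;

-- Or leave new_column out and retrieve everything for the id:
SELECT * FROM abc.test_by_new_column
WHERE id = 123e4567-e89b-12d3-a456-426614174000;
```

In the clustering-key variant an id can hold several rows, one per new_column value, and new_column can no longer be left unset, so it only fits if that matches how the data is used.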

Disadvantages:

  1. new_column may end up as a sparse column, but that should be fine, as most Big Data systems were created to handle this sparse data map problem.
  2. Data migration.
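
As a rough sketch of what the migration could look like, assuming the table already exists with the first schema: the column can be added in place, and existing rows then need a backfill. The UUID and the value 42 are placeholders; in practice a client program would read each row's map and write the extracted value.

```cql
-- Add the new column; existing rows have no value for it yet.
ALTER TABLE abc.test ADD new_column int;

-- Backfill one row (repeat per row from a client that reads
-- the value out of the json map first).
UPDATE abc.test SET new_column = 42
WHERE id = 123e4567-e89b-12d3-a456-426614174000;
```

Until the backfill finishes, readers should be prepared to see null in new_column and fall back to the map.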

I would highly recommend adding new_column as a separate column.

Abhishek Garg