I understand that when a record is deleted in Cassandra the value is not actually deleted but marked as a tombstone so that consistency can be achieved among other nodes.

In my production setup, I receive data from thousands of sensors and push it to Azure Service Bus. The internet connection in production is not very stable, so I have a Cassandra node to store the data that has not yet been sent to Azure.

Once I have sent data to Azure, I delete the successfully sent records from Cassandra. I have set gc_grace_seconds to 0 on the table, and I never plan to add another node to this cluster (it will always be a single-node Cassandra).

Will this plan cause me any issues in the future? Will the performance of this table degrade? Will it affect any other tables that I may want to create on this node?

Vishweshwar Kapse
  • A one-node Cassandra doesn't make sense from my point of view. If you will always have one node, why not just write the data into some local database, like RocksDB/LevelDB/...? – Alex Ott Apr 24 '20 at 10:41
  • I believe that reads and writes are very fast irrespective of the size of the data set. I have not evaluated other DBs. To be honest, I am using Cassandra because I reused a major part of the code (read, write, connect, etc.) from a previous project. – Vishweshwar Kapse Apr 24 '20 at 10:46
  • I agree with @AlexOtt on the one-node point. Postgres could probably do what you want it to, without having to engineer around Cassandra's delete nuances. – Aaron Apr 24 '20 at 13:40
  • I am already knee-deep in this implementation, and I don't want to change it unless there is a strong need for it @AlexOtt – Vishweshwar Kapse Apr 27 '20 at 04:50

1 Answer

For this specific use case, where you plan to stay on a single-node Cassandra cluster, gc_grace_seconds=0 should not cause any consistency problems. A delete still writes a tombstone, but with gc_grace_seconds=0 that tombstone becomes eligible for removal at the next compaction instead of after the default 10 days. Until compaction actually removes them, reads still have to scan past the tombstones. Also, because this is a single-node cluster, there are no peer nodes that need to receive the deletion.
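To make the role of gc_grace_seconds concrete, here is a minimal toy model in Python. It is only a sketch and not Cassandra's real storage engine: `ToyTable`, `Cell`, and the integer `now` timestamps are hypothetical names invented for illustration. It assumes the usual rule that a delete writes a tombstone and that compaction may drop a tombstone only once it is at least gc_grace_seconds old.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Cell:
    value: Optional[str]  # None means this cell is a tombstone
    written_at: int       # time of the write or delete, in seconds


class ToyTable:
    """Toy single-node table (hypothetical; not Cassandra's storage engine)."""

    def __init__(self, gc_grace_seconds: int) -> None:
        self.gc_grace_seconds = gc_grace_seconds
        self.cells: Dict[str, Cell] = {}

    def insert(self, key: str, value: str, now: int) -> None:
        self.cells[key] = Cell(value, now)

    def delete(self, key: str, now: int) -> None:
        # A delete is itself a write: it records a tombstone.
        self.cells[key] = Cell(None, now)

    def tombstone_count(self) -> int:
        return sum(1 for c in self.cells.values() if c.value is None)

    def compact(self, now: int) -> None:
        # Keep live cells, and keep tombstones younger than gc_grace_seconds.
        self.cells = {
            k: c
            for k, c in self.cells.items()
            if c.value is not None or now - c.written_at < self.gc_grace_seconds
        }


if __name__ == "__main__":
    table = ToyTable(gc_grace_seconds=0)
    for i in range(3):
        table.insert(f"sensor-{i}", "reading", now=0)
        table.delete(f"sensor-{i}", now=10)
    print(table.tombstone_count())  # tombstones remain until compaction runs
    table.compact(now=10)
    print(table.tombstone_count())  # purged, because the grace period is 0
```

In this model the tombstones only disappear when `compact` runs, so a large burst of deletes followed immediately by reads would still have to step over every tombstone, even with a grace period of 0.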

The other question this raises is whether Cassandra is the right choice as a single-node cluster for this use case; Postgres could easily satisfy your needs.

andy
  • I populated the table with over a million records and then deleted them with a delete query, after which the table became unresponsive to simple select statements. I have decided not to use Cassandra for this use case, as it is causing more problems than it solves. – Vishweshwar Kapse May 06 '20 at 15:18