Although TokuMX has benefits, we are running into issues. We recently migrated to this engine, and in the process our cleanup scripts became useless. We have transient data that we used to clean up every night, after which we reclaimed the disk space with db.repairDatabase(). However, that command is not supported by TokuMX, so we are no longer able to reclaim the disk space.

Is there an alternative way?

Stennie
purvesh
  • Write your transient data to a separate db and then reclaim disk space by just dropping the db. – Sergio Tulentsev Nov 26 '14 at 13:24
  • Thanks, Sergio. There is still the risk of in-flight transactions: what happens to them when the db is dropped? But your suggestion made me dig deeper into TokuMX's partitioned collections. I might have to change the underlying Java app, but creating a partitioned collection with time as the partition key might be a better option, since I could then drop the previous day's partition. What do you think? – purvesh Nov 26 '14 at 13:43
  • I'm not familiar with TokuMX's partitioned collections. – Sergio Tulentsev Nov 26 '14 at 13:43

1 Answer

It sounds like partitioned collections are the right abstraction for your application. Normal collections will suffer from the accumulation of MVCC garbage if you have a pattern of deleting large swaths of old data. With partitioned collections, you can drop a partition and reclaim all the space instantaneously.
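To make the space argument concrete, here is a toy in-memory model in Python. It is not TokuMX code and does not use TokuMX's partitioning commands; the class name `DailyPartitionedStore`, the `ts` field, and the one-partition-per-UTC-day scheme are hypothetical, chosen to match the nightly cleanup described in the question. Documents are routed to a partition by the day of their timestamp, and retiring old data discards whole partitions instead of deleting documents one at a time, which is the pattern that leaves MVCC garbage behind in a normal collection.

```python
from datetime import date, datetime, timezone


class DailyPartitionedStore:
    """Toy model of a collection partitioned by UTC day.

    Hypothetical illustration only; this is NOT the TokuMX API.
    Documents must carry a timezone-aware datetime in the "ts" field.
    """

    def __init__(self):
        # partition key (a UTC date) -> list of documents for that day
        self.partitions = {}

    def insert(self, doc):
        # Route the document to the partition for its timestamp's UTC day.
        day = doc["ts"].astimezone(timezone.utc).date()
        self.partitions.setdefault(day, []).append(doc)

    def drop_partitions_before(self, cutoff: date):
        # Discard whole partitions older than `cutoff`. No per-document
        # deletes happen, so nothing is left for a later repair/compaction.
        old = [day for day in self.partitions if day < cutoff]
        for day in old:
            del self.partitions[day]
        return sorted(old)

    def count(self):
        # Total number of documents across all remaining partitions.
        return sum(len(docs) for docs in self.partitions.values())
```

For example, after inserting one document timestamped late on 2014-11-25 UTC and one early on 2014-11-26 UTC, calling `drop_partitions_before(date(2014, 11, 26))` removes only the 2014-11-25 partition and leaves one document.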

leif
  • Thanks a lot, Leif. This is what I am leaning towards. We will have to modify the underlying framework that creates this transient data so that it adds a timestamp (ts) field that can be used for partitioning. – purvesh Nov 29 '14 at 19:12
  • You can partition on the ObjectId, whose most significant bits are already a timestamp. This should work fairly well without any app changes. – leif Dec 01 '14 at 18:35
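Leif's suggestion relies on the ObjectId layout: the first 4 bytes of a 12-byte ObjectId are a big-endian count of seconds since the Unix epoch. The minimal Python sketch below assumes `_id` values are driver-generated ObjectIds handled as 24-character lowercase hex strings; the helper names are hypothetical. It reads that timestamp back and builds the smallest ObjectId for a given instant, which could serve as a day boundary when partitioning on `_id`.

```python
from datetime import datetime, timezone


def objectid_timestamp(oid_hex: str) -> datetime:
    """Return the creation time embedded in a 24-hex-character ObjectId."""
    if len(oid_hex) != 24:
        raise ValueError("ObjectId hex string must be 24 characters")
    # The first 4 bytes (8 hex chars) are big-endian seconds since the epoch.
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def objectid_lower_bound(dt: datetime) -> str:
    """Smallest ObjectId hex whose embedded timestamp equals dt's whole second."""
    seconds = int(dt.timestamp())
    # Timestamp prefix followed by zeroed remaining 8 bytes.
    return format(seconds, "08x") + "0" * 16
```

For example, `objectid_lower_bound(datetime(2014, 11, 26, tzinfo=timezone.utc))` gives `"54751800"` followed by sixteen zeros, and any ObjectId generated from that midnight UTC onward compares greater than or equal to it. The boundary is only as accurate as the clock of whichever process generated the ObjectIds, and IDs created within the same second are not ordered by time.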