Questions tagged [data-compaction]
13 questions
1
vote
2 answers
Directory size increased after compaction using pyspark
I wrote a file compactor using PySpark.
The way it works is by reading all the content of a directory into a Spark DataFrame and then performing a repartition to reduce the number of files.
The number of desired files is…

Liran Eliyahu
- 11
- 4
1
vote
0 answers
Using multiple TTL values in a Cassandra table
What are the disadvantages of using multiple TTL values (one at the table level and another on specific data rows to override the TTL for those rows) in a Cassandra table? Will it result in incomplete data cleanup?
Since TWCS is being used, we may never get…

Cassandra Thrift
- 13
- 2
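For intuition about the override semantics the question describes, here is a toy in-memory model (the `TtlTable` class is invented for illustration and mimics Cassandra's `default_time_to_live` versus `INSERT ... USING TTL`, not the real storage engine):

```python
import time

class TtlTable:
    """Toy model (invented for illustration) of a table-level default TTL
    with per-row overrides, as in Cassandra's default_time_to_live
    versus INSERT ... USING TTL."""

    def __init__(self, default_ttl):
        self.default_ttl = default_ttl
        self.rows = {}  # key -> (value, absolute expiry time)

    def insert(self, key, value, ttl=None):
        # A row-level TTL, when given, overrides the table default.
        effective = ttl if ttl is not None else self.default_ttl
        self.rows[key] = (value, time.monotonic() + effective)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        row = self.rows.get(key)
        if row is None or now >= row[1]:
            return None  # an expired row reads as absent
        return row[0]
```

The model shows why mixed TTLs complicate cleanup under TWCS: rows written in the same time window can now expire at very different times, so a window's SSTable cannot be dropped wholesale until its longest-lived row expires.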
0
votes
2 answers
Data in hive table is changed after running a compaction in pyspark
Following up on a previously asked question (adding a link).
In short:
I wrote a file compactor in Spark. The way it works is by reading all files under a directory into a DataFrame and performing a coalesce over the DataFrame (down to the number of wanted files),…

Liran Eliyahu
- 11
- 4
0
votes
0 answers
Can I run compaction in multiple graph spaces in the NebulaGraph database?
I'm running the NebulaGraph database on AWS with the Twitter dataset (3 graph spaces), and each space holds around 500 GB of data.
I know that the compaction process is quite time-consuming. Can I run compaction for all 3 graph spaces at the same…

randomv
- 218
- 1
- 7
0
votes
0 answers
Kafka - consuming messages from a topic while removing duplicates
I'm going to consume a Kafka topic with log.cleanup.policy=compact.
Many consumers/producers will concurrently read and write the topic.
I want to be sure that the consumers, while reading messages from the topic, skip all the duplicates,…

freedev
- 25,946
- 8
- 108
- 125
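A key point for this question: with `log.cleanup.policy=compact`, Kafka only guarantees that *at least* the latest record per key survives, and compaction runs asynchronously, so consumers can still see older duplicates until it has run. A small sketch of the eventual keep-latest semantics (pure Python, not the Kafka API; `compact_log` is an invented name):

```python
def compact_log(log):
    """Return a log as Kafka compaction would eventually leave it:
    only the most recent (key, value) per key, preserving the relative
    order of the surviving records. A value of None models a tombstone,
    which removes the key entirely once compaction has run."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later offsets overwrite earlier ones
    survivors = sorted(latest.items(), key=lambda kv: kv[1][0])
    return [(key, value) for key, (_, value) in survivors if value is not None]
```

Because a consumer reading from the beginning may still observe all the pre-compaction records, consumer-side de-duplication (e.g. a last-seen map per key, as in the loop above) is still needed if duplicates must never be processed.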
0
votes
1 answer
Kafka - changing log.cleanup.policy on an existing topic
I have a Kafka topic that receives many, many messages. Many of them have the same key, and I'm interested only in the latest ones. Looking around, this topic seems a perfect fit for the log.cleanup.policy=compact config.
Can I change the existing Kafka…

freedev
- 25,946
- 8
- 108
- 125
0
votes
1 answer
Does etcd's storage footprint grow linearly with respect to keys and values?
I noticed that, when running some stress tests on a Kubernetes cluster, etcd snapshot sizes didn't increase much, even as I added more and more objects to my cluster.
I collected snapshots via:
etcdctl --endpoints="https://localhost:2379"…

jayunit100
- 17,388
- 22
- 92
- 167
0
votes
1 answer
RocksDB: notification when all compaction jobs are done
I use RocksDB's bulk-loading mechanism to load a bunch of SST files generated by offline Spark tasks. To avoid a large amount of disk I/O during the loading and compaction process affecting online read requests, I want to finish offline…

user2260241
- 95
- 5
0
votes
1 answer
CouchDB 3.2: disable auto compaction for a specific database
How can I disable auto compaction in CouchDB 3.2?
I want to preserve all the history for a specific database, or else completely disable auto compaction.
Note: the CouchDB 3.2 configuration has changed since 2.0.

Zeta
- 913
- 10
- 24
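One possible direction, assuming CouchDB 3.x's smoosh auto-compaction daemon (verify the channel names against your own node's `_config` before relying on this): clearing the database channels should stop databases from being auto-compacted. A hedged sketch of the local.ini fragment:

```ini
; local.ini - sketch, assuming CouchDB 3.x smoosh settings
[smoosh]
; empty the database channels so no database is auto-compacted
db_channels =
```

Note this only stops automatic compaction; it is not a guarantee that every old revision remains readable forever, since revision history is still bounded by `_revs_limit`.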
0
votes
1 answer
How to free disk space from Cassandra when a lot of tombstones have collected under SizeTieredCompactionStrategy
I am running cqlsh version 5.0.1 on a 6-node cluster. Recently I did a major data cleanup in a table that uses SizeTieredCompactionStrategy in order to free some disk space, but that didn't happen. The issue I am facing is that…

Yash Tandon
- 345
- 5
- 18
0
votes
1 answer
HBase: major compaction config does not take effect
I have set the following config: hbase.offpeak.end.hour: 22, hbase.offpeak.start.hour: 18, hbase.hregion.majorcompaction: 86400000. But HBase still runs major compactions at random times, like 9:00, 13:55, and so on.
Can you tell me how to configure HBase to run major…
0
votes
1 answer
How to remove old revisions of the documents in a couchdb database?
I have a very large database with some GB of data, and when I try to compact it, it takes more than 12 hours. Is there any other way to delete old revisions? Does _revs_limit help with this? I can see that the revs limit of all databases is set…

Rahib Rasheed
- 317
- 1
- 10
-1
votes
1 answer
Which compaction strategy is recommended for a table with minimal updates?
I am looking for a compaction strategy for data with the following characteristics:
We don't need the data after 60-90 days; in extreme scenarios, maybe 180 days.
Ideally only inserts happen and updates never do, but it is realistic to expect…

vineeth kanaparthi
- 2,355
- 2
- 10
- 6
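For data with a bounded lifetime and few updates, TimeWindowCompactionStrategy combined with a table-level TTL is the usual fit, since whole time windows can be dropped when they expire. A hedged CQL sketch (the table name, columns, and window size are illustrative, not from the question):

```sql
-- Sketch: a TWCS table sized for ~90-day data; names and window
-- values are illustrative assumptions.
CREATE TABLE IF NOT EXISTS events (
    id   uuid,
    ts   timestamp,
    body text,
    PRIMARY KEY (id, ts)
) WITH compaction = {
      'class': 'TimeWindowCompactionStrategy',
      'compaction_window_unit': 'DAYS',
      'compaction_window_size': '7'
  }
  AND default_time_to_live = 7776000;  -- 90 days, in seconds
```

The caveat raised in the question body applies: if updates do occasionally happen, a row rewritten in a later window keeps its original window's SSTable from being dropped cleanly, which is why TWCS is recommended mainly for append-mostly, TTL'd data.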