Questions tagged [data-compaction]

13 questions
1 vote, 2 answers

Directory size increased after compaction using pyspark

I wrote a file compactor using PySpark. It works by reading all the content of a directory into a Spark DataFrame and then performing a repartition action to reduce the number of files. The number of desired files is…
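A minimal sketch of the compactor described above, with hypothetical function names and a hypothetical 128 MiB per-file target (neither is from the question). Note the comment on `repartition()`: the shuffle it performs can destroy any existing sort order, which makes columnar encodings less effective and is one common reason a "compacted" directory ends up larger.

```python
import math

# Hypothetical sizing helper: how many output files to ask repartition() for,
# given the directory's total size and a desired per-file size.
def target_partitions(total_bytes: int, target_file_bytes: int = 128 * 1024 * 1024) -> int:
    return max(1, math.ceil(total_bytes / target_file_bytes))

def compact_directory(spark, in_path: str, out_path: str, total_bytes: int) -> None:
    # Read everything, then repartition down to the computed file count.
    # Caveat: repartition() shuffles rows, which can break any sort order and
    # make dictionary/run-length encodings in Parquet less effective -- a
    # plausible cause of the directory growing after "compaction".
    df = spark.read.parquet(in_path)
    n = target_partitions(total_bytes)
    df.repartition(n).write.mode("overwrite").parquet(out_path)
```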
1 vote, 0 answers

Using multiple TTL values in Cassandra table

What are the disadvantages of using multiple TTL values (one at table level and another on specific rows to override the TTL for those rows) in a Cassandra table? Will it result in incomplete data cleanup? Since TWCS is being used, we may never get…
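For reference, the two TTL layers the question mentions look like this in CQL (table and column names are hypothetical):

```sql
-- Table-level default TTL, in seconds.
CREATE TABLE sensor_data (
    sensor_id text,
    ts timestamp,
    value double,
    PRIMARY KEY (sensor_id, ts)
) WITH default_time_to_live = 7776000    -- 90 days
  AND compaction = {'class': 'TimeWindowCompactionStrategy'};

-- Row-level override: USING TTL wins over the table default for this write.
INSERT INTO sensor_data (sensor_id, ts, value)
VALUES ('s1', toTimestamp(now()), 1.0)
USING TTL 86400;   -- 1 day
```

The TWCS concern the question alludes to is real: per-row TTLs that differ from the table default can leave SSTables whose contents expire at mixed times, so whole-SSTable drops are delayed.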
0 votes, 2 answers

Data in hive table is changed after running a compaction in pyspark

Following a previously asked question (link added). In short: I wrote a file compactor in Spark. It works by reading all files under a directory into a DataFrame, performing coalesce over the DataFrame (with the desired number of files),…
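A compaction step should never change the data, only the file layout, so one sanity check is to compare the rows before and after as multisets (order-independent, duplicates counted). A minimal sketch; the function name is hypothetical and the inputs are plain collected rows:

```python
from collections import Counter

def rows_unchanged(before_rows, after_rows) -> bool:
    """Compare two datasets as multisets of rows (order ignored), e.g. the
    collected output of the same Hive query before and after compaction."""
    return Counter(map(tuple, before_rows)) == Counter(map(tuple, after_rows))
```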
0 votes, 0 answers

Can I run compaction in multiple graph spaces in the NebulaGraph database?

I'm running Nebula Graph database on AWS with the Twitter dataset (3 graph spaces), and each space has a data volume of around 500GB. I know that the compaction process is quite time-consuming. Can I run compaction for all 3 graph spaces at the same…
randomv
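In NebulaGraph, manual compaction is submitted per graph space, so running it for several spaces concurrently means issuing the job in each space. A sketch in nGQL (the space name is hypothetical):

```
USE twitter_space_1;   -- hypothetical space name
SUBMIT JOB COMPACT;    -- returns a job id for this space
SHOW JOBS;             -- check job status and progress
```

Whether running all three at once is wise depends on disk and CPU headroom: compaction is I/O-heavy, and concurrent jobs on shared storage will compete.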
0 votes, 0 answers

Kafka - consuming messages from topic while removing duplicates

I'm going to consume a Kafka topic with log.cleanup.policy=compact. There will be many consumers/producers concurrently reading/writing the topic. I want to be sure that the consumers, while reading messages from the topic, skip all the duplicates,…
freedev
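The read-side effect of log compaction can be sketched in a few lines: for each key, only the most recent value survives, ordered by last write. A minimal simulation (function name hypothetical):

```python
def compacted_view(records):
    """Simulate the eventual effect of cleanup.policy=compact: for each key,
    only the most recent value survives, in order of last write."""
    latest = {}
    for key, value in records:
        latest.pop(key, None)   # drop the stale position for this key
        latest[key] = value     # re-insert so ordering follows the last write
    return list(latest.items())
```

One caveat worth knowing: compaction is asynchronous and only eventually removes older duplicates, so a consumer that must never see duplicates still needs its own keyed state (e.g. a dict like the one above) while reading.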
0 votes, 1 answer

Kafka - changing log.cleanup.policy to existing topic

I have a Kafka topic that receives many, many messages. Many of them share the same key, and I'm interested only in the latest message per key. Looking around, this topic seems perfect for the config log.cleanup.policy=compact. Can I change the existing Kafka…
freedev
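Topic-level configs can be altered on an existing topic without recreating it; a sketch with a hypothetical broker address and topic name:

```shell
# Hypothetical broker address and topic name.
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-topic \
  --alter --add-config cleanup.policy=compact
```

Two things to keep in mind: compacted topics require every message to have a key, and the cleaner works in the background, so older duplicates disappear gradually rather than immediately.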
0 votes, 1 answer

Does etcd's storage footprint grow linearly with respect to keys and values?

I noticed that, when running some stress tests on a Kubernetes cluster, etcd snapshot sizes didn't increase much, even as I added more and more stuff to my cluster. I collected snapshots via: etcdctl --endpoints="https://localhost:2379"…
jayunit100
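Part of the likely explanation: etcd's MVCC store keeps every revision of a key until the history is compacted, after which the freed pages are reused (and returned to the OS only on defragmentation). So the on-disk footprint tracks live keys plus uncompacted history, not total writes, and plateaus once compaction keeps pace. The manual equivalents, with a hypothetical endpoint and revision:

```shell
# Drop all revisions older than the given revision number (hypothetical value).
etcdctl --endpoints="https://localhost:2379" compaction 123456
# Defragment the backend database file to actually shrink it on disk.
etcdctl --endpoints="https://localhost:2379" defrag
```

Kubernetes control planes typically run etcd with auto-compaction enabled, which would produce exactly the flat snapshot sizes observed.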
0 votes, 1 answer

RocksDB: notification when all compaction jobs are done

I use RocksDB's bulk loading mechanism to load a bunch of SST files generated by offline Spark tasks. To avoid a large amount of disk I/O during the loading and compacting process from affecting online read requests, I want to finish offline…
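RocksDB exposes the integer properties `rocksdb.compaction-pending` and `rocksdb.num-running-compactions` (in C++ there is also the push-style `EventListener::OnCompactionCompleted`). A binding-agnostic polling sketch, where the caller supplies a property getter; the function name and settle thresholds are hypothetical:

```python
import time

def wait_for_compaction(get_int_property, poll_secs=1.0, settle_checks=3):
    """Poll RocksDB's compaction properties until no compaction is pending
    or running for several consecutive checks, then return True."""
    quiet = 0
    while quiet < settle_checks:
        pending = get_int_property("rocksdb.compaction-pending")
        running = get_int_property("rocksdb.num-running-compactions")
        quiet = quiet + 1 if (pending == 0 and running == 0) else 0
        if quiet < settle_checks:
            time.sleep(poll_secs)
    return True
```

The repeated "quiet" checks guard against the window where one compaction has finished but its output has not yet triggered the next level's compaction.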
0 votes, 1 answer

CouchDB 3.2 disable auto compaction for a specific database

How can I disable auto compaction in CouchDB 3.2? I want to preserve all the history for a specific database, or completely disable auto compaction. Note: the CouchDB 3.2 configuration has changed from 2.0.
Zeta
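In CouchDB 3.x, auto-compaction is driven by the "smoosh" daemon, which schedules work through named channels. A sketch of disabling it globally via `local.ini` (an assumption: this clears the channel lists for the whole node, not a single database):

```ini
; local.ini -- sketch, assuming the 3.x "smoosh" auto-compaction daemon
[smoosh]
; emptying the channel lists disables scheduled database/view compaction
db_channels =
view_channels =
```

This is node-wide; keeping auto-compaction for other databases while exempting one would need a different approach.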
0 votes, 1 answer

How to free disk space in Cassandra when a lot of tombstones have accumulated under SizeTieredCompactionStrategy

I am running cqlsh version 5.0.1 on a 6-node cluster. Recently I did a major data cleanup in a table that uses SizeTieredCompactionStrategy in order to free some disk space, but that didn't happen. The issue I am facing is that…
Yash Tandon
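Under STCS, deleted data is purged only when the SSTables containing both the data and its tombstones are compacted together, and only after gc_grace_seconds has passed. Two `nodetool` commands that can force the issue (keyspace and table names are hypothetical):

```shell
# Rewrite SSTables, dropping data shadowed by droppable tombstones.
nodetool garbagecollect my_keyspace my_table

# Or force a major compaction, merging everything into few large SSTables.
# Use with care under STCS: the resulting giant SSTable rarely compacts again.
nodetool compact my_keyspace my_table
```

Also check gc_grace_seconds on the table: tombstones younger than it cannot be dropped by any compaction.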
0 votes, 1 answer

HBase: major compaction config does not take effect

I have set the config: hbase.offpeak.start.hour: 18, hbase.offpeak.end.hour: 22, hbase.hregion.majorcompaction: 86400000. But HBase still runs major compactions at random times, like 9:00, 13:55, and so on. Can you tell me how to configure HBase to do major…
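The "random" times are most likely hbase.hregion.majorcompaction.jitter (default 0.5), which deliberately spreads the 24-hour period across regions; the offpeak hours only adjust the compaction selection ratio during that window, they do not schedule major compactions. A sketch of the relevant hbase-site.xml entries:

```xml
<!-- hbase-site.xml sketch: fixed 24h major-compaction period, no jitter -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>86400000</value>
</property>
<property>
  <name>hbase.hregion.majorcompaction.jitter</name>
  <value>0</value>
</property>
<property>
  <name>hbase.offpeak.start.hour</name>
  <value>18</value>
</property>
<property>
  <name>hbase.offpeak.end.hour</name>
  <value>22</value>
</property>
```

With jitter at 0, every region compacts exactly on the configured period; many operators instead disable periodic majors entirely and trigger them from cron during off-peak hours.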
0 votes, 1 answer

How to remove old revisions of the documents in a couchdb database?

I have a very large database with some GB of data, and when I try to compact it, it takes more than 12 hours. Is there any other way to delete old revisions? Does _revs_limit help with this? I can see that the revs limit of all databases is set…
Rahib Rasheed
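`_revs_limit` is a per-database setting exposed as an HTTP endpoint; lowering it bounds how many entries survive in each document's revision tree, but the old revision bodies are still reclaimed only by compaction. A sketch with a hypothetical server and database name:

```shell
# Inspect and lower the revision limit for one database.
curl -X GET http://localhost:5984/mydb/_revs_limit
curl -X PUT http://localhost:5984/mydb/_revs_limit -d '5'
```

So a lower limit makes the eventual compaction cheaper and the database smaller afterwards, but it does not replace the compaction run itself.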
-1 votes, 1 answer

Which compaction strategy is recommended for a table with minimal updates

I am looking for a compaction strategy for data with the following characteristics: we don't need the data after 60-90 days (in extreme scenarios maybe 180 days); ideally only inserts happen and updates never happen, but it is realistic to expect…
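For insert-mostly, time-bounded data like this, the usual fit is TimeWindowCompactionStrategy paired with a table-level TTL, so whole SSTables expire and are dropped without rewriting. A CQL sketch with hypothetical table names and window sizes:

```sql
CREATE TABLE events (
    id uuid,
    ts timestamp,
    payload text,
    PRIMARY KEY (id, ts)
) WITH default_time_to_live = 7776000          -- 90 days
  AND compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': 7                -- roughly 12-13 windows per TTL
  };
```

TWCS tolerates the occasional update, but frequent overwrites of old rows spread a partition's data across windows and undercut the whole-SSTable expiry that makes the strategy attractive.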