Questions tagged [bucketing]

7 questions
1 vote, 0 answers

Bucketing values in Python

I want to split my values between buckets by associating each value with a hash. As the hash I am using Python's built-in hash function, whose range is -abs(sys.maxsize) to sys.maxsize. I have created a function to list buckets by values. I don't understand why I…
Jonito
0 votes, 0 answers

Bucketed joins in PySpark/Iceberg

I'm trying to perform a join between two tables in PySpark using the Iceberg format. I'm trying to use bucketing to improve performance and avoid a shuffle, but it appears to have no effect whatsoever. What might I be missing? Code for…
0 votes, 0 answers

How to create an AWS Athena Table with Partition Projection and Bucketing enabled?

I am trying to create an Athena table that uses both partition projection and bucketing (CLUSTERED BY). I'm doing this to get a side-by-side performance comparison for our dataset with and without bucketing. Through my tests, this…
0 votes, 0 answers

Filter Elasticsearch documents by a sub-value

I am storing Java call-stack information in Elasticsearch. Each call-stack element represents one method in the call stack and is stored as a separate document in Elasticsearch. Each method has a unique method_id (unique long value across all…
0 votes, 0 answers

How to do bucketing in Databricks?

We are migrating a job from on-prem to Databricks. We are trying to optimize the job but couldn't use bucketing because, by default, Databricks stores all tables as Delta tables, and it shows an error that bucketing is not supported for Delta. We tried to…
0 votes, 1 answer

Can I increase the number of buckets after table creation in Hive?

In Hive, once a table is created with n buckets, is there any way to increase the number of buckets?
0 votes, 0 answers

Why doesn't bucketing work at partition level?

I have a scenario where each of my individual partitions needs to be bucketed at a different level. I have tried the following approaches, but they don't work. Created a bucketed table and created a partition (date_id='2022-10-22') and set the…
Vinay Kumar