Data partitioning is the division of a collection of data into smaller collections for faster processing, easier statistics gathering, and a smaller memory/persistence footprint.
Questions tagged [data-partitioning]
337 questions
5
votes
3 answers
Flink Custom Partition Function
I am using Scala on Flink with DataSet API.
I want to re-partition my data across the nodes. Spark has a function that lets the user re-partition the data with a given numberOfPartitions parameter (link) and I believe Flink does not support such…

Batuhan Tüter
- 301
- 1
- 3
- 14
5
votes
2 answers
How to decide a good partition key for Azure Cosmos DB
I'm new to Azure Cosmos DB, but I want to have a vivid understanding of:
What is the partition key?
My understanding is shallow for now -> items with the same partition key will go to the same partition for storage, which could better load…

Sherry629629
- 105
- 2
- 6
5
votes
1 answer
Maximum Coin Partition
Since standing at the point of sale in the supermarket yesterday, once more trying to heuristically find an optimal partition of my coins while trying to ignore the impatient and nervous queue behind me, I've been pondering about the underlying…

Treecj
- 427
- 2
- 19
5
votes
1 answer
Best Practices: to partition eventhub data & achieve high-scale, low-latency and high-throughput via azure eventhubs to external store (azure blobs)
As part of a security product, I have a high-scale cloud service (Azure worker role) that reads events from Event Hub, batches them into groups of ~2000, and stores them in blob storage.
Each event has a MachineId (the machine that sent it).
Events are coming from the…

Zorik
- 207
- 1
- 3
- 9
5
votes
2 answers
Partition line into equal parts
This is a geometry question.
I have a line between two points A and B and want to separate it into k equal parts. I need the coordinates of the points that partition the line between A and B.
Any help is highly appreciated.
Thanks a lot!
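A minimal sketch of one way to compute such points, assuming A and B are 2D points given as (x, y) tuples (the question does not state the dimension). The function name `partition_segment` is hypothetical. It uses linear interpolation: the i-th dividing point lies a fraction i/k of the way from A to B.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def partition_segment(a: Point, b: Point, k: int) -> List[Point]:
    """Return the k - 1 interior points that split segment AB into k equal parts."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ax, ay = a
    bx, by = b
    # Point i lies a fraction i/k of the way from A to B (linear interpolation).
    return [
        (ax + (bx - ax) * i / k, ay + (by - ay) * i / k)
        for i in range(1, k)
    ]


if __name__ == "__main__":
    print(partition_segment((0.0, 0.0), (4.0, 8.0), 4))
```

The endpoints A and B are left out of the result; include `i = 0` and `i = k` in the range if they are wanted as well.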

sebp
- 260
- 2
- 8
5
votes
2 answers
Replace duplicate values in array with new randomly generated values
I have a function below (from a previous question that went unanswered) that creates an array of n values. The sum of the array is equal to $max.
function randomDistinctPartition($n, $max) {
$partition= array();
for ($i = 1; $i < $n;…

Russell Dias
- 70,980
- 5
- 54
- 71
5
votes
1 answer
How can I fit a curve to a histogram distribution?
Someone asked me a question via e-mail about integer partitions the other day (as I had released a Perl module, Integer::Partition, to generate them), that I was unable to answer.
Background: here are all the integer partitions of 7 (the sum of each…
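For readers unfamiliar with the term: an integer partition of n is a way of writing n as a sum of positive integers, ignoring order. The sketch below is a plain recursive generator in Python, not the Integer::Partition Perl module mentioned in the excerpt; the function name and the `max_part` parameter are assumptions made for illustration.

```python
from typing import Iterator, List, Optional


def integer_partitions(n: int, max_part: Optional[int] = None) -> Iterator[List[int]]:
    """Yield each partition of n as a non-increasing list of positive integers."""
    if max_part is None or max_part > n:
        max_part = n
    if n == 0:
        # The empty sum is the only partition of zero.
        yield []
        return
    # Pick the largest part first, then partition the remainder
    # using parts no larger than the one just chosen.
    for first in range(max_part, 0, -1):
        for rest in integer_partitions(n - first, first):
            yield [first] + rest


if __name__ == "__main__":
    for p in integer_partitions(7):
        print(p)
```

Capping each recursive call at the previously chosen part is what keeps every partition in non-increasing order, so no partition is produced twice.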

dland
- 4,319
- 6
- 36
- 60
5
votes
1 answer
Date range queries in Azure Table storage
Hello, following on from my question "Windows Azure table access latency Partition keys and row keys selection" about the way I have organised data in my Azure storage account: I have a table storage scheme designed to store info about entities.
There…

Captain John
- 1,859
- 2
- 16
- 30
5
votes
2 answers
Haskell - Match Type Instance
I have defined a Haskell type similar to the following:
data TypeData = TypeA Int | TypeB String | TypeC Char deriving (Eq, Show)
At some point, I need a way to filter a [TypeData] for all non-TypeC instances. The signature of the function I am…

Tanaki
- 2,575
- 6
- 30
- 41
4
votes
4 answers
Algorithm to partition a list into groups
I have a list of names.
I want to partition this list into groups of a specified size. All groups should be equal to or less than the specified size, with group sizes as equal as possible across the groups, and as close to the specified size as…
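One possible reading of this requirement, sketched in Python: use the fewest groups that respect the size limit, then spread the items so that group sizes differ by at most one. The excerpt is truncated, so this interpretation is an assumption, and the name `balanced_groups` is hypothetical.

```python
from math import ceil
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def balanced_groups(items: Sequence[T], max_size: int) -> List[List[T]]:
    """Split items into the fewest groups of at most max_size, sized as evenly as possible."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    if not items:
        return []
    # Fewest groups that can hold every item without exceeding max_size.
    n_groups = ceil(len(items) / max_size)
    # Every group gets `base` items; the first `extra` groups get one more.
    base, extra = divmod(len(items), n_groups)
    groups: List[List[T]] = []
    start = 0
    for i in range(n_groups):
        size = base + (1 if i < extra else 0)
        groups.append(list(items[start:start + size]))
        start += size
    return groups


if __name__ == "__main__":
    print(balanced_groups(list(range(10)), 4))
```

With ten items and a limit of four, this gives groups of sizes 4, 3 and 3 rather than 4, 4 and 2.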

Keith
- 168
- 1
- 11
4
votes
1 answer
How to read filtered partitioned parquet files efficiently using pandas's read_parquet?
Let's say my data is stored in object storage, say S3, with date-time partitioning like this:
s3://my-bucket/year=2021/month=01/day=03/SOME-HASH-VAL1.parquet
...
s3://my-bucket/year=2022/month=12/day=31/SOME-HASH-VAL1000.parquet
According to pandas's…

user3595632
- 5,380
- 10
- 55
- 111
4
votes
1 answer
Cormen quicksort
In the book Introduction to Algorithms, the quicksort algorithm described in the chapter Quicksort does not employ Hoare-Partitioning.
Can anyone enlighten me on the advantage of this approach over the popular Hoare partitioning? Or is it that its…
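For context, the single-scan scheme with the last element as pivot is commonly called Lomuto partitioning, and as far as I know that is the scheme the book's chapter uses. Below is a minimal Python sketch of it, written from the general description of the technique rather than copied from the book; the function names are my own.

```python
from typing import List, Optional


def lomuto_partition(a: List[int], lo: int, hi: int) -> int:
    """Partition a[lo..hi] around the pivot a[hi]; return the pivot's final index."""
    pivot = a[hi]
    i = lo - 1  # a[lo..i] holds the elements known to be <= pivot
    for j in range(lo, hi):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    # Place the pivot just after the last element that is <= pivot.
    a[i + 1], a[hi] = a[hi], a[i + 1]
    return i + 1


def quicksort(a: List[int], lo: int = 0, hi: Optional[int] = None) -> None:
    """Sort a in place using Lomuto partitioning."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = lomuto_partition(a, lo, hi)
        quicksort(a, lo, p - 1)
        quicksort(a, p + 1, hi)


if __name__ == "__main__":
    data = [2, 8, 7, 1, 3, 5, 6, 4]
    quicksort(data)
    print(data)
```

The single forward scan keeps the loop invariant simple to state, which may be part of the appeal for teaching; Hoare's scheme typically performs fewer swaps but needs more care at the boundaries.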

S..K
- 1,944
- 2
- 14
- 16
4
votes
0 answers
Oracle Partition Range by Date Precision
My understanding is that there is no fractional second for a Date data type. If that is true, then why do the three queries below not all have Pstart=6 and why do they not all have no filter predicate?
That is, for a Date data type is it true that…

Alex Bartsmon
- 471
- 4
- 9
4
votes
1 answer
Keyby data distribution in Apache Flink, Logical or Physical Operator?
According to the Apache Flink documentation, KeyBy transformation logically partitions a stream into disjoint partitions. All records with the same key are assigned to the same partition.
Is KeyBy a 100% logical transformation? Doesn't it include…
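To illustrate the "same key, same partition" property described above, here is a small Python sketch of hash partitioning. It is not Flink code and does not reproduce Flink's own key-to-subtask assignment; the function names and the use of CRC32 as the hash are assumptions chosen only to keep the example deterministic.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, TypeVar

R = TypeVar("R")


def partition_index(key: str, num_partitions: int) -> int:
    """Map a key to a partition index using a stable hash (CRC32, chosen for illustration)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_by_key(
    records: Iterable[R], key_fn: Callable[[R], str], num_partitions: int
) -> Dict[int, List[R]]:
    """Group records so that all records sharing a key end up in the same partition."""
    partitions: Dict[int, List[R]] = defaultdict(list)
    for record in records:
        partitions[partition_index(key_fn(record), num_partitions)].append(record)
    return dict(partitions)


if __name__ == "__main__":
    events = [("alice", 1), ("bob", 2), ("alice", 3), ("carol", 4)]
    print(partition_by_key(events, lambda e: e[0], 3))
```

In a distributed engine each partition index would correspond to a parallel worker, so grouping by key implies moving records between workers whenever the sender does not own that key's partition.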

shaikh
- 582
- 6
- 24
4
votes
2 answers
C algorithm for Partition issues
Given a set of integers S:
How can the set be divided into k parts such that the sum of each part is minimal?
Please also give a C implementation.
Example:
S = {1, 2, 3, 4, 5, 6} and k = 3
The partition
S1 = {1, 6}
S2 = {2, 5}
S3 = {3, 4}
has…

edgarmtze
- 24,683
- 80
- 235
- 386