Data partitioning is the division of a collection of data into smaller collections in order to speed up processing, simplify statistics gathering, and reduce the memory and persistence footprint.
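As a minimal illustration of the idea, the sketch below splits one in-memory collection into smaller collections keyed by a chosen attribute. The function name `partition_by` and the sample `trades` data are made up for this example and do not come from any of the questions listed here.

```python
from collections import defaultdict


def partition_by(records, key):
    """Split records into smaller lists, one per distinct key value."""
    parts = defaultdict(list)
    for record in records:
        parts[key(record)].append(record)
    return dict(parts)


# Hypothetical sample data: (symbol, price) pairs.
trades = [("AAA", 10), ("BBB", 7), ("AAA", 12), ("CCC", 3), ("BBB", 8)]
by_symbol = partition_by(trades, key=lambda trade: trade[0])

# Each partition can now be processed or summarized on its own.
max_price = {symbol: max(p for _, p in rows) for symbol, rows in by_symbol.items()}
```

Every record lands in exactly one partition, so work such as the per-symbol maximum only has to touch the rows of that partition.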
Questions tagged [data-partitioning]
337 questions
1
vote
1 answer
How does sofs:partitions in Erlang work?
Note: This question is based on rethinking of my previous similar question.
I would like to know whether Erlang's sofs:partition does the same thing as what is described on Wikipedia's page about set partitions.
If it does, how can I get the following…

skanatek
1
vote
1 answer
How do I generate set partitions of a certain size?
I would like to generate partitions for a set in a specific way: I need to filter out all partitions which are not of size N in the process of generating these partitions. The general solution is "Generate all “unique” subsets of a set (not a…
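One way to avoid generating and then discarding partitions is to build only those with the wanted number of blocks. The sketch below assumes that "size N" means "exactly N non-empty blocks"; the excerpt does not confirm this reading, and the function name `partitions_into_k` is hypothetical. It is written in Python for illustration, not in the asker's language.

```python
def partitions_into_k(items, k):
    """Yield every partition of `items` into exactly k non-empty blocks."""
    items = list(items)
    n = len(items)
    if k < 0 or k > n:
        return  # impossible: more blocks than elements
    if n == 0:
        yield []  # the empty set has one partition into zero blocks
        return
    if k == 0:
        return  # a non-empty set cannot be split into zero blocks
    first, rest = items[0], items[1:]
    # Case 1: `first` forms a block of its own.
    for partition in partitions_into_k(rest, k - 1):
        yield [[first]] + partition
    # Case 2: `first` joins one of the k blocks of a partition of `rest`.
    for partition in partitions_into_k(rest, k):
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


two_block = list(partitions_into_k(["a", "b", "c"], 2))
```

The two cases mirror the recurrence for Stirling numbers of the second kind, so no partition with the wrong number of blocks is ever constructed.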

skanatek
1
vote
1 answer
Counting integer partitions for which the xor is zero
I'm looking for an efficient way to compute the number of partitions of an integer for which the xor is zero:
F(n,c) = #{ (x1, x2, ..., xc) | x1 + x2 + ... + xc = n and x1 xor x2 xor ... xor xc = 0 }
For small values of n and c, it's easy to run nested…
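For small inputs, the definition of F(n,c) can be checked with a memoized recursion instead of hand-written nested loops. This is only a sketch under stated assumptions: it counts ordered tuples, as the set-builder notation suggests, and it takes the parts to be positive unless `min_part=0` is passed. The excerpt does not say which convention is intended, and `count_xor_zero` is a hypothetical name.

```python
from functools import lru_cache


def count_xor_zero(n, c, min_part=1):
    """Count ordered tuples (x1..xc), each >= min_part, with sum n and xor 0."""

    @lru_cache(maxsize=None)
    def go(parts_left, remaining, acc):
        # acc holds the xor of the parts chosen so far.
        if parts_left == 0:
            return 1 if remaining == 0 and acc == 0 else 0
        total = 0
        for x in range(min_part, remaining + 1):
            total += go(parts_left - 1, remaining - x, acc ^ x)
        return total

    return go(c, n, 0)


positive_parts = count_xor_zero(6, 3)             # permutations of (1, 2, 3)
with_zero_parts = count_xor_zero(6, 3, min_part=0)  # also permutations of (0, 3, 3)
```

Because the lowest bit of a sum equals the xor of the lowest bits of its terms, the count is always zero for odd n, which gives a quick sanity check for any faster method.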

user1059422
1
vote
1 answer
How can I create an index based on values from another column in SQL?
For example if this is my table -
SeqNo Gap
20 Start
21 End
29 Start
30 End
42 Start
43 End
49 Start
50 Start
51 Start
52 Start
53 Start
54 Start
55 End
220 Start
221 Start
222 End
I want the based on Start and end output…

Sabreen Sageer
1
vote
1 answer
A quick way to return all the dates in a table from a database partitioned by DATE and SYMBOL
I have a table in a database partitioned by DATE and SYMBOL, and the DATE column is of the TIMESTAMP type.
Is there any faster way to return all the dates in the table than the statements select distinct(date(datetime)) from t and select count(*)…

Eva Gao
1
vote
1 answer
How to find out the type of partitioning in a table in Google BigQuery using Python APIs
def partition(dataset1, dataset2):
    try:
        client.get_dataset(dataset2)
        print("Dataset {} already exists".format(dataset2))
    except NotFound:
        print("Dataset {} not found".format(dataset2))
…

Max Daniel
1
vote
1 answer
Shard a collection in MongoDB Atlas
Is it possible to shard a collection in MongoDB Atlas? I tried to shard a collection, but when I went to enable sharding on my database, it gave this error.
MongoServerError: (Unauthorized) not authorized on admin to execute command { enableSharding:…

LakshanAmal
1
vote
1 answer
AWS Athena: Partition projection using date-hour with mixed ranges
I am trying to create an Athena table using partition projection. I am delivering records to S3 using Kinesis Firehose, grouped using a dynamic partitioning key. For example, the records look like the…

ash_m
1
vote
1 answer
PSQL determine the min value of date depending on another column
The input table looks like this:
ID pid method date
111 A123 credit_card 12-03-2015
111 A128 ACH 11-28-2015
Now for the ID = 111, I need to select the MIN(date) and see what the method of payment for it is. I need the output table to…
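The requirement of picking the earliest date per ID together with its payment method can be sketched outside SQL as follows. This is an illustration in Python, not the asker's query: the function name `earliest_per_id` is hypothetical, the dates are assumed to be in MM-DD-YYYY format, and the shape of the desired output table is unknown because the excerpt is cut off.

```python
from datetime import datetime

# The two sample rows from the question.
rows = [
    {"ID": 111, "pid": "A123", "method": "credit_card", "date": "12-03-2015"},
    {"ID": 111, "pid": "A128", "method": "ACH", "date": "11-28-2015"},
]


def earliest_per_id(rows, date_format="%m-%d-%Y"):
    """Return, for each ID, the full row that carries the minimum date."""
    best = {}
    for row in rows:
        when = datetime.strptime(row["date"], date_format)
        current = best.get(row["ID"])
        if current is None or when < current[0]:
            best[row["ID"]] = (when, row)
    return {key: row for key, (_, row) in best.items()}


first_payment = earliest_per_id(rows)
```

Keeping the whole row rather than only the minimum date is what makes the method available afterwards. In SQL the same effect is commonly obtained by ranking rows within each ID by date and keeping the first one.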

moikoi
1
vote
0 answers
Add generated column with aggregate over a partition and sort
I am trying to add a calculated column that computes a rolling average of a sorted partition. I can make it work as a query but cannot seem to get the result to become a calculated field.
ALTER TABLE PUBLIC "minutes"
ADD COLUMN "green_avg"…

Nuljon
1
vote
1 answer
How to generate a single file per partition - Snowflake COPY into location
I've managed to unload my data into partitions, but each one of them is also being split into multiple files. Is there a way to force Snowflake to generate a single file per partition?
It also would be great if I can zip all the files.
This…

Andres
1
vote
2 answers
Can I migrate a partitioned table to a non-partitioned table in Oracle with the CREATE TABLE statement?
I have an Oracle 11g partitioned table with 10 partitions for ten years of data, each on its own tablespace partitioned by range. Each year-partition contains 12 monthly-partitions.
I would like to convert this table to a non-partitioned table,…

LBS
1
vote
1 answer
Reading spark partitioned data from directories
My data is partitioned as Year, month, day in an S3 bucket. I have a requirement to read the last six months of data every day. I am using the code below to read the data, but it is selecting negative values for the month. Is there a way to read the correct data for…
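Negative month values typically appear when a number is subtracted from the month without carrying into the year. The sketch below avoids that by counting months on a single linear scale. It is not the asker's code, which the excerpt does not show. The helper names, the `Year=/month=` directory layout, and the choice to include the current month are all assumptions.

```python
from datetime import date


def last_n_months(today, n=6):
    """Return (year, month) pairs for the n months ending with today's month."""
    # Count months on one linear scale so that stepping backwards across
    # January never yields a zero or negative month number.
    index = today.year * 12 + (today.month - 1)
    pairs = []
    for offset in range(n):
        year, month0 = divmod(index - offset, 12)
        pairs.append((year, month0 + 1))
    return pairs


def partition_paths(base, today, n=6):
    """Build one glob per month under an assumed Year=/month= layout."""
    return [f"{base}/Year={y}/month={m}/*" for y, m in last_n_months(today, n)]


months = last_n_months(date(2022, 2, 15))
paths = partition_paths("s3://example-bucket/data", date(2022, 2, 15))
```

The resulting list of paths could then be handed to the reader in one call, so only the six relevant month directories are scanned.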

code_bug
1
vote
3 answers
Partitioning by range columns unexpected behavior
I have a MySQL table partitioned by range columns (c_id and created_at)
and I created 2 partitions:
logs_1_2020 (c_id less than 2 and created less than 2021-01-01 00:00:00)
logs_1_2021 (c_id less than 2 and created less than 2022-01-01…
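A frequent source of surprise with multi-column range partitioning is that the bounds are compared as whole tuples, column by column, rather than each column independently. The Python sketch below models that comparison. It assumes, since the excerpt is cut off and does not describe the unexpected behavior, that this is the relevant effect. The bounds use date-only strings taken from the excerpt, and `pick_partition` is a hypothetical helper, not a MySQL function.

```python
# Upper bounds (exclusive) as (c_id, created_at) tuples, in partition order.
# ISO-formatted date strings compare correctly as plain strings.
PARTITIONS = [
    ("logs_1_2020", (2, "2021-01-01")),
    ("logs_1_2021", (2, "2022-01-01")),
]


def pick_partition(c_id, created_at):
    """Return the first partition whose bound tuple is greater than the row."""
    row = (c_id, created_at)
    for name, upper_bound in PARTITIONS:
        # Tuple comparison is lexicographic: created_at only matters
        # when c_id equals the c_id of the bound.
        if row < upper_bound:
            return name
    return None  # no partition accepts the row


row_2021_c1 = pick_partition(1, "2021-06-15")
row_2021_c2 = pick_partition(2, "2021-06-15")
```

Under this model a row with c_id = 1 always lands in the first partition regardless of its date, because (1, anything) already sorts below (2, "2021-01-01"). Only rows with c_id = 2 are separated by year.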

es code
1
vote
1 answer
spark-cassandra-connector - repartitionByCassandraReplica returns empty RDD - Java
So, I have a 16-node cluster where every node has Spark and Cassandra installed, and I am using the Spark-Cassandra Connector 3.0.0. I am trying to join a dataset with a Cassandra table on the partition key, while also trying to use…

Des0lat0r