Data partitioning is the division of a collection of data into smaller collections for faster processing, easier statistics gathering, and a smaller memory/persistence footprint.
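As a rough illustration of the idea, here is a minimal sketch of hash partitioning in plain Python. The function name `partition_by_key`, its parameters, and the sample records are hypothetical and not taken from any particular library or from the questions on this page. Real systems such as databases or Spark use their own partitioning schemes.

```python
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def partition_by_key(
    records: Iterable[T],
    key: Callable[[T], Hashable],
    num_partitions: int,
) -> List[List[T]]:
    """Split records into num_partitions lists by hashing each record's key.

    Hypothetical helper for illustration only: records whose keys hash to
    the same bucket end up in the same partition.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    partitions: List[List[T]] = [[] for _ in range(num_partitions)]
    for record in records:
        # Python's % always yields a non-negative index for a positive modulus.
        index = hash(key(record)) % num_partitions
        partitions[index].append(record)
    return partitions


# Made-up (key, value) pairs; integer keys keep the hashing deterministic.
sample = [(5, 1), (10, 2), (15, 3), (5, 4)]
buckets = partition_by_key(sample, key=lambda r: r[0], num_partitions=4)
for i, bucket in enumerate(buckets):
    print(i, bucket)
```

Each partition can then be processed, summarized, or stored independently. Range partitioning (for example by date) follows the same pattern with a different rule for choosing the bucket.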
Questions tagged [data-partitioning]
337 questions
0
votes
0 answers
Explicit conversion of column in Table Partition in SQL Server
I have a table like the one below:
CREATE TABLE [dbo].[PartitionExample]
(
[dateTimeColumn1] [datetime] NOT NULL,
CONSTRAINT [PK_PartitionExample] PRIMARY KEY CLUSTERED
(
[dateTimeColumn1] ASC
)WITH (PAD_INDEX = OFF,…

Murali Dhar Darshan
0
votes
1 answer
Date difference over consecutive rows filtered to one instance of consecutive values using LEAD function
Basically, the result I need is the price-change ranges for each item: I need to extract an item's price and the date of the transaction, with an end date set to the next time that item's price changes.
Given this table
create table myTable(id int, Price…

mhDuke
0
votes
0 answers
Nested partition in postgres
I am attempting to create a referral code infrastructure. I would like to use up to the first 12 characters of a user's email, deduped with an integer:
So given three users with the emails bob@gmail.com, bob@hotmail.com, and bob@yahoo.com, I would like…

Abraham P
0
votes
1 answer
In Spark, when no partitioner is specified, does the ReduceByKey operation repartition the data by hash before starting aggregating it?
If we do not specify a partitioner for a reduceByKey operation, does it perform hash partitioning internally before the reduction? For example, my test code is:
val rdd = sc.parallelize(Seq((5, 1), (10, 2), (15, 3), (5, 4), (5, 1), (5,3),…

Sayantan Ghosh
0
votes
0 answers
Partitioning Huge SQL Database for data management
I have a DB with 170GB+ of data; one table contains 90% of the DB's data. We are planning to do a sliding-window partition on the ChangeLog table. What would be the best solution to manage the data, so that I can remove old data with the least amount of down…

drac13
0
votes
1 answer
recover the value above in postgres
I want to recover the value above by name.
See the table.
I would like a result like the one in the second-to-last column, number_week.
Thank you

Bak
0
votes
0 answers
How to define partitions to Dataframe in pyspark?
Suppose I read a parquet file as a DataFrame in pyspark. How can I specify how many partitions it should have?
I read the parquet file like this:
df = sqlContext.read.format('parquet').load('/path/to/file')
How may I specify the number of partitions…

Ani Menon
0
votes
1 answer
Kafka Streams - How to efficiently join with a large, non-copartitioned store/topic
We have a stream of web events.
The event is partitioned by (domain, uid).
All events explained here are from the same domain. There are thousands of domains, very uneven in traffic (hence that partitioning).
Let's say we have events from one…

xmar
0
votes
1 answer
Can hive metastore virtually partition data based on column value without physically changing the directory structure?
As an example, consider that I have data on all the major sports events that have happened. The schema is given below:
EventName,Date,Month,Year,City
This data is physically structured in HDFS by year, date, month.
Now I want to create virtual partitions on that based…

anmolp95
0
votes
0 answers
Partition by most recent data - Sql Server 2012
How can I partition my table into two: one partition with the latest month of data and another with the rest? I am using SQL Server 2012 Standard Edition.

Pandiarajan
0
votes
1 answer
Kafka Streams: Partial reprocessing by key
Scenario:
In a KafkaStreams web sessioning scenario,
with unlimited (or years-long) retention,
with interactive queries (this can be reviewed if necessary),
with many clients, which have many users each (each user particular to each client),
and…

xmar
0
votes
1 answer
MySQL query to create a rank chart over time
I have a table with statistical data (roughly 100,000 rows) that has the following format:
WeekNum Name Value Category
201751 Joe 15 X
201751 Max 23 X
201751 Jim 7 X
201752 Joe 18 X
201752 …

Matth
0
votes
0 answers
Apache Spark - write only the changed partitions
Is there a built-in way to define a DataFrame as a set of partition paths (each with one or more files), use that DataFrame as the basis of a set of so-called "mutation" queries which are defined as a separate DataFrame, and partition the resulting…

jennykwan
0
votes
2 answers
suitable data structure for set (graph) partition
I need to store data grouping nodes of a graph partition, something like:
[node1, node2] [node3] [node4, node5, node6]
My first idea was to have just a simple vector or array of ints, where the position in the array denotes the node_id and its…

zenna
0
votes
1 answer
How to check on which column to create Index to optimize performance
I have the query below, which takes too much time, and I have to optimize its performance. There is no index on any of the tables.
To optimize the query, I am thinking of creating an index, but I am not sure on particularly which…

Andrew