Questions tagged [partitioning]

Partitioning is a performance strategy whereby you divide possibly very large groups of data into some number of smaller groups of data.

The expectation is that, for algorithms whose running time grows faster than linearly in N, the total time it takes to process the smaller groups and combine the results is still less than the time it would take to process the one larger set of data.
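
As a rough illustration of that expectation, the sketch below counts the work done by a hypothetical quadratic algorithm (comparing every pair of items) on one large set versus on the same data split into equal partitions. The function names `pairwise_cost` and `partitioned_cost` are invented for this example, and the cost of combining the partial results is deliberately left out, so the numbers are an upper bound on the benefit rather than a measurement.

```python
def pairwise_cost(n: int) -> int:
    """Number of unordered pairs among n items; a stand-in for an O(N^2) algorithm."""
    return n * (n - 1) // 2


def partitioned_cost(n: int, k: int) -> int:
    """Total pairwise cost when n items are split into k near-equal partitions.

    Assumption: the work inside each partition is independent of the others,
    and the cost of merging partial results is ignored.
    """
    base, extra = divmod(n, k)
    # 'extra' partitions get one additional item so that sizes sum to n.
    sizes = [base + 1] * extra + [base] * (k - extra)
    return sum(pairwise_cost(size) for size in sizes)


whole = pairwise_cost(1_000_000)
split = partitioned_cost(1_000_000, 100)
print(f"whole set: {whole}, 100 partitions: {split}, ratio: {whole / split:.1f}")
```

For a quadratic cost the saving is roughly a factor of k, the number of partitions; for a linear algorithm the same split would save nothing before parallelism is taken into account.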

List partitioning is similar to range partitioning in many ways. As in partitioning by RANGE, each partition must be explicitly defined.

3138 questions

votes

1 answer

When should I repartition an RDD?

I know that I can repartition an RDD to increase its partitions and use coalesce to decrease its partitions. I have two questions regarding this that I cannot completely understand after reading different resources. Spark will use a sensible default…

apache-spark rdd partitioning

asked Aug 18 '17 at 03:43

Marcos

votes

2 answers

Oracle automatic partitioning by day

I'm working with an Oracle 11g DB that has an input of 3-5m rows a day. In the future I would like to use partitioning based on the column Timestamp. My goal is to create a new partition for every day, automatically. I just found ways to create a…

oracle oracle11g oracle-sqldeveloper partitioning

asked Oct 10 '13 at 14:48

user2428207

votes

2 answers

"Deinterlacing" a list in Scala

I have a list of bytes that represent raw samples read in from an audio interface. Depending on the use case and H/W, each sample can be anywhere from 1 to 4 bytes long, and the total number of channels in the "stream" can be more or less arbitrary.…

list scala partitioning

asked Nov 03 '11 at 09:13

tomek

votes

10 answers

history rows management in database

As in many databases, I am designing a database that should keep a record of previous versions of the rows changed in each table. The standard solution to this problem is to keep a history table for each data table, and whenever a row needs to be…

database oracle database-design partitioning

asked Apr 03 '09 at 21:31

Asaf Z

votes

2 answers

What is a good size (# of rows) to partition a table to really benefit?

E.g., suppose we have a table with 4 million rows that has a STATUS field which can take one of the following values: TO_WORK, BLOCKED or WORKED_CORRECTLY. Would you partition on a field which will change just one time (most of the time from to_work to…

sql oracle partitioning database-partitioning

asked Jul 30 '11 at 20:56

Revious

votes

2 answers

What are the benefits of vertical partitioning VS horizontal partitioning?

I simply cannot understand when or in what situation we would ever choose vertical partitioning instead of horizontal partitioning. What are the benefits of vertical partitioning VS horizontal partitioning? Are there any examples of websites /…

mysql sql-server database partitioning sharding

asked Jul 12 '11 at 17:38

totsum

votes

3 answers

Why shouldn't I give all my DynamoDB items the same partition key value?

There are plenty of resources that recommend using high-cardinality attributes as partition keys. My question is, what will happen if I instead do the exact opposite of this and give all of my items the same partition key value (differentiating only…

amazon-web-services amazon-dynamodb primary-key partitioning

asked Nov 20 '20 at 23:17

Ryan Hilbert

votes

2 answers

JDBC to Spark Dataframe - How to ensure even partitioning?

I am new to Spark, and am working on creating a DataFrame from a Postgres database table via JDBC, using spark.read.jdbc. I am a bit confused about the partitioning options, in particular partitionColumn, lowerBound, upperBound, and…

apache-spark jdbc apache-spark-sql partitioning

asked Jun 10 '19 at 22:17

JoeMjr2

votes

3 answers

MySQL 5.1 Partitioning

I have the following example table... mysql> CREATE TABLE part_date3 -> ( c1 int default NULL, -> c2 varchar(30) default NULL, -> c3 date default NULL) engine=myisam -> partition by range (to_days(c3)) -> (PARTITION…

mysql partitioning

asked Apr 01 '11 at 21:12

Lee Armstrong

votes

2 answers

How does Round Robin partitioning in Spark work?

I have trouble understanding Round Robin partitioning in Spark. Consider the following example. I split a Seq of size 3 into 3 partitions: val df = Seq(0,1,2).toDF().repartition(3) df.explain == Physical Plan == Exchange RoundRobinPartitioning(3) +-…

scala apache-spark partitioning

asked Jan 10 '19 at 07:37

Raphael Roth

votes

1 answer

How does range partitioner work in Spark?

I'm not so clear about how range partitioner works in Spark. It uses (Reservoir Sampling) to take samples. And I was confused by the way of computing the boundaries of the input. // This is the sample size we need to have roughly balanced output…

apache-spark partitioning

asked Jan 08 '17 at 15:35

American curl

votes

1 answer

Hive: where + in does not use partition?

I am querying a large table that is partitioned on a field called day. If I run a query: select * from my_table where day in ('2016-04-01', '2016-03-01') I get many mappers and reducers and the query takes a long time to run. If, however, I write…

hive partitioning

asked Apr 27 '16 at 16:59

cerpintaxt

votes

2 answers

How to create a PostgreSQL partitioned sequence?

Is there a simple (i.e. non-hacky) and race-condition-free way to create a partitioned sequence in PostgreSQL? Example: Using a normal sequence in Issue: | Project_ID | Issue | | 1 | 1 | | 1 | 2 | | 2 | 3 | | 2 …

postgresql sequence partitioning

asked Aug 28 '10 at 15:26

FooBar

votes

4 answers

Postgres partition by week

I imagine partitioning a table by date (in particular for logs) is widely used, but I am not able to find a good answer to my problem. I want to partition a table by week (the number of records is too big to make it monthly). The…

sql postgresql partitioning week-number

asked Apr 17 '13 at 00:20

RGPT

votes

1 answer

Dynamic MySQL partitioning based on UnixTime

My DB design includes multiple MyISAM tables with measurements collected online. Each row contains an auto-incremented id, some data and an integer representing Unix time. I am designing an aging mechanism, and I am interested in using MySQL…

mysql dynamic partitioning myisam unix-timestamp

asked Dec 12 '12 at 11:53

Michael
