Questions tagged [partitioning]

Partitioning is a performance strategy whereby you divide possibly very large groups of data into some number of smaller groups of data.

The expectation is that, for algorithms whose cost grows faster than linearly in N, the total time it takes to process the smaller groups and combine the results is still less than the time it would take to process the one larger set of data.
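As a rough illustration of that expectation, the sketch below counts the work for a hypothetical all-pairs (quadratic) pass, first over one set of N items and then over k near-equal partitions. The function names and the linear combine cost of N operations are assumptions made for this example, and it presumes the work really can be confined to each partition.

```python
def pairwise_cost(n: int) -> int:
    """Comparisons made by an all-pairs (quadratic) pass over n items."""
    return n * (n - 1) // 2


def partitioned_cost(n: int, k: int) -> int:
    """Comparisons when n items are split into k near-equal partitions.

    Assumes (for illustration only) that combining the partial results
    costs one extra operation per item, i.e. n operations in total.
    """
    base, extra = divmod(n, k)
    # 'extra' partitions get one more item so the sizes add up to n.
    sizes = [base + 1] * extra + [base] * (k - extra)
    return sum(pairwise_cost(size) for size in sizes) + n


whole = pairwise_cost(1000)         # one group of 1000 items
split = partitioned_cost(1000, 10)  # ten groups of 100 items plus combine
print(whole, split)
```

With a linear-time algorithm the same split would save nothing, which is why the benefit depends on the cost growing faster than N.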

List partitioning is similar to range partitioning in many ways. As in partitioning by RANGE, each partition must be explicitly defined.
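To make "explicitly defined" concrete, here is a minimal sketch in plain Python rather than in any database's DDL. The partition names, bounds, and value sets are hypothetical: a range partition is described by an exclusive upper bound, while a list partition is described by an explicit set of values.

```python
from bisect import bisect_right

# Hypothetical range partitions: p0 holds values < 10, p1 < 20, p2 < 30.
RANGE_BOUNDS = [10, 20, 30]
RANGE_NAMES = ["p0", "p1", "p2"]

# Hypothetical list partitions: each one enumerates the values it accepts.
LIST_PARTITIONS = {
    "p_north": {3, 5, 6},
    "p_south": {1, 2, 10},
}


def range_partition(value: int) -> str:
    """Return the first partition whose exclusive upper bound exceeds value."""
    idx = bisect_right(RANGE_BOUNDS, value)
    if idx == len(RANGE_NAMES):
        raise ValueError(f"no range partition defined for {value}")
    return RANGE_NAMES[idx]


def list_partition(value: int) -> str:
    """Return the partition whose explicit value set contains value."""
    for name, members in LIST_PARTITIONS.items():
        if value in members:
            return name
    raise ValueError(f"no list partition defined for {value}")


print(range_partition(10), list_partition(10))
```

In both schemes a value that matches no defined partition is rejected here, which mirrors the point that every partition has to be declared up front.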

3138 questions
10
votes
1 answer

When should I repartition an RDD?

I know that I can repartition an RDD to increase its partitions and use coalesce to decrease its partitions. I have two questions regarding this that I cannot completely understand after reading different resources. Spark will use a sensible default…
Marcos
  • 701
  • 1
  • 8
  • 25
10
votes
2 answers

Oracle automatic partitioning by day

I'm working with an Oracle 11g DB that has an input of 3-5m rows a day. In the future I would like to use partitioning based on the column Timestamp. My goal is to create a new partition for every day, automatically. I just found ways to create a…
user2428207
  • 825
  • 4
  • 16
  • 29
9
votes
2 answers

"Deinterlacing" a list in Scala

I have a list of bytes that represent raw samples read in from an audio interface. Depending on the use case and H/W, each sample can be anywhere from 1 to 4 bytes long, and the total number of channels in the "stream" can be more or less arbitrary.…
tomek
  • 117
  • 6
9
votes
10 answers

history rows management in database

As in many databases, I am designing a database that should keep a record of previous versions of the rows changed in each table. The standard solution to this problem is to keep a history table for each data table, and whenever a row needs to be…
Asaf Z
9
votes
2 answers

What is a good size (# of rows) to partition a table to really benefit?

For example, suppose we have a table with 4 million rows that has a STATUS field which can take one of the following values: TO_WORK, BLOCKED or WORKED_CORRECTLY. Would you partition on a field which will change just one time (most of the time from to_work to…
Revious
  • 7,816
  • 31
  • 98
  • 147
9
votes
2 answers

What are the benefits of vertical partitioning VS horizontal partitioning?

I simply cannot understand when or in what situation will we ever choose vertical partitioning instead of horizontal partitioning. What are the benefits of vertical partitioning VS horizontal partitioning? Are there any examples of websites /…
totsum
  • 301
  • 4
  • 8
9
votes
3 answers

Why shouldn't I give all my DynamoDB items in the same partition key value?

There are plenty of resources that recommend using high-cardinality attributes as partition keys. My question is, what will happen if I instead do the exact opposite of this and give all of my items the same partition key value (differentiating only…
9
votes
2 answers

JDBC to Spark Dataframe - How to ensure even partitioning?

I am new to Spark, and am working on creating a DataFrame from a Postgres database table via JDBC, using spark.read.jdbc. I am a bit confused about the partitioning options, in particular partitionColumn, lowerBound, upperBound, and…
JoeMjr2
  • 3,804
  • 4
  • 34
  • 62
9
votes
3 answers

MySQL 5.1 Partitioning

I have the following example table... mysql> CREATE TABLE part_date3 -> ( c1 int default NULL, -> c2 varchar(30) default NULL, -> c3 date default NULL) engine=myisam -> partition by range (to_days(c3)) -> (PARTITION…
Lee Armstrong
  • 11,420
  • 15
  • 74
  • 122
9
votes
2 answers

How does Round Robin partitioning in Spark work?

I have trouble understanding Round Robin Partitioning in Spark. Consider the following example. I split a Seq of size 3 into 3 partitions: val df = Seq(0,1,2).toDF().repartition(3) df.explain == Physical Plan == Exchange RoundRobinPartitioning(3) +-…
Raphael Roth
  • 26,751
  • 15
  • 88
  • 145
9
votes
1 answer

How does range partitioner work in Spark?

I'm not so clear about how range partitioner works in Spark. It uses (Reservoir Sampling) to take samples. And I was confused by the way of computing the boundaries of the input. // This is the sample size we need to have roughly balanced output…
American curl
  • 1,259
  • 2
  • 18
  • 21
9
votes
1 answer

Hive: where + in does not use partition?

I am querying a large table that is partitioned on a field called day. If I run a query: select * from my_table where day in ('2016-04-01', '2016-03-01') I get many mappers and reducers and the query takes a long time to run. If, however, I write…
cerpintaxt
  • 256
  • 1
  • 2
  • 13
9
votes
2 answers

How to create a PostgreSQL partitioned sequence?

Is there a simple (i.e. non-hacky) and race-condition-free way to create a partitioned sequence in PostgreSQL? Example: Using a normal sequence in Issue: | Project_ID | Issue | | 1 | 1 | | 1 | 2 | | 2 | 3 | | 2 …
FooBar
  • 191
  • 1
  • 3
9
votes
4 answers

Postgres partition by week

I can imagine table partitioning by date (in particular for logs) is something widely used, but I am not able to find a good answer to my problem. I want to create a table partitioned by week (the number of records is too big to make it monthly). The…
RGPT
  • 564
  • 1
  • 7
  • 16
9
votes
1 answer

Dynamic MySQL partitioning based on UnixTime

My DB design includes multiple MyISAM tables with measurements collected online. Each row contains an auto-incremented id, some data and an integer representing unixtime. I am designing an aging mechanism, and I am interested in using MySQL…
Michael
  • 2,827
  • 4
  • 30
  • 47