Questions tagged [partitioning]

Partitioning is a performance strategy whereby you divide possibly very large groups of data into some number of smaller groups of data.

The expectation is that, for algorithms whose running time grows faster than linearly in N, the total time it takes to process the smaller groups and combine the results is still less than the time it would take to process the one larger set of data.
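As a rough illustration of that expectation, here is a minimal Python sketch. It assumes a toy task (counting pairs of equal items) and a hash-based split into a hypothetical `num_partitions` buckets; neither comes from the tag description itself. The same quadratic scan runs on each small bucket, and the per-bucket counts are added together.

```python
from collections import defaultdict
from itertools import combinations


def count_equal_pairs_naive(items):
    """Compare every pair of items: about N*(N-1)/2 comparisons."""
    return sum(1 for a, b in combinations(items, 2) if a == b)


def count_equal_pairs_partitioned(items, num_partitions=4):
    """Hash-partition the items, then run the same quadratic scan per partition."""
    partitions = defaultdict(list)
    for item in items:
        # Equal items always land in the same partition, so no pair is missed.
        partitions[hash(item) % num_partitions].append(item)
    # Combine step: the per-partition counts simply add up.
    return sum(count_equal_pairs_naive(p) for p in partitions.values())


data = [3, 1, 3, 7, 1, 3, 9, 7]
naive_result = count_equal_pairs_naive(data)
partitioned_result = count_equal_pairs_partitioned(data)
```

Both functions return the same count; the partitioned version just performs fewer comparisons when the items spread evenly across buckets. The split only works because the partitioning key keeps every matching pair inside a single partition.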

In MySQL, LIST partitioning is similar to range partitioning in many ways. As in partitioning by RANGE, each partition must be explicitly defined.
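To make the comparison with RANGE concrete, the following Python sketch contrasts two ways of assigning a value to an explicitly defined partition: by exclusive upper bounds and by explicit lists of values. The function names, bounds, and partition names are hypothetical and only model the idea; this is not how any database implements it.

```python
from bisect import bisect_right


def range_partition(value, upper_bounds):
    """Return the index of the first partition whose exclusive upper bound exceeds value.

    upper_bounds must be sorted ascending, like a series of 'less than' limits.
    """
    idx = bisect_right(upper_bounds, value)
    if idx == len(upper_bounds):
        raise ValueError(f"no range partition defined for value {value!r}")
    return idx


def list_partition(value, value_lists):
    """Return the name of the partition whose explicit value list contains value."""
    for name, values in value_lists.items():
        if value in values:
            return name
    raise ValueError(f"no list partition defined for value {value!r}")


bounds = [10, 20, 30]  # hypothetical limits: < 10, < 20, < 30
regions = {"p_north": {3, 5, 6}, "p_south": {1, 2, 10}}  # hypothetical lists
```

For example, `range_partition(10, bounds)` yields partition 1 because 10 is not below the first bound, while `list_partition(10, regions)` yields `"p_south"`. In both schemes a value that matches no defined partition is rejected, which is why every partition has to be spelled out in advance.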

3138 questions

4 answers

Clustering, Sharding or simple Partition / Replication

We have created a Facebook application and it got a lot of virality. The problem is that our database started getting REALLY FULL (some tables have more than 25 million rows now). It got to the point that the app just stopped working because there…

asked Jan 04 '11 at 14:35

albertosh

4 answers

How to get the number of elements in partition?

Is there any way to get the number of elements in a Spark RDD partition, given the partition ID, without scanning the entire partition? Something like this: Rdd.partitions().get(index).size() Except I don't see such an API for Spark. Any ideas?…

apache-spark partitioning

asked Feb 24 '15 at 02:20

Geo

2 answers

How to see table partition size in MySQL ( is it even possible? )

I've partitioned my table horizontally and I'd like to see how the rows are currently distributed. Searching the web didn't bring any relevant results. Could anyone tell me if this is possible?

mysql database database-design database-schema partitioning

asked Dec 31 '13 at 00:18

user3010273

2 answers

Partition data for efficient joining for Spark dataframe/dataset

I need to join many DataFrames together based on some shared key columns. For a key-value RDD, one can specify a partitioner so that data points with same key are shuffled to same executor so joining is more efficient (if one has shuffle related…

apache-spark apache-spark-sql partitioning apache-spark-dataset

asked Jan 09 '18 at 02:22

Rainfield

3 answers

Efficient querying of multi-partition Postgres table

I've just restructured my database to use partitioning in Postgres 8.2. Now I have a problem with query performance: SELECT * FROM my_table WHERE time_stamp >= '2010-02-10' and time_stamp < '2010-02-11' ORDER BY id DESC LIMIT 100; There are 45…

sql performance postgresql partitioning

asked Feb 10 '10 at 12:39

Adrian Pronk

1 answer

Best way to manage row expiration in mysql

An application does the following: (1) writes a row to a table that has a unique ID; (2) reads the table, finds the unique ID, and outputs the other variables (among them the timestamp). The question is: the application needs to read only the non-expired…

mysql performance cron indexing partitioning

asked Dec 23 '13 at 13:24

smartcity

8 answers

How to select rows from partition in MySQL

I partitioned my 300MB table and am trying to select from the p0 partition with this command: SELECT * FROM employees PARTITION (p0); But I am getting the following error: ERROR 1064 (42000): You have an error in your SQL syntax; check the manual…

mysql sql partitioning database-partitioning mysql-5.1

asked Jan 01 '13 at 16:53

Kad

1 answer

Partitions and UPDATE

I'm diving deeper and deeper into MySQL features, and the next one I'm trying out is table partitions. There's basically only one question about them for which I couldn't find a clear answer yet: If you UPDATE a row, will the row be moved to another…

mysql performance partitioning

asked Oct 17 '12 at 07:54

Katai

5 answers

Clojure partition by filter

In Scala, the partition method splits a sequence into two separate sequences -- those for which the predicate is true and those for which it is false: scala> List(1, 5, 2, 4, 6, 3, 7, 9, 0, 8).partition(_ % 2 == 0) res1: (List[Int], List[Int]) =…

clojure partitioning

asked Apr 14 '11 at 14:32

Ralph

3 answers

How to optimize partitioning when migrating data from JDBC source?

I am trying to move data from a PostgreSQL table to a Hive table on HDFS. To do that, I came up with the following code: val conf = new…

apache-spark jdbc hive apache-spark-sql partitioning

asked Oct 02 '18 at 06:38

Metadata

1 answer

Partitioning in spark while reading from RDBMS via JDBC

I am running spark in cluster mode and reading data from RDBMS via JDBC. As per Spark docs, these partitioning parameters describe how to partition the table when reading in parallel from multiple…

apache-spark jdbc apache-spark-sql partitioning

asked Mar 31 '17 at 22:42

Dev

1 answer

Understanding shuffle managers in Spark

Let me help to clarify about shuffle in depth and how Spark uses shuffle managers. I report some very helpful…

apache-spark rdd partitioning shuffle

asked Jan 11 '17 at 08:09

Giorgio

3 answers

spark parquet write gets slow as partitions grow

I have a spark streaming application that writes parquet data from stream. sqlContext.sql( """ |select |to_date(from_utc_timestamp(from_unixtime(at), 'US/Pacific')) as event_date, …

apache-spark partitioning parquet

asked Sep 16 '16 at 06:46

Gaurav Shah

1 answer

How to Partition a Table by Month ("Both" YEAR & MONTH) and create monthly partitions automatically?

I'm trying to partition a table by both year and month. The column through which I'll partition is a datetime type column with an ISO format ('20150110', '20150202', etc). For example, I have sales data for 2010, 2011, 2012. I'd like the data to be…

sql sql-server partitioning dynamic-sql sql-agent-job

asked Aug 10 '15 at 15:55

Amr Tharwat

2 answers

Undo Table Partitioning

I have a table 'X' and did the following: CREATE PARTITION FUNCTION PF1(INT) AS RANGE LEFT FOR VALUES (1, 2, 3, 4) CREATE PARTITION SCHEME PS1 AS PARTITION PF1 ALL TO ([PRIMARY]) CREATE CLUSTERED INDEX CIDX_X ON X(col1) ON PS1(col1) These 3 steps…

sql-server database sql-server-2008 partitioning

asked Jun 03 '10 at 13:00

Storm
