Questions tagged [partitioning]

Partitioning is a performance strategy whereby you divide possibly very large groups of data into some number of smaller groups of data.

The expectation is that, for algorithms whose cost grows faster than linearly in N, the total time it takes to process the smaller groups and combine the results is still less than the time it would take to process the one larger set of data.
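
As a rough illustration of the divide, process, and combine idea described above, here is a minimal Python sketch. The function names `partition` and `partitioned_sort` and the chunk size are hypothetical, chosen only for this example; sorting stands in for any algorithm whose cost grows faster than linearly. It shows the shape of the strategy and makes no claim that this particular code is faster than a single call to `sorted`.

```python
import heapq


def partition(items, size):
    """Split items into consecutive chunks of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def partitioned_sort(items, size):
    """Sort each chunk on its own, then merge the sorted chunks."""
    # Process every smaller group independently.
    sorted_chunks = [sorted(chunk) for chunk in partition(items, size)]
    # Combine the partial results into one sorted sequence.
    return list(heapq.merge(*sorted_chunks))


if __name__ == "__main__":
    data = [1, 5, 7, 13, 3, 3, 4, 1, 8, 6, 6, 6]
    print(partition(data, 5))
    print(partitioned_sort(data, 5))
```

In practice the per-group step is where the benefit usually comes from, for example because each group fits in memory or can be handled by a separate worker.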

List partitioning is similar to range partitioning in many ways. As in partitioning by RANGE, each partition must be explicitly defined.

3138 questions

2 answers

In Oracle SQL, can I query a partition of a table instead of an entire table to make it run faster?

I would like to query a table with a million records for customers named 'FooBar' that have records dated on 7-24-2016. The table has 10 days of data in it. select * from table where customer = 'FooBar' and insert_date between to_date('2016-07-24…

asked Jul 25 '16 at 21:53

Cale Sweeney

1 answer

Write Spark dataframe as CSV with partitions

I'm trying to write a dataframe in Spark to an HDFS location, and I expect that if I add the partitionBy notation, Spark will create partition folders (similar to writing in Parquet format) in the form partition_column_name=partition_value ( i.e…

csv apache-spark apache-spark-sql partitioning

asked May 29 '16 at 12:30

Lior Baber

3 answers

Postgresql Table Partitioning Django Project

I have a Django 1.7 project that uses Postgres 9.3. I have a table that will have rather high volume. The table will have anywhere from 13 million to 40 million new rows a month. I would like to know what the best way to incorporate Postgres table…

django postgresql partitioning

asked Jun 01 '15 at 00:45

arcane

5 answers

Need an algorithm to split a series of numbers

After a few busy nights my head isn't working so well, but this needs to be fixed yesterday, so I'm asking the more refreshed community of SO. I've got a series of numbers. For example: 1, 5, 7, 13, 3, 3, 4, 1, 8, 6, 6, 6. I need to split this…

algorithm formatting partitioning

asked Oct 13 '11 at 08:40

Vilx-

2 answers

Spark: Order of column arguments in repartition vs partitionBy

Methods taken into consideration (Spark 2.2.1): DataFrame.repartition (the two implementations that take partitionExprs: Column* parameters) and DataFrameWriter.partitionBy. Note: This question doesn't ask the difference between these methods. From docs…

apache-spark dataframe apache-spark-sql partitioning

asked Jan 20 '18 at 12:58

y2k-shubham

2 answers

Why does sortBy transformation trigger a Spark job?

As per Spark documentation only RDD actions can trigger a Spark job and the transformations are lazily evaluated when an action is called on it. I see the sortBy transformation function is applied immediately and it is shown as a job trigger in the…

apache-spark rdd partitioning partitioner

asked Dec 30 '16 at 22:49

Prabu Soundar Rajan

6 answers

Quicksort - Hoare's partitioning with duplicate values

I have implemented the classic Hoare's partitioning algorithm for Quicksort. It works with any list of unique numbers [3, 5, 231, 43]. The only problem is when I have a list with duplicates [1, 57, 1, 34]. If I get duplicate values I enter an…

algorithm sorting quicksort partitioning

asked Nov 01 '16 at 20:58

valdi.k

2 answers

pyspark partitioning data using partitionby

I understand that the partitionBy function partitions my data. If I use rdd.partitionBy(100), it will partition my data by key into 100 parts, i.e. data associated with similar keys will be grouped together. Is my understanding correct? Is it advisable…

python apache-spark pyspark partitioning rdd

asked Mar 13 '16 at 17:45

user2543622

2 answers

Hive doesn't read partitioned parquet files generated by Spark

I'm having a problem reading partitioned Parquet files generated by Spark in Hive. I'm able to create the external table in Hive, but when I try to select a few lines, Hive returns only an "OK" message with no rows. I'm able to read the partitioned…

apache-spark hive partitioning partition parquet

asked Nov 05 '15 at 18:15

ALunz

4 answers

Is it possible to partially refresh a materialized view in Oracle?

I have a very complex Oracle view based on other materialized views, regular views as well as some tables (I can't "fast refresh" it). Most of the time, existing records in this view are based on a date and are "stable", with new record sets having…

oracle data-warehouse partitioning materialized-views

asked Nov 23 '09 at 14:07

Galghamon

2 answers

Database sharding on Heroku

At some point in the next few months our app will be at the size where we need to shard our DB. We are using Heroku for hosting, Node.js/PostgreSQL stack. Conceptually, it makes sense for our app to have each logical shard represent one user and all…

database heroku partitioning sharding heroku-postgres

asked Feb 13 '13 at 19:16

raviparikh

4 answers

How to script sfdisk or parted for multiple partitions?

For QA purposes I need to be able to partition a drive via a bash script up to 30 or more partitions for both RHEL and SLES. I have attempted to do this in BASH with fdisk via a "here document" which works but as you can guess blows up in various…

bash partitioning

asked Aug 27 '12 at 21:52

LabRat

2 answers

How does one Azure table storage table with many partition keys compare to many tables with fewer partition keys?

I have a Windows Azure application in which all read queries of TableA are executed on single partitions for a range of rowkeys. The Partition Keys that facilitate this storage scheme are actually flattened names of objects in a hierarchy, such that…

azure scalability partitioning azure-table-storage

asked Jun 12 '11 at 04:41

user483679

2 answers

Sharded load balancing for stateful services in Kubernetes

I am currently switching from Service Fabric to Kubernetes and was wondering how to do custom and more complex load balancing. So far I already read about Kubernetes offering "Services" which do load balancing for pods hidden behind them, but this…

kubernetes load-balancing partitioning sharding kubernetes-statefulset

asked Nov 02 '19 at 22:55

Sossenbinder

1 answer

how to add new column to partitioned tables in postgres

I have created a new master table with multiple partitions on the basis of a column value, using the declarative partitioning of Postgres 10. How can I add new columns to the tables?

postgresql partitioning ddl

asked Jan 15 '19 at 10:07

Shreya Batra
