Questions tagged [partitioning]

Partitioning is a performance strategy whereby you divide possibly very large groups of data into some number of smaller groups of data.

The expectation is that, for algorithms whose running time grows faster than linearly in N (for example O(N²)), the total time it takes to process the smaller groups and combine the results is still less than the time it would take to process the one larger set of data.
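As a rough illustration of that expectation, here is a minimal Python sketch. It assumes a quadratic-time insertion sort as the stand-in for a super-linear algorithm and uses `heapq.merge` as the combine step. The names `insertion_sort` and `partitioned_sort` are hypothetical and chosen only for this example. Actual timings depend on the machine and the data.

```python
import heapq
import random
import time


def insertion_sort(items):
    """Quadratic-time sort, used only as a stand-in for a super-linear algorithm."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger elements one slot to the right.
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


def partitioned_sort(items, num_partitions):
    """Sort each partition separately, then combine with a k-way merge."""
    # Ceiling division, with a floor of 1 so an empty input is handled.
    size = max(1, -(-len(items) // num_partitions))
    partitions = [items[i:i + size] for i in range(0, len(items), size)]
    sorted_parts = [insertion_sort(part) for part in partitions]
    return list(heapq.merge(*sorted_parts))


if __name__ == "__main__":
    data = [random.random() for _ in range(4000)]

    start = time.perf_counter()
    whole = insertion_sort(data)
    whole_seconds = time.perf_counter() - start

    start = time.perf_counter()
    parts = partitioned_sort(data, 16)
    parts_seconds = time.perf_counter() - start

    assert whole == parts
    print(f"one large group: {whole_seconds:.3f}s")
    print(f"16 partitions:   {parts_seconds:.3f}s")
```

With k partitions, the quadratic step costs roughly k · (N/k)² = N²/k, and the merge adds about N log k, which is why the partitioned version is expected to finish sooner for large N.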

List partitioning is similar to range partitioning in many ways. As in partitioning by RANGE, each partition must be explicitly defined.

3138 questions
23
votes
2 answers

PostgreSQL: UPDATE implies move across partitions

(Note: updated with adopted answer below.) For a PostgreSQL 8.1 (or later) partitioned table, how does one define an UPDATE trigger and procedure to "move" a record from one partition to the other, if the UPDATE implies a change to the constrained…
pilcrow
  • 56,591
  • 13
  • 94
  • 135
22
votes
7 answers

Algorithm for finding nearby points?

Given a set of several million points with x,y coordinates, what is the algorithm of choice for quickly finding the top 1000 nearest points from a location? "Quickly" here means about 100ms on a home computer. Brute force would mean doing millions…
Bemmu
  • 17,849
  • 16
  • 76
  • 93
21
votes
2 answers

Doctrine2 and MySQL Partitioning

Does anybody have experience using the partitioning feature in conjunction with the Doctrine2 library? The first problem is that Doctrine creates foreign keys for association columns; does anybody know how to prevent or disable that? And the second…
Vladimir Kartaviy
  • 666
  • 1
  • 7
  • 24
21
votes
3 answers

What is the difference between partition_point and lower_bound?

C++11 includes the algorithm std::partition_point(). However for all the cases I have tried it gives the same answer as std::lower_bound(). The only difference being the convenient T& value parameter. Did I miss something or are these two functions…
Ankur S
  • 548
  • 3
  • 18
21
votes
2 answers

Does Spark know the partitioning key of a DataFrame?

I want to know if Spark knows the partitioning key of the parquet file and uses this information to avoid shuffles. Context: Running Spark 2.0.1 running local SparkSession. I have a csv dataset that I am saving as parquet file on my disk like…
astro_asz
  • 2,278
  • 3
  • 15
  • 31
21
votes
8 answers

Is partitioning easier than sorting?

This is a question that's been lingering in my mind for some time ... Suppose I have a list of items and an equivalence relation on them, and comparing two items takes constant time. I want to return a partition of the items, e.g. a list of linked…
reinierpost
  • 8,425
  • 1
  • 38
  • 70
21
votes
3 answers

mysql database automatic partitioning

I have a mysql database table that I want to partition by date, particularly by month & year. However, when new data is added for a new month, I don't want to need to manually update the database. When I initially create my database, I have data in…
Jeff Storey
  • 56,312
  • 72
  • 233
  • 406
21
votes
3 answers

Which part of the CAP theorem does Cassandra sacrifice and why?

There is a great talk here about simulating partition issues in Cassandra with Kingsbury's Jepsen library. My question is - with Cassandra are you mainly concerned with the Partitioning part of the CAP theorem, or is Consistency a factor you need to…
hawkeye
  • 34,745
  • 30
  • 150
  • 304
20
votes
3 answers

Table partitioning using 2 columns

Is it possible to partition a table using 2 columns instead of only 1 for the partition function? Consider a table with 3 columns: ID (int, primary key), Date (datetime), Num (int). I want to partition this table by 2 columns: Date and…
Rafael Colucci
  • 6,018
  • 4
  • 52
  • 121
20
votes
1 answer

Default Partitioning Scheme in Spark

When I execute below command: scala> val rdd = sc.parallelize(List((1,2),(3,4),(3,6)),4).partitionBy(new HashPartitioner(10)).persist() rdd: org.apache.spark.rdd.RDD[(Int, Int)] = ShuffledRDD[10] at partitionBy at :22 scala>…
dinesh028
  • 2,137
  • 5
  • 30
  • 47
20
votes
3 answers

How to partition Mysql across MULTIPLE SERVERS?

I know that horizontal partitioning...you can create many tables. How can you do this with multiple servers? This will allow Mysql to scale. Create X tables on X servers? Does anyone care to explain, or have a good beginner's tutorial (step-by-step)…
TIMEX
  • 259,804
  • 351
  • 777
  • 1,080
20
votes
5 answers

Auto sharding postgresql?

I have a problem where I need to load a lot of data (5+ billion rows) into a database very quickly (ideally in less than 30 min, but quicker is better), and it was recently suggested that I look into postgresql (I failed with mysql and was looking at…
Lostsoul
  • 25,013
  • 48
  • 144
  • 239
19
votes
4 answers

what is a good way to horizontal shard in postgresql

What is a good way to horizontally shard in postgresql: 1. pgpool 2, or 2. gridsql? Which is the better way to use sharding? Also, is it possible to partition without changing client code? It would be great if someone could share a simple tutorial or cookbook…
pylabs
  • 31
  • 1
  • 1
  • 6
19
votes
3 answers

Does Spark maintain parquet partitioning on read?

I am having a lot of trouble finding the answer to this question. Let's say I write a dataframe to parquet and I use repartition combined with partitionBy to get a nicely partitioned parquet file. See…
Adam
  • 313
  • 1
  • 3
  • 11
18
votes
3 answers

which algorithm can do a stable in-place binary partition with only O(N) moves?

I'm trying to understand this paper: Stable minimum space partitioning in linear time. It seems that a critical part of the claim is that Algorithm B sorts stably a bit-array of size n in O(nlog2n) time and constant extra space, but makes only…
AShelly
  • 34,686
  • 15
  • 91
  • 152