Questions tagged [partitioning]

Partitioning is a performance strategy whereby you divide possibly very large groups of data into some number of smaller groups of data.

The expectation is that, for algorithms whose cost grows faster than linearly in N, the total time it takes to process the smaller groups and combine the results is still less than the time it would take to process the one larger set of data.
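
As a rough illustration of that expectation, here is a minimal sketch that assumes a purely quadratic cost model and a linear cost for combining results. The function names, the cost model, and the `combine_cost_per_item` parameter are all hypothetical and chosen only for the arithmetic; real workloads will behave differently.

```python
def quadratic_cost(n: int) -> int:
    """Hypothetical cost model: an O(N^2) algorithm doing n * n units of work."""
    return n * n


def partitioned_cost(n: int, k: int, combine_cost_per_item: int = 1) -> int:
    """Cost of processing n items as k near-equal partitions, plus a linear combine step."""
    size, remainder = divmod(n, k)
    # The first `remainder` partitions get one extra item so the sizes sum to n.
    sizes = [size + 1 if i < remainder else size for i in range(k)]
    work = sum(quadratic_cost(s) for s in sizes)
    combine = combine_cost_per_item * n  # assumed linear merge of partial results
    return work + combine


whole = quadratic_cost(1_000_000)         # one large group
split = partitioned_cost(1_000_000, 100)  # 100 groups of 10,000 items each
print(whole, split, whole / split)
```

Under these assumptions the partitioned run does roughly 1/k of the work of the single large run, because k groups of size N/k cost k * (N/k)^2 = N^2 / k. For a linear-time algorithm the same split would save nothing and would only add the combine cost.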

LIST partitioning is similar to RANGE partitioning in many ways. As in partitioning by RANGE, each partition must be explicitly defined.
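
A minimal sketch of what "explicitly defined" can mean for list-style and range-style partitions, written in plain Python rather than any particular database's DDL. The partition names, member values, and bounds below are hypothetical.

```python
from bisect import bisect_right

# Hypothetical list-style partitions: each one names the exact values it accepts.
LIST_PARTITIONS = {
    "p_north": {"NY", "MA"},
    "p_south": {"TX", "FL"},
}

# Hypothetical range-style partitions: exclusive upper bounds, sorted ascending.
RANGE_BOUNDS = [1990, 2000, 2010]
RANGE_NAMES = ["p0", "p1", "p2"]


def list_partition(value: str) -> str:
    """Return the partition whose explicit value list contains `value`."""
    for name, members in LIST_PARTITIONS.items():
        if value in members:
            return name
    raise ValueError(f"no list partition defined for {value!r}")


def range_partition(value: int) -> str:
    """Return the first partition whose exclusive upper bound is greater than `value`."""
    idx = bisect_right(RANGE_BOUNDS, value)
    if idx >= len(RANGE_NAMES):
        raise ValueError(f"no range partition defined for {value!r}")
    return RANGE_NAMES[idx]


print(list_partition("TX"), range_partition(1995))
```

In both cases a value that matches no defined partition is rejected in this sketch; how a real system handles such values depends on the product and its configuration.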

3138 questions
15
votes
4 answers

Partitioning data set in r based on multiple classes of observations

I'm trying to partition a data set that I have in R, 2/3 for training and 1/3 for testing. I have one classification variable, and seven numerical variables. Each observation is classified as either A, B, C, or D. For simplicity's sake, let's say…
Danny
  • 625
  • 2
  • 8
  • 11
15
votes
3 answers

Fill a disk with an ext4 partition in a script

I tried to use parted for scripted partitioning like so: parted -a optimal /dev/sda mklabel gpt mkpart primary ext4 1 -1 But it complains about -1 not being a recognized option. Still, the same sub-command works in the parted prompt. So my…
Nicolas Barbey
  • 6,639
  • 4
  • 28
  • 34
14
votes
2 answers

Spark SQL saveAsTable is not compatible with Hive when partition is specified

Kind of an edge case, when saving a parquet table in Spark SQL with a partition, #schema definition final StructType schema = DataTypes.createStructType(Arrays.asList( DataTypes.createStructField("time", DataTypes.StringType, true), …
dunlu_98k
  • 209
  • 2
  • 3
  • 11
14
votes
3 answers

EXECUTE of SELECT ... INTO is not implemented

I am trying to run this function in PostgreSQL: CREATE OR REPLACE FUNCTION create_partition_and_insert() RETURNS trigger AS $BODY$ DECLARE partition VARCHAR(25); _date text; BEGIN EXECUTE 'SELECT REPLACE(' || quote_literal(NEW.date) ||…
shivams
  • 2,597
  • 6
  • 25
  • 47
14
votes
3 answers

Where clause inside an over clause in postgres

Is it possible to use a WHERE clause inside an OVER clause, as below? SELECT SUM(amount) OVER(partition by prod_name WHERE dateval > dateval_13week) I cannot use preceding and following inside the OVER clause because my dates are not in order. All I…
user2569524
  • 1,651
  • 7
  • 32
  • 57
14
votes
2 answers

Cross validation for glm() models

I'm trying to do a 10-fold cross validation for some glm models that I have built earlier in R. I'm a little confused about the cv.glm() function in the boot package, although I've read a lot of help files. When I provide the following…
Error404
  • 6,959
  • 16
  • 45
  • 58
14
votes
2 answers

Can MySQL create new partitions from the event scheduler

I have a table that looks something like this: CREATE TABLE `Calls` ( `calendar_id` int(11) NOT NULL, `db_date` timestamp NOT NULL, `cgn` varchar(32) DEFAULT NULL, `cpn` varchar(32) DEFAULT NULL, PRIMARY KEY (`calendar_id`), KEY…
nos
  • 223,662
  • 58
  • 417
  • 506
14
votes
1 answer

How is Partitioning done in Hazelcast

I am using Hazelcast v2.5. I have a few questions about partitioning in a cluster. How are the partitions identified? When a m.get request is made, how does Hazelcast identify which partition the data resides in (apart from the key)? How is…
Hazel_arun
  • 1,721
  • 2
  • 13
  • 17
14
votes
3 answers

Mysql 5.5 Table partition user and friends

I have two tables in my db that have millions of rows now, and selection and insertion are getting slower and slower. I am using spring+hibernate+mysql 5.5 and read about sharding as well as partitioning the table and like the idea of…
maaz
  • 4,371
  • 2
  • 30
  • 48
13
votes
1 answer

Changing a partition with fdisk shows a warning like "partition#x contains ext4-signature"

I'm shrinking a partition size with #Reduce Partition Size fsck -f /dev/sdb2 resize2fs /dev/sdb2 -M -p #Limit Partition fdisk /dev/sdb ... #Now I'm changing the Partition 2 to the new (smaller) size fdisk gives me a (red) warning like partition#2…
powerpete
  • 2,663
  • 2
  • 23
  • 49
13
votes
1 answer

Why use a bitwise AND here?

I was reading through the Hadoop code and found this line in a partitioner. (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks Why are they using the bitwise AND?
jshen
  • 11,507
  • 7
  • 37
  • 59
13
votes
1 answer

Spark: save DataFrame partitioned by "virtual" column

I'm using PySpark to do a classic ETL job (load dataset, process it, save it) and want to save my DataFrame as files/directory partitioned by a "virtual" column; what I mean by "virtual" is that I have a column Timestamp which is a string containing…
13
votes
6 answers

What is the best way to partition large tables in SQL Server?

In a recent project the "lead" developer designed a database schema where "larger" tables would be split across two separate databases with a view on the main database which would union the two separate database-tables together. The main database…
RyanFetz
  • 525
  • 2
  • 8
  • 25
12
votes
3 answers

What is a difference between table distribution and table partition in sql?

I am still struggling to identify how the concept of table distribution in Azure SQL Data Warehouse differs from the concept of table partitioning in SQL Server. The definitions of both seem to achieve the same results.
12
votes
1 answer

In what scenarios hash partitioning is preferred over range partitioning in Spark?

I have gone through various articles about hash partitioning, but I still don't understand in what scenarios it is more advantageous than range partitioning. Using sortByKey followed by range partitioning allows data to be distributed evenly across…
Anon
  • 320
  • 2
  • 14