Questions tagged [partitioning]

Partitioning is a performance strategy whereby you divide possibly very large groups of data into some number of smaller groups of data.

The expectation is that, for algorithms whose cost grows faster than linearly in N, the total time it takes to process the smaller groups and combine the results is still less than the time it would take to process the one larger set of data.
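
As a rough illustration of that expectation, the sketch below sorts the same data twice with a quadratic algorithm (insertion sort): once as a single set, and once split into partitions whose sorted results are merged. The function names, the data size, and the choice of 8 partitions are assumptions made for this example only.

```python
import heapq
import random


def insertion_sort(items):
    """Sort a copy of items with insertion sort and count element comparisons."""
    result = list(items)
    comparisons = 0
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if result[j] <= current:
                break
            result[j + 1] = result[j]  # shift the larger element right
            j -= 1
        result[j + 1] = current
    return result, comparisons


def partitioned_sort(items, num_partitions):
    """Split items into contiguous chunks, sort each chunk, then merge the chunks."""
    chunk_size = max(1, -(-len(items) // num_partitions))  # ceiling division
    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
    sorted_chunks = []
    total_comparisons = 0
    for chunk in chunks:
        sorted_chunk, comparisons = insertion_sort(chunk)
        sorted_chunks.append(sorted_chunk)
        total_comparisons += comparisons
    # heapq.merge combines already-sorted inputs; its own comparisons
    # (roughly n * log2(num_partitions)) are not counted here.
    return list(heapq.merge(*sorted_chunks)), total_comparisons


random.seed(0)
data = [random.randrange(10_000) for _ in range(2_000)]

whole, whole_cost = insertion_sort(data)
parts, parts_cost = partitioned_sort(data, num_partitions=8)

print("comparisons, one large set:", whole_cost)
print("comparisons, 8 partitions: ", parts_cost)
```

Both calls produce the same sorted list. The partitioned run needs far fewer comparisons because each chunk's quadratic cost is paid on a much smaller N, while the combine step is comparatively cheap.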

List partitioning is similar to range partitioning in many ways. As in partitioning by RANGE, each partition must be explicitly defined.

3138 questions

4 answers

Partitioning data set in R based on multiple classes of observations

I'm trying to partition a data set that I have in R, 2/3 for training and 1/3 for testing. I have one classification variable, and seven numerical variables. Each observation is classified as either A, B, C, or D. For simplicity's sake, let's say…

r random partitioning

asked Nov 23 '12 at 22:44

Danny

3 answers

Fill a disk with an ext4 partition in a script

I tried to use parted for scripted partitioning like so: parted -a optimal /dev/sda mklabel gpt mkpart primary ext4 1 -1 But it complains about -1 not being a recognized option. Still, the same sub-command works in the parted prompt. So my…

linux partitioning

asked Oct 16 '12 at 15:03

Nicolas Barbey

2 answers

Spark SQL saveAsTable is not compatible with Hive when partition is specified

Kind of an edge case: when saving a parquet table in Spark SQL with a partition, #schema definition final StructType schema = DataTypes.createStructType(Arrays.asList( DataTypes.createStructField("time", DataTypes.StringType, true), …

hive apache-spark-sql partitioning parquet

asked Aug 31 '16 at 02:13

dunlu_98k

3 answers

EXECUTE of SELECT ... INTO is not implemented

I am trying to run this function in PostgreSQL: CREATE OR REPLACE FUNCTION create_partition_and_insert() RETURNS trigger AS $BODY$ DECLARE partition VARCHAR(25); _date text; BEGIN EXECUTE 'SELECT REPLACE(' || quote_literal(NEW.date) ||…

postgresql triggers plpgsql dynamic-sql partitioning

asked Oct 13 '15 at 15:32

shivams

3 answers

Where clause inside an over clause in postgres

Is it possible to use a WHERE clause inside an OVER clause, as below? SELECT SUM(amount) OVER(partition by prod_name WHERE dateval > dateval_13week) I cannot use preceding and following inside the OVER clause as my dates are not in order. All I…

sql postgresql partitioning greenplum

asked Mar 03 '14 at 14:45

user2569524

2 answers

Cross validation for glm() models

I'm trying to do a 10-fold cross validation for some glm models that I have built earlier in R. I'm a little confused about the cv.glm() function in the boot package, although I've read a lot of help files. When I provide the following…

r partitioning prediction glm cross-validation

asked Jan 27 '14 at 11:56

Error404

2 answers

Can MySQL create new partitions from the event scheduler

I have a table that looks something like this: CREATE TABLE `Calls` ( `calendar_id` int(11) NOT NULL, `db_date` timestamp NOT NULL, `cgn` varchar(32) DEFAULT NULL, `cpn` varchar(32) DEFAULT NULL, PRIMARY KEY (`calendar_id`), KEY…

mysql partitioning

asked Nov 23 '09 at 22:41

nos

1 answer

How is Partitioning done in Hazelcast

I am using Hazelcast v2.5. I have a few questions about partitioning in a cluster. How are the partitions identified? When an m.get request is made, how does Hazelcast identify which partition the data resides in (apart from the key)? How is…

java partitioning in-memory-database hazelcast

asked Apr 02 '13 at 06:28

Hazel_arun

3 answers

Mysql 5.5 Table partition user and friends

I have two tables in my db that now have millions of rows, and selection and insertion are getting slower and slower. I am using spring+hibernate+mysql 5.5 and read about sharding as well as partitioning the table, and like the idea of…

mysql partitioning sharding database-partitioning

asked Nov 27 '12 at 15:21

maaz

1 answer

Changing a partition with fdisk shows a warning like "partition#x contains ext4-signature"

I'm shrinking a partition's size with #Reduce Partition Size fsck -f /dev/sdb2 resize2fs /dev/sdb2 -M -p #Limit Partition fdisk /dev/sdb ... #Now I'm changing the Partition 2 to the new (smaller) size fdisk gives me a (red) warning like partition#2…

filesystems partitioning partition ext4

asked Dec 12 '18 at 13:04

powerpete

1 answer

Why use a bitwise AND here?

I was reading through the Hadoop code and found this line in a partitioner. (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks Why are they using the bitwise AND?

java partitioning bitwise-operators

asked Feb 05 '11 at 01:09

jshen
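
One way to see what the mask in that expression changes is to emulate the arithmetic outside Java. The sketch below is only an illustration under stated assumptions: Python's % differs from Java's, so java_remainder is a hypothetical helper that mimics Java's sign-follows-dividend remainder, and the two partition_* functions are made-up names, not Hadoop APIs.

```python
INT_MAX = 0x7FFFFFFF   # value of Java's Integer.MAX_VALUE
INT_MIN = -0x80000000  # value of Java's Integer.MIN_VALUE


def java_remainder(a, b):
    """Emulate Java's % operator: the result takes the sign of the dividend."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def partition_unmasked(hash_code, num_partitions):
    """hashCode % n as Java would compute it; can be negative."""
    return java_remainder(hash_code, num_partitions)


def partition_masked(hash_code, num_partitions):
    """(hashCode & Integer.MAX_VALUE) % n; the mask clears the sign bit first."""
    return java_remainder(hash_code & INT_MAX, num_partitions)


for h in (7, -7, INT_MIN):
    print(h, partition_unmasked(h, 4), partition_masked(h, 4))
```

With a negative hash code, the unmasked form yields a negative index, which is not a valid partition number. The masked form always lands in the range 0 to num_partitions - 1, including for Integer.MIN_VALUE.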

1 answer

Spark: save DataFrame partitioned by "virtual" column

I'm using PySpark to do a classic ETL job (load dataset, process it, save it) and want to save my DataFrame as files/directories partitioned by a "virtual" column; what I mean by "virtual" is that I have a column Timestamp which is a string containing…

apache-spark dataframe pyspark apache-spark-sql partitioning

asked Feb 16 '16 at 16:07

arnaud briche

6 answers

What is the best way to partition large tables in SQL Server?

In a recent project the "lead" developer designed a database schema where "larger" tables would be split across two separate databases with a view on the main database which would union the two separate database-tables together. The main database…

sql sql-server partitioning

asked Oct 03 '08 at 19:00

RyanFetz

3 answers

What is the difference between table distribution and table partition in SQL?

I am still struggling to identify how the concept of table distribution in Azure SQL Data Warehouse differs from the concept of table partition in SQL Server. The definitions of both seem to achieve the same results.

sql database azure-sql-database partitioning azure-synapse

asked Aug 03 '18 at 17:26

Amit Soni

1 answer

In what scenarios is hash partitioning preferred over range partitioning in Spark?

I have gone through various articles about hash partitioning, but I still don't understand in what scenarios it is more advantageous than range partitioning. Using sortByKey followed by range partitioning allows data to be distributed evenly across…

performance apache-spark rdd partitioning

asked Nov 12 '17 at 08:28

Anon
