Questions tagged [partitioning]

Partitioning is a performance strategy whereby you divide possibly very large groups of data into some number of smaller groups of data.

The expectation is that, for algorithms whose running time grows faster than linearly in N, the total time it takes to process the smaller groups and combine the results is still less than the time it would take to process the one larger set of data.
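As a rough illustration of that expectation, the following Python sketch sorts the same data once as a whole and once in partitions, using a deliberately quadratic insertion sort and counting comparisons. The function names, the input size, and the choice of 8 partitions are assumptions made for this example only; the comparison count for the partitioned run leaves out the comparatively cheap merge step.

```python
import heapq


def insertion_sort(items):
    """Quadratic-time sort. Returns (sorted_list, comparison_count)."""
    result = list(items)
    comparisons = 0
    for i in range(1, len(result)):
        j = i
        while j > 0:
            comparisons += 1
            if result[j - 1] <= result[j]:
                break
            result[j - 1], result[j] = result[j], result[j - 1]
            j -= 1
    return result, comparisons


def partitioned_sort(items, parts):
    """Split items into `parts` chunks, sort each chunk, merge the results.

    The returned count covers only the per-chunk sorting work, not the
    comparisons made inside heapq.merge.
    """
    size = max(1, -(-len(items) // parts))  # ceiling division, at least 1
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    sorted_chunks = []
    total_comparisons = 0
    for chunk in chunks:
        sorted_chunk, count = insertion_sort(chunk)
        sorted_chunks.append(sorted_chunk)
        total_comparisons += count
    return list(heapq.merge(*sorted_chunks)), total_comparisons


# Reverse-sorted input is the worst case for insertion sort.
data = list(range(400, 0, -1))
whole, whole_cost = insertion_sort(data)
split, split_cost = partitioned_sort(data, parts=8)
print(whole == split, whole_cost, split_cost)
```

With a quadratic algorithm, the sorting work on eight partitions of 50 items is roughly one eighth of the work on one set of 400, which is why the combine step can be paid for and still leave a net saving.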

List partitioning is similar to range partitioning in many ways. As in partitioning by RANGE, each partition must be explicitly defined.
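To make "explicitly defined" concrete, here is a minimal Python sketch, not tied to any database engine, that contrasts partitions defined by range boundaries with partitions defined by explicit value lists. Every partition name, boundary, and value below is hypothetical and chosen only for illustration.

```python
import bisect

# Hypothetical RANGE definition: exclusive upper bounds in ascending order.
# p0 holds values < 1990, p1 holds 1990..1999, p2 holds 2000..2009.
RANGE_BOUNDS = [1990, 2000, 2010]
RANGE_NAMES = ["p0", "p1", "p2"]

# Hypothetical LIST definition: each partition names its members explicitly.
LIST_PARTITIONS = {
    "p_north": {3, 5, 6},
    "p_east": {1, 2, 10},
    "p_west": {4, 12, 13},
}


def range_partition(value):
    """Return the partition whose range contains value."""
    index = bisect.bisect_right(RANGE_BOUNDS, value)
    if index == len(RANGE_NAMES):
        raise ValueError(f"no range partition defined for {value!r}")
    return RANGE_NAMES[index]


def list_partition(value):
    """Return the partition whose explicit value list contains value."""
    for name, members in LIST_PARTITIONS.items():
        if value in members:
            return name
    raise ValueError(f"no list partition defined for {value!r}")


print(range_partition(1995), list_partition(10))
```

In both cases a value that matches no definition is rejected rather than silently placed somewhere, which mirrors the requirement that every partition be spelled out in advance.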

3138 questions
1
vote
1 answer

Partition a column based on missing values

How would I go about partitioning a column based on missing values in Python? I have the following table in a dataframe: Store Bag Alberts ClothBag Vons KateSpade Ralphs GroceryBag1 Na apple Na pear Na …
1
vote
1 answer

HBase: All data stored in one region

I'm importing HFiles into HBase using the command: hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles -Dcreate.table=no /user/myuser/map_data/hfiles my_table When I just had a look into the HBase Master UI, I saw that all data seems to…
D. Müller
  • 3,336
  • 4
  • 36
  • 84
1
vote
1 answer

Missing FROM-clause entry for table »rowtype«

I am currently writing a function in the plpgsql language to create partitions which will hold sensor data for each month (one partition for one month and sensor). I am stuck with this error: ERROR: missing FROM-clause entry for table…
bajro
  • 1,199
  • 3
  • 20
  • 33
1
vote
1 answer

Choosing random Pivot in QuickSort partitioning takes more time, how is this possible?

public static int partitionsimple_hoare(int[] arr,int l , int h){ int pivot = arr[l]; int i = l-1; int j = h+1; while(true){ do{ i++; }while(arr[i]
1
vote
1 answer

Count of sales partitioned by DOW (with date and time as input) - postgresql

I have scoured the internet for the right response, but am not finding what I want. I have an example dataset as follows: Date --------------------------------- Number of Sales Saturday 9th September 13:22:00 ------ 1 Sunday 10th September 16:44:02 …
user8497255
1
vote
1 answer

Preserving the number of partitions of a Spark dataframe after transformation

I am looking at a bug in the code where a dataframe has been split into more partitions than desired (over 700), and this causes too many shuffle operations when I try to repartition them to only 48. I can't use a coalesce() here because I want…
1
vote
3 answers

T-SQL progressive numbering partitions

I am aiming to obtain a record set like this:

date  flag  number
01    0     1
02    0     1
03    1     2
04    1     2
05    1     2
06    0     3
07    1     4
08    1     4

I start from the record set with "date" and "flag"…
RaffaeleT
  • 255
  • 3
  • 16
1
vote
1 answer

Best practices on Hazelcast persistence and multiple members

I went through several related topics here and it seems the topic is still open; the official documentation does not cover it, so here we are. There's a cluster with N members in one group. There's one distributed map. The map has persistence store backed…
1
vote
1 answer

Select parquet based on partition date

I have some heavy logs on my cluster, and I've converted all of them to Parquet with the following partition schema: PARTITION_YEAR=2017/PARTITION_MONTH=07/PARTITION_DAY=12 For example, if I want to select all my logs between 2017/07/12 and 2017/08/10, is there a way…
RobinFrcd
  • 4,439
  • 4
  • 25
  • 49
1
vote
1 answer

ORA-14108: illegal partition-extended table name syntax

I have a requirement where I need to run an update script over multiple partitions of a table. I have written a script for it as below, but it gives ORA-14108: illegal partition-extended table name syntax Cause: Partition to be accessed may only be…
1
vote
3 answers

T-SQL group by partition

I have the below table in SQL Server 2008. Please help to get the expected output. Thanks. CREATE TABLE [dbo].[Test]([Category] [varchar](10) NULL,[Value] [int] NULL, [Weightage] [int] NULL,[Rn] [smallint] NULL ) ON [PRIMARY] insert into Test values…
user219628
  • 3,755
  • 8
  • 35
  • 37
1
vote
0 answers

Oracle subpartition a subpartition

I have 3 columns which I would like to partition by, let's call them some_date DATE some_type VARCHAR2 some_product VARCHAR2 I would like to partition by range using some_date, then subpartition by list using some_type, then subpartition that…
Jacek Trociński
  • 882
  • 1
  • 8
  • 23
1
vote
3 answers

Split a list into all pairs in all possible ways

I am aware of many posts with similar questions and have been through all of them. However, I am not able to do what I need. I have a list, say l1=[0,1,2,3,4], which I want to partition into pairs of tuples like the following: [(0, 1), (2, 3), 4], [(0,…
Pankaj
  • 519
  • 2
  • 5
  • 20
1
vote
0 answers

Optimal partitioning

I'm looking for a way to perform optimal partitioning of the following: I have a square that is divided into a number of small equally-sized squares, say N, and I need to group them into K groups so that the number of…
1
vote
2 answers

Cassandra querying multiple partitions on a single node

We have less than 50GB of data for a table and we are trying to come up with a reasonable design for our Cassandra database. With so little data we are thinking of having all data on each node (2 node cluster with replication factor of 2 to start…
eddyP23
  • 6,420
  • 7
  • 49
  • 87