Questions tagged [partition]

Use this tag for questions about code that partitions data, memory, virtual machines, databases or disks.

In computing, partition may refer to:

  • Disk partitioning, the division of a hard disk drive
  • Partition (database), the division of a database
  • Logical partition (virtual computing platform) (LPAR), a subset of a computer's resources, virtualized as a separate computer
  • Memory partition, a subdivision of a computer's memory, usually for use by a single job
  • Binary space partitioning

source: https://en.wikipedia.org/wiki/Partition

Note that non-programming questions about database partitioning are likely to be better received on Database Administrators and disk partitioning on Server Fault.

1547 questions
0
votes
1 answer

Storage required on each member for a partitioned persistent region

For a persistent partitioned region, what data is stored in the associated disk-store on ANY ONE MEMBER? Is it all the data for the region, including data held on other members, or is it just the primary data THE MEMBER is hosting, or is it…
0
votes
1 answer

RDD Partitioning

Suppose we have a file on HDFS with 3 blocks (64 MB each). When we create an RDD from the same file with 3 partitions, each node in the cluster (suppose the cluster has 3 data nodes) will have duplicate file contents (one block from HDFS and a…
Abhinav Kumar
  • 210
  • 3
  • 13
0
votes
1 answer

Can I modify Hive partition location using java api?

I am developing a system whose job is to consume data from Kafka and put it into Hive. Since the table has a partition "day", the location of this partition on HDFS will be /root/tableLocation/day=20161110/adfadfaaf.avro. But this…
wuchang
  • 3,003
  • 8
  • 42
  • 66
0
votes
1 answer

Optimizing Large Table Join in PySpark

I have a large fact table, roughly 500M rows per day. The table is partitioned by region_date. I have to scan through 6 months of data every day, left outer join with another smaller subset (1M rows) based on an id & date column and calculate two…
Ram
  • 63
  • 2
  • 7
0
votes
1 answer

Insert into hive table with dynamic partition only writing first partition to disk and not all

I am trying to write data into a Hive table and failing. I get an error at the end of Cycle_dt =null and only one partition is written. It is the first day's. set hive.auto.convert.join=true; set hive.optimize.mapjoin.mapreduce=true; set…
0
votes
2 answers

Partition key for DocumentDB

I have a question about DocumentDB partition key choice. I have data with UserId, DeviceId and WhateverId. The UserId parameter will always be in queries, so I have chosen UserId as a partition key. But I have a lot of data for one user (millions of…
Paval
  • 976
  • 1
  • 11
  • 20
0
votes
1 answer

MySQL partition pruning not working on my subquery

explain SELECT ip_src, (SELECT country FROM ip_location WHERE ip_start between (134744072-500000) and (134744072) and ip_end > 134744072) country_src, ip_dst FROM event e WHERE long_date BETWEEN '2016-03-25 00:00:00' AND…
0
votes
1 answer

how to set up dynamic partition where the column keys will be the partitions

I have a table A and a table B, where table A's data was inserted from table B. Essentially, table A is the same as table B; the only difference is that table A has a date_partition column, which table B does not have. The table A schema is as follows: ID…
Misha
  • 133
  • 5
  • 15
0
votes
2 answers

SQL Server Join Queries using Rank and Partition

I want one table of the highest BID and lowest ASK (price) for each EntityCode in the db. The following two sets of code return two result sets, but I cannot yet figure out how to join them: Get highest Bid (SELECT * FROM (SELECT …
Mike S
  • 157
  • 3
  • 13
0
votes
2 answers

Average and group by in SQL but for best 10 records only

Given: A ranking table (id, user_id, score, group_id, date) Currently we calculate a ranking based on all participating users based on sum and average. SELECT ROUND(AVG(r.score)::NUMERIC, 2) AS score, SUM(score) AS score_sum, MAX(r.date)…
firegate666
  • 98
  • 1
  • 10
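The excerpt above is cut off, so the exact grouping the asker wants is unknown. As a hedged sketch only, assuming "best 10" means the 10 highest scores per user_id, a window function with PARTITION BY can rank rows inside each group before aggregating. The example below uses SQLite through Python's sqlite3 module as a stand-in for the asker's database (the `::NUMERIC` cast suggests PostgreSQL); the function name `top_n_average` is hypothetical, and window functions require SQLite 3.25 or later.

```python
import sqlite3


def top_n_average(rows, n=10):
    """Average and sum of the n best scores per user_id.

    rows: iterable of (user_id, score, group_id, date) tuples.
    Assumption: "best" means highest score, grouped per user_id.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ranking ("
        "id INTEGER PRIMARY KEY, user_id INTEGER, score REAL, "
        "group_id INTEGER, date TEXT)"
    )
    conn.executemany(
        "INSERT INTO ranking (user_id, score, group_id, date) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    query = """
        SELECT user_id,
               ROUND(AVG(score), 2) AS score,
               SUM(score)           AS score_sum
        FROM (
            -- Number each user's rows from best to worst score.
            SELECT user_id, score,
                   ROW_NUMBER() OVER (
                       PARTITION BY user_id ORDER BY score DESC
                   ) AS rn
            FROM ranking
        )
        WHERE rn <= ?          -- keep only the n best rows per user
        GROUP BY user_id
        ORDER BY user_id
    """
    result = conn.execute(query, (n,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        (1, 10, 1, "2016-11-01"),
        (1, 20, 1, "2016-11-02"),
        (1, 30, 1, "2016-11-03"),
        (2, 5, 1, "2016-11-01"),
    ]
    print(top_n_average(sample, n=2))
```

The inner query only numbers the rows; filtering on `rn` in the outer query is what restricts the aggregate to the top rows of each partition.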
0
votes
0 answers

Query many partitions at once or query each partition individually?

I have to find an element in a table that has many partitions. I do not know in which one of them it will be found, or even whether the element exists in any of them, so I have to query many partitions. As there are many partitions, I usually specify…
user2256799
  • 229
  • 1
  • 3
  • 10
0
votes
1 answer

change partitioning in HANA?

I have created a table and partitioned it with the round robin method in SAP HANA. I have loaded the data into the table and now I cannot add a primary key to the table since it uses round robin partitioning. Is there any way to add a primary key for…
user6811693
0
votes
1 answer

Find total number of Equal Partition

For a number N, f(N) = the total number of parts in the partitions of N into equal parts. For example, if the given number is 4, the equal partitions will be: {1,1,1,1} -> total parts=4 {2,2} -> total parts=2 {4} -> total…
msank
  • 21
  • 1
  • 4
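One way to read the definition above: N splits into equal parts of size d exactly when d divides N, and that partition has N/d parts, so f(N) is the sum of N/d over all divisors d of N. The sketch below follows that reading only; the function name is hypothetical and the truncated excerpt may contain constraints not shown here.

```python
def total_parts_in_equal_partitions(n):
    """Sum the part counts of every partition of n into equal parts.

    Each divisor d of n gives one partition: n // d parts of size d.
    Divisors are found in pairs (d, n // d) up to sqrt(n).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    total = 0
    d = 1
    while d * d <= n:
        if n % d == 0:
            total += n // d      # parts of size d
            if d != n // d:
                total += d       # parts of size n // d
        d += 1
    return total


if __name__ == "__main__":
    # For 4: {1,1,1,1} has 4 parts, {2,2} has 2, {4} has 1.
    print(total_parts_in_equal_partitions(4))
```

Pairing each divisor with its cofactor keeps the loop at roughly sqrt(N) iterations instead of N.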
0
votes
1 answer

Partitioning in Oracle using custom functions

I'm trying to partition my table using the ID column such that all even IDs go in partition_1 and all odd IDs go in partition_2. The closest thing that met my needs was virtual columns. CREATE TABLE sales ( id NUMBER(6) NOT…
ffff
  • 2,853
  • 1
  • 25
  • 44
0
votes
1 answer

First element of each dataframe partition Spark 2.0

I need to retrieve the first element of each dataframe partition. I know that I need to use mapPartitions, but it is not clear to me how to use it. Note: I am using Spark 2.0, and the dataframe is sorted.
syl
  • 419
  • 2
  • 5
  • 17