Questions tagged [partition]

Use this tag for questions about code that partitions data, memory, virtual machines, databases or disks.

In computing, partition may refer to:

  • Disk partitioning, the division of a hard disk drive
  • Partition (database), the division of a database
  • Logical partition (virtual computing platform) (LPAR), a subset of a computer's resources, virtualized as a separate computer
  • Memory partition, a subdivision of a computer's memory, usually for use by a single job
  • Binary space partitioning

source: https://en.wikipedia.org/wiki/Partition

Note that non-programming questions about database partitioning are likely to be better received on Database Administrators and disk partitioning on Server Fault.

1547 questions
3 votes, 0 answers

Spark - partitioned parquet - query distinct values on partition key takes a lot of time

I have a parquet partitioned by a date field "dt" in S3 (in my parquet's base directory, there are multiple "dt=" subdirectories). df = spark.read.parquet("s3://my_s3_bucket/my_parquet_location/") distinctDates =…
Averell
3 votes, 1 answer

Additional partitions in buildroot

I'd like to know whether it's possible to add more partitions to the partition table, and how. I tried changing genimage.cfg, but support/scripts/genimage.sh doesn't seem to create them. Thank you in advance.
Warren HYPOLITE
3 votes, 1 answer

How to do a partitioned outer join in BigQuery

I would like to implement the partitioned outer join in BigQuery. To give a concrete example, I'd like to achieve the partitioned outer join as the accepted answer here:…
user2830451
3 votes, 1 answer

Process SSAS tabular - one partition and remain database

The goal is to refresh one named partition and the other objects with the default "partition" schema (the other tables don't have partitions) without table definitions, e.g.: { "refresh": { "type": "full", "objects": [ { "database":…
3 votes, 0 answers

spark structured streaming file source read from a certain partition onwards

I have a folder on HDFS like below containing ORC files: /path/to/my_folder It contains partitions: /path/to/my_folder/dt=20190101 /path/to/my_folder/dt=20190103 /path/to/my_folder/dt=20190103 ... Now I need to process the data here using…
Georg Heiler
3 votes, 0 answers

S3 bucketing creates more buckets than Hive specified and the redundant files are meta-text but not data

I am migrating some tables and want to re-bucket the table to exactly 1024 buckets per day. However, I saw that on some days there are more than 1024 files in S3, and when I open those files, which are all the same and significantly smaller than…
daydayup
3 votes, 2 answers

How to concatenate small parquet files in HIVE

How can I concatenate small parquet files in Hive when the following conditions hold? Partitions are created dynamically on the Hive table. The table is EXTERNAL. Solution tried so far (for ORC files only, and it has a bug): for ORC files I was using the command below in a loop for…
Deep
3 votes, 2 answers

SQL: row_number: order by date asc Need Nulls to be Last

Since my DBMS doesn't allow 'Nulls Last' in an order by clause, I need help with the following. row_number() over (partition by a.ID order by a.Date asc) I need my rows to have a row number sequence by ID ordered by date ascending, but have the…
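A common workaround when NULLS LAST is unavailable is to sort first on a flag that is 1 for NULL dates and 0 otherwise, then on the date itself. The sketch below is only an illustration, not the asker's setup: it uses Python's built-in sqlite3 module as a stand-in DBMS (window functions need SQLite 3.25 or newer), and the table `a` and its sample rows are made up.

```python
import sqlite3

# Hypothetical sample data: table "a" with an ID and a nullable Date column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (ID INTEGER, Date TEXT)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?)",
    [
        (1, "2020-01-02"),
        (1, None),
        (1, "2020-01-01"),
        (2, None),
        (2, "2020-03-01"),
    ],
)

# The CASE expression sorts non-NULL dates (flag 0) ahead of NULL dates
# (flag 1); within the non-NULL rows the date is sorted ascending.
query = """
SELECT ID, Date,
       ROW_NUMBER() OVER (
           PARTITION BY ID
           ORDER BY CASE WHEN Date IS NULL THEN 1 ELSE 0 END, Date ASC
       ) AS rn
FROM a
ORDER BY ID, rn
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The CASE expression is plain SQL, so the same ORDER BY should carry over to most engines that support row_number(), though that is worth checking against the specific DBMS.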
3 votes, 1 answer

Cassandra - Same partition key in different tables - when it is right?

I modeled my Cassandra data so that I have a couple of tables with the same partition key, a Uuid. Each table has its partition key and other columns representing data for a specific query I would like to ask. For example, one table has Uuid and column…
Udi
3 votes, 2 answers

Create new columns which show values based on ranking of other columns python

I have a dataframe with some dates as rows and values in columns. To have an idea the df looks like the below: print(df1) c1 c2 c3 c4 12/12/2016 38 10 1 8 12/11/2016 44 12 17 46 12/10/2016 13 6 2 7 12/09/2016 9 16…
clu
3 votes, 2 answers

SQL Error: ORA-14006: invalid partition name

I am trying to partition an existing table in Oracle 12c R1 using the SQL statement below. ALTER TABLE TABLE_NAME MODIFY PARTITION BY RANGE (DATE_COLUMN_NAME) INTERVAL (NUMTOYMINTERVAL(1,'MONTH')) ( PARTITION part_01 VALUES LESS THAN…
Tajinder
3 votes, 2 answers

KDB: How to Delete rows from Partitioned Table

I have the below query used to delete rows from a partitioned table, but it doesn't work. What is the approach used for deleting rows in a partitioned table? delete from SecurityLoan where lender=`SCOTIA, date in inDays, portfolio in…
Riley Hun
3 votes, 4 answers

Maximum Partition key length of my data in Dynamo DB

I have a use case for placing constraints on the key size in my application. I tried to find the maximum length of the partition keys stored so far in my DynamoDB table. This will help me to know my data before placing any internal constraints on the data that I am…
Deeps
3 votes, 3 answers

Haskell: how to make a list of files and a list of directories out of one common list

This is a newbie question. Suppose I want to separate a list of files and directories into a list of files and a list of directories: getFilesAndDirs :: [FilePath] -> ([FilePath], [FilePath]) getFilesAndDirs paths = let ... in (dirs,…
Alexey Orlov
3 votes, 3 answers

SQL detect change in row

I have data from SQL Server, attached: select * from log. I want to check whether there are any changes in code for the column name. If you look at the data from table log, the code changes 2 times (B02, B03). What I want to do is I want…
Raspi Surya
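Detecting a change between consecutive rows is often done by comparing each row with the previous one through the LAG window function, which SQL Server has supported since 2012. The sketch below is a guess at the shape of the problem, not the asker's actual data: it runs on Python's built-in sqlite3 module (SQLite 3.25 or newer), and the ordering column `seq` and the sample rows are invented for illustration.

```python
import sqlite3

# Hypothetical "log" table: "seq" is an assumed ordering column, since the
# original data is not shown; "name" and "code" come from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (seq INTEGER, name TEXT, code TEXT)")
conn.executemany(
    "INSERT INTO log VALUES (?, ?, ?)",
    [
        (1, "A", "B01"),
        (2, "A", "B01"),
        (3, "A", "B02"),
        (4, "A", "B02"),
        (5, "A", "B03"),
    ],
)

# LAG(code) fetches the previous row's code within the same name; a row is
# reported as a change when that previous code exists and differs.
query = """
SELECT seq, name, code
FROM (
    SELECT seq, name, code,
           LAG(code) OVER (PARTITION BY name ORDER BY seq) AS prev_code
    FROM log
) AS t
WHERE prev_code IS NOT NULL AND code <> prev_code
ORDER BY seq
"""
changes = conn.execute(query).fetchall()
for change in changes:
    print(change)
```

With this made-up data the query reports the two rows where the code switches, to B02 and then to B03.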