Questions tagged [partition]

Use this tag for questions about code that partitions data, memory, virtual machines, databases or disks.

In computing, partition may refer to:

  • Disk partitioning, the division of a hard disk drive
  • Partition (database), the division of a database
  • Logical partition (virtual computing platform) (LPAR), a subset of a computer's resources, virtualized as a separate computer
  • Memory partition, a subdivision of a computer's memory, usually for use by a single job
  • Binary space partitioning

source: https://en.wikipedia.org/wiki/Partition

Note that non-programming questions about database partitioning are likely to be better received on Database Administrators, and those about disk partitioning on Server Fault.

1547 questions
3
votes
2 answers

partition over by date for a period of time SQL BigQuery

Data has to be partitioned by id as well as by pageview_date. For each id, the code should find the latest date in the edited_date column that is no later than the pageview_date field itself. But it has to look for all values…
Chique_Code
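
The lookup described in this excerpt (for each id, the latest edited_date that is no later than a given pageview_date) can be sketched outside SQL to make the logic concrete. This is a minimal Python illustration, not the BigQuery query the question asks for; the function name and the tuple layout of the rows are assumptions made for the example.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date


def latest_edit_on_or_before(edits, pageviews):
    """For each (id, pageview_date), find the latest edited_date <= pageview_date.

    edits:     iterable of (id, edited_date) pairs     (hypothetical layout)
    pageviews: iterable of (id, pageview_date) pairs   (hypothetical layout)
    Returns a list of (id, pageview_date, edited_date or None).
    """
    # Partition the edit dates by id, then sort each partition once.
    dates_by_id = defaultdict(list)
    for row_id, edited in edits:
        dates_by_id[row_id].append(edited)
    for dates in dates_by_id.values():
        dates.sort()

    result = []
    for row_id, viewed in pageviews:
        dates = dates_by_id.get(row_id, [])
        # Number of edit dates that are <= the pageview date.
        pos = bisect_right(dates, viewed)
        result.append((row_id, viewed, dates[pos - 1] if pos else None))
    return result
```

A pageview with no earlier or same-day edit for its id maps to None in this sketch; whether such rows should be kept or dropped depends on requirements that the truncated excerpt does not show.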
3
votes
2 answers

AWS Athena MSCK REPAIR TABLE "table_name" Error adding new partitions

When trying to refresh the partitions in an AWS Athena/Glue table I am getting this error: line 1:1: mismatched input 'MSCK'. Expecting: 'ALTER', 'ANALYZE', 'CALL', 'COMMIT', 'CREATE', 'DEALLOCATE', 'DELETE', 'DESC', 'DESCRIBE', 'DROP', 'EXECUTE',…
3
votes
2 answers

How do I write a partitioned Parquet file from a BigQuery table?

I have created a Parquet file from a BigQuery table like this: EXPORT DATA OPTIONS( uri='gs://path_for_parquet_file/*.parquet', format='PARQUET', overwrite=false ) AS SELECT * FROM…
Eze M
3
votes
1 answer

For a multi-key partition in Amazon Athena, does order matter?

When setting up Amazon Athena partitions to be used with partition projection and a Glue catalog, does the order of partitions within the S3 bucket matter? Example partition strategies: Partition by year/month/day.…
user1333371
3
votes
2 answers

Vertical partitioning of tables in MySQL

Another question. Is it better to vertically partition a wide table (in my case I am thinking about splitting login details from the user's address, personal and other details) at the design stage, or is it better to leave it as is and partition it after having some…
RandomWhiteTrash
3
votes
2 answers

How do you partition tables in BigQuery using DBT

I am new to DBT and have previously been using Airflow for data transformations. In Airflow there is a variable called {{ ds }} which represents the logical date in this form YYYY-MM-DD and {{ ds_nodash }} which represents the logical date in this…
locket
3
votes
1 answer

How do you draw a partition plane from a classification algorithm in a 3D plot in R

I'm trying to draw a partition border from a classification algorithm in a 3D plot in R (using plot3D). It's a relatively simple task if we only have two predictors, requiring only two axes to draw (e.g. using the partimat function). I haven't yet…
Cai Ladd
3
votes
1 answer

Repartition large parquet dataset by ranges of values

I have a large .parquet dataset split into ~256k chunks (20GB). Lately I've repacked it into 514 chunks (28GB) to reduce the number of files. What I really need is to load data based on a field which contains int32 values in the range from 0 to…
Winand
3
votes
1 answer

How to partition a list based on (sublist) indices of another list in Python

I have two lists, one containing some unique elements (integers in my case) and the other containing indices that indicate into which sublist of a newly created nested list the elements should be inserted. elements = [1, 2, 3, 4, 5, 6] indices = …
eltings
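
A minimal Python sketch of the grouping this excerpt describes. The contents of the question's indices list are cut off, so the sketch assumes that indices[k] names the target sublist for elements[k]; the function name and the example index values in the usage note are hypothetical.

```python
def partition_by_indices(elements, indices):
    """Place elements[k] into sublist number indices[k] of a new nested list."""
    if len(elements) != len(indices):
        raise ValueError("elements and indices must have the same length")
    if not indices:
        return []
    # One empty sublist per target index, from 0 up to the largest index used.
    groups = [[] for _ in range(max(indices) + 1)]
    for element, index in zip(elements, indices):
        groups[index].append(element)
    return groups
```

With elements = [1, 2, 3, 4, 5, 6] and the made-up indices [0, 0, 1, 2, 2, 1], this returns [[1, 2], [3, 6], [4, 5]]. Elements keep their original relative order within each sublist, and an index that never occurs leaves its sublist empty.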
3
votes
1 answer

Spliterator generated by Iterables.partition() doesn't behave as expected?

I've noticed that the spliterator produced by using Guava's Iterables.partition(collection, partitionSize).spliterator() behaves strangely. Executing trySplit() on the resultant spliterator doesn't split, but executing trySplit() on the result of the…
Mykolas T
3
votes
1 answer

Common practice for stages

Snowflake allows putting files of different structures in a single stage using different paths. On the other hand, we can put files of the same structure in separate stages. Is a stage a store for several tables of a schema, or is a stage a means to store…
oogolov
3
votes
1 answer

Reading partition columns without partition column names

We have data stored in s3 partitioned in the following structure: bucket/directory/table/aaaa/bb/cc/dd/ where aaaa is the year, bb is the month, cc is the day and dd is the hour. As you can see, there are no partition keys in the path (year=aaaa,…
KOB
3
votes
1 answer

Spark: how can I see the data in each partition of an RDD

I want to test the behavior of repartition() and coalesce() myself, especially in the less common situation where numPartitions stays unchanged. I want to see whether a call to repartition with the same partition number will still do a full…
Boyu Zhang
3
votes
1 answer

Pyspark: For each month, make a cumulative sum of the previous 3 months

I'm using PySpark and I'm trying to make a cumulative sum of the last 3 months from a specific month. Example:

Month   Value
Jan/19  1
Feb/19  0
Mar/19  4
Apr/19  5
May/19  0
Jun/19  10

So the cumulative sum for each month on the…
thalesthales
3
votes
3 answers

What is the difference between std::stable_partition() and std::partition()?

stable_partition(vect.begin(), vect.end(), [](int x) { return x % 2 == 0; }); partition(vect.begin(), vect.end(), [](int x) { return x % 2 == 0; }); The code above is meant to illustrate the difference between the two.
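
The distinction can be illustrated with a small Python sketch rather than the C++ standard library itself. Both functions below are hypothetical stand-ins: the first preserves relative order within each group, which is the guarantee std::stable_partition gives, while the second uses a two-pointer swap scheme, one of many orderings an unstable partition is allowed to produce (std::partition leaves the order within each group unspecified).

```python
def stable_partition(items, pred):
    """Return a new list: matching items first, original order kept in both groups."""
    return [x for x in items if pred(x)] + [x for x in items if not pred(x)]


def unstable_partition(items, pred):
    """Partition a copy of items by swapping from both ends.

    Returns (partitioned_list, split_point). Matching items come first,
    but their relative order is not guaranteed to be preserved.
    """
    items = list(items)
    lo, hi = 0, len(items) - 1
    while True:
        # Skip items that are already on the correct side.
        while lo <= hi and pred(items[lo]):
            lo += 1
        while lo <= hi and not pred(items[hi]):
            hi -= 1
        if lo >= hi:
            break
        # items[lo] fails pred and items[hi] passes it: swap them.
        items[lo], items[hi] = items[hi], items[lo]
    return items, lo


def is_even(x):
    return x % 2 == 0
```

For the input [1, 2, 3, 4, 5, 6] with is_even, stable_partition gives [2, 4, 6, 1, 3, 5], while this particular unstable_partition gives [6, 2, 4, 3, 5, 1] with split point 3. Both put the even numbers first, but only the stable version keeps 2, 4, 6 and 1, 3, 5 in their original order.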