Data partitioning is the division of a collection of data into smaller collections for the purpose of faster processing, easier statistics gathering, and a smaller memory/persistence footprint.
Questions tagged [data-partitioning]
337 questions
4
votes
2 answers
Remove matching/non-matching elements of a nested array using jq
I need to split the results of a SonarQube analysis history into individual files. Assuming the starting input below,
{
  "paging": {
    "pageIndex": 1,
    "pageSize": 100,
    "total": 3
  },
  "measures": [
    {
      "metric": "coverage",
      …

ramfree17
- 79
- 2
- 8
4
votes
6 answers
Slice a PowerShell array into groups of smaller arrays
I would like to convert a single array into a group of smaller arrays, based on a variable. So, 0,1,2,3,4,5,6,7,8,9 would become (0,1,2), (3,4,5), (6,7,8), (9) when the size is 3.
My current…

craig
- 25,664
- 27
- 119
- 205
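
The grouping asked about in the question above can be sketched in a few lines. This is a minimal illustration in Python rather than PowerShell, and the helper name `chunk` is hypothetical (it does not come from the question):

```python
def chunk(items, size):
    """Split items into consecutive groups of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    # Step through the list in strides of `size`; the last group may be shorter.
    return [items[i:i + size] for i in range(0, len(items), size)]


groups = chunk(list(range(10)), 3)
print(groups)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```

The final group simply holds whatever is left over, which matches the size-3 example in the excerpt.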
4
votes
1 answer
SQL Partition By alternating groups of rows
I have a data table that kind of looks like this.
|Key|LotId|TransactionType|Quantity|Destination
|1 |A |Transform |NULL |Foo
|2 |A |Transform |NULL |Bar
|3 |A |Consume |100 |NULL
|4 |B |Transform …

NA Slacker
- 843
- 6
- 12
- 24
4
votes
1 answer
How to break a large CSV file, process it on multiple cores and combine the results into one using Node.js
I have a very big CSV file (370 GB). I have enough RAM (64 GB) and am running Windows 10.
I think the following is the best way to process the data on my system, but I'm not sure how to achieve it.
I want to break it into 4 different CSV files (because I…

Akhilesh Kumar
- 9,085
- 13
- 57
- 95
4
votes
1 answer
How to get COUNT(*) from one partition of a table in SQL Server 2012?
My table has 7 million records, and I split the table into 14 parts according to ID; each partition includes 5 million records and the size of each partition is 40G. I want to run a query to get the count in one partition, but it scans all partitions and the time of the query…

Masoomian
- 740
- 1
- 10
- 25
4
votes
2 answers
In caret: creating multiple partitions of different sizes for testing/training/validation
I'm trying to take a dataset and partition it into 3 pieces: training: 60%, testing: 20%, and validation: 20%.
part1 <- createDataPartition(fullDataSet$classe, p=0.8, list=FALSE)
validation <- fullDataSet[-part1,]
workingSet <-…

LRG
- 41
- 1
- 2
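
The 60/20/20 split described in the question above can be sketched as follows. This is a plain Python illustration, not caret: the function name `three_way_split` and its parameters are hypothetical, and unlike `createDataPartition` it shuffles uniformly and does not stratify by class.

```python
import random


def three_way_split(rows, train=0.6, test=0.2, seed=0):
    """Shuffle rows and split them into (training, testing, validation) lists.

    The validation set receives whatever remains after training and testing.
    This is a simple random split with no stratification (an assumption).
    """
    if train <= 0 or test <= 0 or train + test >= 1:
        raise ValueError("train and test must be positive and sum to less than 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * train)
    n_test = round(len(shuffled) * test)
    training = shuffled[:n_train]
    testing = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return training, testing, validation


training, testing, validation = three_way_split(range(100))
print(len(training), len(testing), len(validation))  # 60 20 20
```

Splitting once on a shuffled copy avoids the pitfall of applying a second percentage to an already reduced working set, where 20% of the remainder is no longer 20% of the full data.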
4
votes
2 answers
From Range Partition to Range-Interval
I would like to move from Range Partition to Range-Interval, but my current table has a partition on MAXVALUE and the column used for partitioning allows null values :(
E.g.: Say we have:
create table a (b number)
partition by range (b) (
…

Mario Corchero
- 5,257
- 5
- 33
- 59
4
votes
2 answers
find all disjoint (non-overlapping) sets from a set of sets
My problem: I need to find all disjoint (non-overlapping) sets from a set of sets.
Background: I am using comparative phylogenetic methods to study trait evolution in birds. I have a tree with ~300 species. This tree can be divided into subclades…

user1322491
- 41
- 1
- 4
4
votes
1 answer
Generating unique sorted partitions in Ruby
I'm trying to generate the set of sequences as shown below, not in any particular order, but here it's shown as a descending sequence. Note that each sequence also descends, as I'm interested in combinations, not permutations. I'd like to store each…

user1212
- 41
- 3
4
votes
5 answers
Jon Bentley's beautiful quicksort - how does it even work?
I thought I had a good understanding of how quicksort works, until I watched the vid on http://code.google.com/edu/algorithms/index.html where Jon Bentley introduced his "beautiful quicksort code", which is as follows:
void quicksort(int l, int u){
…

Kira
- 549
- 8
- 24
3
votes
2 answers
How many different partitions with exactly n parts can be made of a set with k-elements?
How many different partitions with exactly two parts can be made of the set {1,2,3,4}?
There are 4 elements in this list that need to be partitioned into 2 parts. I wrote these out and got a total of 7 different…

Jared
- 391
- 2
- 6
- 14
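
The count in the question above is a Stirling number of the second kind, S(k, n), which satisfies the recurrence S(k, n) = n·S(k-1, n) + S(k-1, n-1). A minimal sketch of that recurrence follows; the function name `stirling2` is chosen here for illustration:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(k, n):
    """Number of ways to partition a k-element set into exactly n non-empty parts."""
    if k == n:
        return 1  # each element in its own part (also covers k == n == 0)
    if n == 0 or n > k:
        return 0  # no parts for a non-empty set, or more parts than elements
    # The k-th element either joins one of the n existing parts,
    # or forms a new part on its own.
    return n * stirling2(k - 1, n) + stirling2(k - 1, n - 1)


print(stirling2(4, 2))  # 7
```

For the set {1,2,3,4} and two parts this gives 7, which agrees with the hand count mentioned in the excerpt.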
3
votes
3 answers
In-place partition when the array may or may not contain the pivot element
Is there an in-place partitioning algorithm (of the kind used in a Quicksort implementation) that does not rely on the pivot element being present in the array?
In other words, the array elements must be arranged in this order:
Elements less than…

finnw
- 47,861
- 24
- 143
- 221
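
One technique that does not need the pivot to be present is a three-way (Dutch national flag) partition, which compares against a pivot value rather than a pivot position. The sketch below assumes the desired order is "less than, equal to, greater than" the pivot (the excerpt is truncated, so this ordering is an assumption), and the name `partition3` is hypothetical:

```python
def partition3(a, pivot):
    """Rearrange list a in place into: < pivot, == pivot, > pivot.

    Returns (lo, hi) where a[:lo] < pivot, a[lo:hi] == pivot, a[hi:] > pivot.
    The pivot value does not have to occur in a; then lo == hi.
    """
    lo, i, hi = 0, 0, len(a)
    while i < hi:
        if a[i] < pivot:
            a[lo], a[i] = a[i], a[lo]  # grow the "less than" region
            lo += 1
            i += 1
        elif a[i] > pivot:
            hi -= 1
            a[i], a[hi] = a[hi], a[i]  # grow the "greater than" region
            # do not advance i: the swapped-in element is still unexamined
        else:
            i += 1  # equal to the pivot, leave it in the middle region
    return lo, hi


data = [5, 1, 9, 3, 7, 3]
lo, hi = partition3(data, 4)  # 4 is not in the list
print(lo, hi)  # 3 3
```

When the pivot is absent, the middle region is empty and `lo` marks the split point between the smaller and larger elements.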
3
votes
1 answer
Estimating How Long It Takes To Partition A Large Table
I'm trying to figure out how long it will take to partition a large table. I'm about 2 weeks into partitioning this table and don't have a good feeling for how much longer it will take. Is there any way to calculate how long this query might…

rootScott
- 33
- 2
3
votes
3 answers
fair partitioning of set S into k partitions
There is a set S containing N integers, each with value 1<=X<=10^6. The problem is to partition the set S into k partitions. The value of a partition is the sum of the elements present in it. The partitioning is to be done in such a way that the total value of…

Akhil
- 2,269
- 6
- 32
- 39
3
votes
1 answer
How to detect duplicates in a large JSON file using PySpark HashPartitioner
I have a large JSON file with over 20GB of JSON-structured metadata. It contains simple user metadata across some application, and I would like to sift through it to detect duplicates. Here is an example of what the data looks like:
{"created":…

John Lexus
- 3,576
- 3
- 15
- 33
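
The idea behind hash partitioning for duplicate detection in the question above is that records with equal keys always land in the same partition, so each partition can be checked on its own. The sketch below is plain Python, not PySpark, and is only meant to show the principle; the function names and the `"id"` field are hypothetical, since the record layout is not shown in the excerpt.

```python
def hash_partition(records, key, num_partitions):
    """Distribute records into num_partitions buckets by the hash of their key."""
    partitions = [[] for _ in range(num_partitions)]
    for rec in records:
        # Equal keys have equal hashes, so they end up in the same bucket.
        partitions[hash(key(rec)) % num_partitions].append(rec)
    return partitions


def find_duplicates(records, key, num_partitions=4):
    """Return every record whose key was already seen earlier in its partition."""
    duplicates = []
    for part in hash_partition(records, key, num_partitions):
        seen = set()  # only needs to hold the keys of one partition
        for rec in part:
            k = key(rec)
            if k in seen:
                duplicates.append(rec)
            else:
                seen.add(k)
    return duplicates


# Hypothetical records; "id" is an assumed field name.
users = [{"id": "a"}, {"id": "b"}, {"id": "a"}, {"id": "c"}, {"id": "b"}]
dupes = find_duplicates(users, key=lambda r: r["id"])
print(sorted(r["id"] for r in dupes))  # ['a', 'b']
```

Because each partition is independent, the per-partition loop is the part a distributed engine could run in parallel.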