Questions tagged [data-partitioning]

Data partitioning is the division of a collection of data into smaller collections for faster processing, easier statistics gathering, and a smaller memory/persistence footprint.

337 questions
4
votes
2 answers

Remove matching/non-matching elements of a nested array using jq

I need to split the results of a sonarqube analysis history into individual files. Assuming a starting input below, { "paging": { "pageIndex": 1, "pageSize": 100, "total": 3 }, "measures": [ { "metric": "coverage", …
ramfree17
  • 79
  • 2
  • 8
4
votes
6 answers

Slice a PowerShell array into groups of smaller arrays

I would like to convert a single array into a group of smaller arrays, based on a variable. So, 0,1,2,3,4,5,6,7,8,9 would become (0,1,2), (3,4,5), (6,7,8), (9) when the size is 3. My current…
craig
  • 25,664
  • 27
  • 119
  • 205
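
The grouping this question describes (consecutive groups of a fixed size, 3 in its example) can be sketched as follows. This is a minimal illustration in Python rather than PowerShell, and the function name `chunk` is hypothetical, not taken from the question or its answers.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk(items: Sequence[T], size: int) -> List[List[T]]:
    """Split items into consecutive groups of at most `size` elements.

    `chunk` is a hypothetical helper name used only for this sketch.
    The last group is shorter when len(items) is not a multiple of size.
    """
    if size <= 0:
        raise ValueError("size must be a positive integer")
    # Step through the sequence in strides of `size` and slice each group.
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


groups = chunk(list(range(10)), 3)
print(groups)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```

Slicing past the end of a Python sequence is safe, so the final short group needs no special case.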
4
votes
1 answer

SQL Partition By alternating groups of rows

I have a data table that kind of looks like this. |Key|LotId|TransactionType|Quantity|Destination |1 |A |Transform |NULL |Foo |2 |A |Transform |NULL |Bar |3 |A |Consume |100 |NULL |4 |B |Transform …
NA Slacker
  • 843
  • 6
  • 12
  • 24
4
votes
1 answer

How to break a large CSV file, process it on multiple cores, and combine the results into one using Node.js

I have a very big CSV file (370 GB). I have enough RAM (64 GB), running on Windows 10. I think the following is the best way to process the data on my system, but I'm not sure how to achieve it. I want to break it into 4 different CSV files (because I…
Akhilesh Kumar
  • 9,085
  • 13
  • 57
  • 95
4
votes
1 answer

How to get COUNT(*) from one partition of a table in SQL Server 2012?

My table has 7 million records, and I split the table into 14 parts according to ID; each partition includes 5 million records and the size of each partition is 40G. I want to run a query to get the count in one partition, but it scans all partitions and the time of the query…
4
votes
2 answers

in caret: creating multiple different size partitions for testing/training/validation

I'm trying to take a dataset and partition it into 3 pieces: training: 60%, testing: 20%, and validation: 20%. part1 <- createDataPartition(fullDataSet$classe, p=0.8, list=FALSE) validation <- fullDataSet[-part1,] workingSet <-…
LRG
  • 41
  • 1
  • 2
4
votes
2 answers

From Range Partition to Range-Interval

I would like to move from Range Partition to Range-Interval, but my current table has a partition on MAXVALUE and the column used for partitioning allows null values :( E.g.: Say we have: create table a (b number) partition by range (b) ( …
Mario Corchero
  • 5,257
  • 5
  • 33
  • 59
4
votes
2 answers

find all disjoint (non-overlapping) sets from a set of sets

My problem: need to find all disjoint (non-overlapping) sets from a set of sets. Background: I am using comparative phylogenetic methods to study trait evolution in birds. I have a tree with ~300 species. This tree can be divided into subclades…
user1322491
  • 41
  • 1
  • 4
4
votes
1 answer

Generating unique sorted partitions in Ruby

I'm trying to generate the set of sequences as shown below, not in any particular order, but here it's shown as a descending sequence. Note that each sequence also descends as I'm interested in combinations, not permutations. I'd like to store each…
user1212
  • 41
  • 3
4
votes
5 answers

Jon Bentley's beautiful quicksort - how does it even work?

I thought I had a good understanding of how quicksort works, until I watched the vid on http://code.google.com/edu/algorithms/index.html where Jon Bentley introduced his "beautiful quicksort code", which is as follows: void quicksort(int l, int u){ …
Kira
  • 549
  • 8
  • 24
3
votes
2 answers

How many different partitions with exactly n parts can be made of a set with k-elements?

How many different partitions with exactly two parts can be made of the set {1,2,3,4}? There are 4 elements in this list that need to be partitioned into 2 parts. I wrote these out and got a total of 7 different…
Jared
  • 391
  • 2
  • 6
  • 14
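
The count asked about here is given by the Stirling numbers of the second kind, which satisfy the recurrence S(n, k) = k·S(n-1, k) + S(n-1, k-1). A minimal sketch follows; the function and parameter names are hypothetical, chosen to avoid the n/k ambiguity in the title.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_set_partitions(elements: int, parts: int) -> int:
    """Number of ways to split a set of `elements` items into exactly
    `parts` non-empty, unordered parts (Stirling number of the second kind).

    The name `count_set_partitions` is hypothetical, used only for this sketch.
    """
    if elements == 0 and parts == 0:
        return 1  # the empty set has exactly one partition into zero parts
    if elements == 0 or parts == 0:
        return 0
    # The last element either joins one of the `parts` existing parts,
    # or forms a new part on its own.
    return (parts * count_set_partitions(elements - 1, parts)
            + count_set_partitions(elements - 1, parts - 1))


print(count_set_partitions(4, 2))  # 7
```

For the set {1,2,3,4} and two parts this gives 7, which agrees with the total the asker reached by writing the partitions out.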
3
votes
3 answers

In-place partition when the array may or may not contain the pivot element

Is there an in-place partitioning algorithm (of the kind used in a Quicksort implementation) that does not rely on the pivot element being present in the array? In other words, the array elements must be arranged in this order: Elements less than…
finnw
  • 47,861
  • 24
  • 143
  • 221
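
One way to partition in place around a value that need not occur in the array is a three-way ("Dutch national flag") pass. The sketch below assumes the required order is less-than, equal-to, greater-than the pivot value; the excerpt is truncated, so that ordering is an assumption, and the function name is hypothetical.

```python
from typing import List, Tuple


def partition_by_value(a: List[int], pivot: int) -> Tuple[int, int]:
    """Rearrange `a` in place around `pivot`, which may be absent from `a`.

    Returns (lt_end, eq_end) such that:
      a[:lt_end]        holds elements <  pivot
      a[lt_end:eq_end]  holds elements == pivot (empty if pivot is absent)
      a[eq_end:]        holds elements >  pivot
    Hypothetical helper; the three-region ordering is an assumption.
    """
    lo, mid, hi = 0, 0, len(a) - 1
    while mid <= hi:
        if a[mid] < pivot:
            # Move the small element to the front region.
            a[lo], a[mid] = a[mid], a[lo]
            lo += 1
            mid += 1
        elif a[mid] > pivot:
            # Move the large element to the back; re-examine the swapped-in one.
            a[mid], a[hi] = a[hi], a[mid]
            hi -= 1
        else:
            mid += 1
    return lo, mid


data = [5, 1, 9, 3, 7, 3]
lt_end, eq_end = partition_by_value(data, 4)  # 4 is not in the array
print(data[:lt_end], data[lt_end:eq_end], data[eq_end:])
```

Because the pivot is passed as a value rather than an index, nothing in the loop depends on it being present; when it is absent the middle region is simply empty.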
3
votes
1 answer

Estimating How Long It Takes To Partition A Large Table

I'm trying to figure out how long it will take to partition a large table. I'm about 2 weeks into partitioning this table and don't have a good feeling for how much longer it will take. Is there any way to calculate how long this query might…
rootScott
  • 33
  • 2
3
votes
3 answers

fair partitioning of set S into k partitions

There is a set S containing N integers, each with a value 1<=X<=10^6. The problem is to partition the set S into k partitions. The value of a partition is the sum of the elements present in it. The partitioning is to be done in such a way that the total value of…
Akhil
  • 2,269
  • 6
  • 32
  • 39
3
votes
1 answer

How to detect duplicates in large json file using PySpark HashPartitioner

I have a large JSON file with over 20GB of JSON-structured metadata. It contains simple user metadata across some application, and I would like to sift through it to detect duplicates. Here is an example of what the data looks like: {"created":…
John Lexus
  • 3,576
  • 3
  • 15
  • 33