Questions tagged [data-partitioning]

Data partitioning is the division of a collection of data into smaller collections for faster processing, easier statistics gathering, and a smaller memory/persistence footprint.

337 questions
3
votes
3 answers

Is there a way to use jq to split a JSON file by its common keys?

I have a set of pricing data for a lot of stocks (around 1.1 million lines). I'm having trouble parsing all of this data in memory so I'd like to split it by stock symbol into individual files and only import the data as it is…
Matt
3
votes
2 answers

How to create a dynamic number of partitions using key-based table partitioning in MySQL?

I'm trying to create a partitioned table in MySQL, but I don't want to specify the number of partitions. For example, in the given table I will have over 100k records for each region. I don't know the regions in advance; they will come later. So the number…
RamiReddy P
3
votes
2 answers

U-SQL Split a CSV file to multiple files based on Distinct values in file

I have data in Azure Data Lake Store and I am processing it with an Azure Data Lake Analytics job using U-SQL. I have several CSV files which contain spatial data, similar to this: File_20170301.csv longtitude| lattitude | date …
FeodorG
3
votes
1 answer

MySQL - Move data between partitions aka re-partition

I have a MySQL table whose partitions look as below: p2015h1 - Contains data where date < 2015-07-01 (has data from 2016-06-01, hence only a month's worth of data) p2015h2 - Contains data where date < 2016-01-01 p2016h1 - Contains data where date <…
usert4jju7
3
votes
4 answers

Nicest, efficient way to get result tuple of sequence items fulfilling and not fulfilling condition

(This is professional best-practice/pattern interest, not a homework request.) INPUT: any unordered sequence or generator of items; the function myfilter(item) returns True if the filter condition is fulfilled. OUTPUT: (filter_true, filter_false) tuple of…
Tony Veijalainen
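One common way to meet the INPUT/OUTPUT contract in this question is a single pass that appends each item to one of two lists. The sketch below is only an illustration: `partition` is a hypothetical helper name, `myfilter` is the predicate named in the question, and returning lists (rather than lazy iterators) is an assumption.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def partition(items: Iterable[T], myfilter: Callable[[T], bool]) -> Tuple[List[T], List[T]]:
    """Split items into (filter_true, filter_false) in a single pass.

    Works for any iterable, including generators, because each item is
    consumed exactly once and the predicate is called once per item.
    """
    filter_true: List[T] = []
    filter_false: List[T] = []
    for item in items:
        # Pick the destination list based on the predicate, then append.
        (filter_true if myfilter(item) else filter_false).append(item)
    return filter_true, filter_false
```

A call such as `partition(range(6), lambda x: x % 2 == 0)` separates the even and odd numbers while keeping their original relative order.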
3
votes
1 answer

Oracle Partition by ID and subpartition by DATE with interval

The schema I'm working on has a small number of customers, with lots of data per customer. In determining a partitioning strategy, my first thought was to partition by customer_id and then subpartition by range with a day interval. However you…
rcurrie
3
votes
1 answer

MySQL Partitioning: Performance increase For multiple partitioned tables. Why?

I have implemented a benchmark which tests the performance of reads and writes on 10 different tables. I have 10 Java threads, each of which performs queries on its own table only: Thread 1 performs operations on Table1, Thread 2 performs operations on Table2,…
Michael
3
votes
2 answers

Parallel Partition Algorithm in C#: How to Maximize Parallelism

I've written a parallel algorithm in C# to partition an array into two lists: one that contains the elements which satisfy a given predicate, and another that contains the elements which fail to satisfy it. It is an order preserving…
cdiggins
2
votes
1 answer

Find all possible pairs between the subsets of N sets with Erlang

I have a set S. It contains N subsets (which in turn contain some sub-subsets of various lengths): 1. [[a,b],[c,d],[*]] 2. [[c],[d],[e,f],[*]] 3. [[d,e],[f],[f,*]] N. ... I also have a list L of 'unique' elements that are contained in the set S: a,…
skanatek
2
votes
1 answer

Seeking a solution or a heuristic approximation for the 3-partition combinatorial situation

How do I distribute 48 items each with its own dollar value to each of 3 inheritors so that the value given to each is equal or nearly equal? This is a form of partitioning problem which is NP-complete (or some such) and therefore impossible to…
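A simple heuristic for this kind of problem is greedy "largest first" assignment: sort the items by value in descending order and always give the next item to whoever currently holds the least. The sketch below is one possible approximation, not an exact solver, and it does not guarantee the most even split. The name `greedy_split` and the `heirs` parameter are hypothetical.

```python
import heapq
from typing import List, Sequence


def greedy_split(values: Sequence[float], heirs: int = 3) -> List[List[float]]:
    """Approximate an even split with the largest-first greedy heuristic."""
    # Min-heap of (running total, heir index); the smallest total is popped first.
    heap = [(0.0, i) for i in range(heirs)]
    shares: List[List[float]] = [[] for _ in range(heirs)]
    for value in sorted(values, reverse=True):
        total, i = heapq.heappop(heap)
        shares[i].append(value)
        heapq.heappush(heap, (total + value, i))
    return shares
```

Comparing `sum(share)` across the returned lists shows how close the heuristic came; if the gap is too large, the result can serve as a starting point for a local search that swaps items between shares.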
2
votes
3 answers

Integer partitioning in Scala

Given n (say 3 people) and s (say $100), we'd like to partition s among n people, so we need all possible n-tuples that sum to s. My Scala code is below: def weights(n:Int,s:Int):List[List[Int]] = { List.concat( (0 to…
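The enumeration described here can be sketched recursively: choose the first person's share, then distribute the remainder among the remaining people. The version below is written in Python rather than the asker's Scala and is not a reconstruction of the truncated code above. It assumes shares are non-negative integers and that order matters, so (0, 2) and (2, 0) count as different tuples.

```python
from typing import Iterator, Tuple


def weights(n: int, s: int) -> Iterator[Tuple[int, ...]]:
    """Yield every n-tuple of non-negative integers that sums to s."""
    if n == 0:
        # With nobody left, only a zero remainder is a valid distribution.
        if s == 0:
            yield ()
        return
    for first in range(s + 1):
        # Give `first` to the current person, split the rest recursively.
        for rest in weights(n - 1, s - first):
            yield (first,) + rest
```

Because the function is a generator, the tuples are produced lazily. That matters here: for n = 3 and s = 100 there are several thousand tuples, and the count grows quickly with n.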
2
votes
3 answers

Number of ways to partition a number in Python

I have defined a recursive function that takes a number, n, and returns a list of lists of the numbers that sum to that number (partitions): def P(n): # base case of recursion: zero is the sum of the empty list if n == 0: yield [] …
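For comparison, one standard way to enumerate partitions without duplicates is to cap each part by the previous one, so every partition is generated once in non-increasing order. This is a sketch under that assumption and not a completion of the truncated `P(n)` above; `partitions`, `max_part`, and `count_partitions` are hypothetical names.

```python
from typing import Iterator, List, Optional


def partitions(n: int, max_part: Optional[int] = None) -> Iterator[List[int]]:
    """Yield each partition of n as a non-increasing list of positive integers."""
    if max_part is None:
        max_part = n
    if n == 0:
        # Base case: zero is the sum of the empty list.
        yield []
        return
    for first in range(min(n, max_part), 0, -1):
        # Later parts may not exceed `first`, which prevents reordered duplicates.
        for rest in partitions(n - first, first):
            yield [first] + rest


def count_partitions(n: int) -> int:
    """Count partitions by exhausting the generator (fine for small n)."""
    return sum(1 for _ in partitions(n))
```

Counting by enumeration is only practical for small n. If only the count is needed, a dynamic-programming recurrence over (n, max_part) avoids materializing the lists.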
2
votes
0 answers

How to make Spring Boot JPA support (Partition) in its Query

I am working on a transaction table in MySQL, and according to some requirements I have to ALTER the table (Transaction) and partition it year-wise, with sub-partitions month-wise. This worked successfully in MySQL Workbench. The query I…
2
votes
2 answers

How do I split a convex polygon into two areas of a given proportion?

Given a convex polygon P and a point A on P's boundary, how do I compute a point B also on P's boundary such that AB splits P into two areas of a given proportion? Ideally I'd like an analytical solution. As a last resort I can draw a line anywhere…
2
votes
1 answer

What happens when the number of shuffle partitions is greater than 200 (spark.sql.shuffle.partitions is 200 by default) in a DataFrame?

A Spark SQL aggregation operation shuffles data, i.e. it uses spark.sql.shuffle.partitions (200 by default). What happens to performance when the number of shuffle partitions is greater than 200? Spark uses a different data structure for shuffle book-keeping when the…
Santhosh reddy