Data partitioning is the division of a collection of data into smaller collections, for the purposes of faster processing, easier statistics gathering, and a smaller memory or persistence footprint.
Questions tagged [data-partitioning]
337 questions
2 votes · 2 answers
C++ Partition a vector of vectors using…
Suppose you have a 2D vector defined as follows:
std::vector<std::vector<int>> v;
and which represents a matrix:
1 1 0 1 3
0 4 6 0 1
5 0 0 3 0
6 3 0 2 5
I want to stable-partition (say with predicate el != 0) this matrix, but in all directions. This…
— asked by Desperados
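The question above asks for a stable partition applied along every direction of a matrix. A minimal sketch of the idea (in Python rather than C++; the helper names are mine, not the asker's):

```python
def stable_partition(seq, pred):
    # Stable partition: elements satisfying pred come first,
    # and relative order is preserved within each group.
    return [x for x in seq if pred(x)] + [x for x in seq if not pred(x)]

def partition_rows(matrix, pred):
    # Partition each row independently (the left-to-right direction).
    return [stable_partition(row, pred) for row in matrix]

def partition_cols(matrix, pred):
    # Transpose, partition each column, transpose back (top-to-bottom).
    cols = [stable_partition(col, pred) for col in zip(*matrix)]
    return [list(row) for row in zip(*cols)]

m = [[1, 1, 0, 1, 3],
     [0, 4, 6, 0, 1],
     [5, 0, 0, 3, 0],
     [6, 3, 0, 2, 5]]
nonzero = lambda x: x != 0
print(partition_rows(m, nonzero)[0])  # [1, 1, 1, 3, 0]
```

The C++ equivalent would apply `std::stable_partition` to each row, and to column views (or a transposed copy) for the vertical direction.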
2 votes · 3 answers
How to create an average per partition containing a maximum of 5 time-dependent members?
My goal is to select an average of exactly 5 records, but only if they meet the left-join criteria against another table.
Let's say we have table one (left) with records:
RECNUM | ID  | DATE       | JOB
1      | cat | 2019.01.01 | meow
2      | dog | …
— asked by wounky
2 votes · 2 answers
How Kafka handles keyed messages in relation to partitions
Can anyone explain how Kafka actually stores keyed messages? Is a partition assigned to only a single key, or is it possible for one partition to store messages with multiple keys? And if it is, what happens when the number of keys is more…
— asked by panoet
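The short answer is that a partition usually holds messages for many keys: Kafka's default partitioner hashes the key (murmur2 in the Java client) modulo the partition count, so a key always maps to the same partition, but partitions are shared. A sketch of that mapping (md5 stands in for the real hash here, purely for illustration):

```python
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Hash the key and take it modulo the partition count. One key always
    # lands in the same partition; one partition typically holds many keys.
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# With more keys than partitions, several keys must share a partition:
shared = {partition_for(f"user-{i}".encode(), 3) for i in range(100)}
```

This is why per-key ordering is guaranteed (same key, same partition) even though partitions are not key-exclusive.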
2 votes · 1 answer
Determining the partitioning key in range-based partitioning of a MySQL table
I've been researching database partitioning in MySQL for a while. Since I have one ever-growing table in my DB, I thought of using partitioning as an effective tool to optimize it. I'm only interested in retaining recent data (say the last 6…
— asked by Haagenti
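For an ever-growing table where only recent data matters, the usual choice is to range-partition on the date column and expire data by dropping whole partitions. A sketch of that scheme (the monthly granularity and partition naming are my assumptions, not the asker's):

```python
from datetime import date

def month_partition(d: date) -> str:
    # Name of the monthly range partition a row falls into.
    return f"p{d.year}{d.month:02d}"

def partitions_to_drop(existing, keep_months):
    # Retention becomes a cheap metadata operation: drop whole partitions
    # older than the newest `keep_months`, instead of DELETEing rows.
    keep = set(sorted(existing)[-keep_months:])
    return [p for p in sorted(existing) if p not in keep]

parts = [f"p2019{m:02d}" for m in range(1, 8)]  # p201901 .. p201907
```

In MySQL itself this corresponds to `PARTITION BY RANGE` on the date column plus periodic `ALTER TABLE … DROP PARTITION`.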
2 votes · 1 answer
How to partition to spread values?
I have a table Customers with the columns Sequence, ID, and many other columns (not important).
Sample data:
Sequence  ID
------------
214906  2613
214906  2614
214906  2615
214907  2613
214907  2614
214907  2615
214908  2613
214908  2614
214908  2615
214000  2613
213004…
— asked by John
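One way to spread rows that share an ID across buckets is the window-function trick: number each row within its ID group, then bucket on that number modulo the bucket count. A sketch under that interpretation of the question:

```python
from collections import defaultdict

def spread(rows, key, num_buckets):
    # Number each row within its key group (like ROW_NUMBER() OVER
    # (PARTITION BY ID ORDER BY Sequence)) and assign it to bucket
    # row_number % num_buckets, so rows sharing an ID are spread out
    # as evenly as possible.
    counters, buckets = defaultdict(int), defaultdict(list)
    for row in rows:
        k = key(row)
        buckets[counters[k] % num_buckets].append(row)
        counters[k] += 1
    return dict(buckets)

rows = [(214906, 2613), (214907, 2613), (214908, 2613), (214906, 2614)]
buckets = spread(rows, key=lambda r: r[1], num_buckets=2)
```

In SQL the same effect comes from `NTILE(n)` or `ROW_NUMBER() ... % n` over a window partitioned by ID.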
2 votes · 1 answer
Spark: partition a dataset by column value
(I am new to Spark.) I need to store a large number of rows of data, and then handle updates to those rows. We have unique IDs (DB PKs) for those rows, and we would like to shard the data set by uniqueID % numShards, to make equal-sized, addressable…
— asked by radumanolescu
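The uniqueID % numShards scheme itself is simple to sketch; with roughly uniform IDs it produces equal-sized, directly addressable shards (in Spark this would typically become a derived shard column used for repartitioning):

```python
def shard_of(unique_id: int, num_shards: int) -> int:
    # Modulo sharding: each ID maps to exactly one shard, and a lookup
    # can compute its shard without any directory or index.
    return unique_id % num_shards

counts = {}
for uid in range(1000):  # stand-in for the DB primary keys
    s = shard_of(uid, 8)
    counts[s] = counts.get(s, 0) + 1
```

Sequential IDs divide perfectly; skewed or clustered IDs may need a hash before the modulo to keep shard sizes even.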
2 votes · 1 answer
(Spark) What is the best way to partition data on which multiple filters are applied?
I am working in Spark (on Azure Databricks) with a 15-billion-row file that looks like this:
+---------+---------------+----------------+-------------+--------+------+
|client_id|transaction_key|transaction_date| …
— asked by RobL
2 votes · 1 answer
Change two bytes in a GUID
I'm using a partitioned Cosmos DB, but I don't know the value of the partition key each time I want to get a resource by its id. Now, using the id as the partition key is not a solution for me, since it would take too long and take up too much space (I…
— asked by Carmen
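When the partition key must be recoverable from the id alone, one common workaround (a sketch of the general idea, not necessarily what the accepted answer proposes) is to derive the key deterministically from the id, e.g. a few bytes of its hash, giving a bounded number of logical partitions with no lookup table:

```python
import hashlib
import uuid

def synthetic_partition_key(item_id: str, buckets: int = 256) -> str:
    # Deterministic key derived from the id: a reader can recompute it at
    # lookup time, so point reads never need a cross-partition query.
    # The bucket count (256) is an arbitrary choice for illustration.
    h = hashlib.sha256(item_id.encode()).digest()
    return f"{int.from_bytes(h[:2], 'big') % buckets:03d}"

item_id = str(uuid.uuid4())
key = synthetic_partition_key(item_id)
```

This keeps the key short (unlike using the full id as the key) while still being computable from the id, which is exactly the property the asker is after.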
2 votes · 1 answer
Repartition a Dask DataFrame with a custom index
I have a huge Dask DataFrame similar to this:
| Ind | C1   | C2 | .... | Cn   |
|-----|------|----|------|------|
| 1   | val1 | AE | .... | time |
| 2   | val2 | FB | .... | time |
| ... | .... | .. | .... | …
— asked by pichlbaer
2 votes · 0 answers
Spark: repartition to one output file per customer
Assume I have a dataframe like:
client_id,report_date,date,value_1,value_2
1,2019-01-01,2019-01-01,1,2
1,2019-01-01,2019-01-02,3,4
1,2019-01-01,2019-01-03,5,6
2,2019-01-01,2019-01-01,1,2
2,2019-01-01,2019-01-02,3,4
2,2019-01-01,2019-01-03,5,6
My…
— asked by Georg Heiler
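In Spark, `df.write.partitionBy("client_id")` produces one output directory per client value (one file per directory only if the data is first repartitioned so each client sits in a single partition). The grouping itself can be sketched in plain Python, writing to in-memory buffers instead of files:

```python
import io
from itertools import groupby

# Rows as (client_id, report_date, date, value_1, value_2), as in the sample.
rows = [
    (1, "2019-01-01", "2019-01-01", 1, 2),
    (1, "2019-01-01", "2019-01-02", 3, 4),
    (2, "2019-01-01", "2019-01-01", 1, 2),
    (2, "2019-01-01", "2019-01-02", 3, 4),
]

# Sort by client, group, and emit one "file" (an in-memory buffer here)
# per client, mirroring the directory-per-value layout partitionBy creates.
files = {}
for client, group in groupby(sorted(rows), key=lambda r: r[0]):
    buf = io.StringIO()
    for r in group:
        buf.write(",".join(map(str, r)) + "\n")
    files[f"client_id={client}.csv"] = buf.getvalue()
```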
2 votes · 1 answer
Creating data partitions over a selected range of data to be fed into caret::train for cross-validation
I want to create jack-knife data partitions for the data frame below, with the partitions to be used in caret::train (like those caret::groupKFold() produces). However, the catch is that I want to restrict the test points to, say, greater than 16 days,…
— asked by André.B
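The leave-one-group-out structure underlying such partitions is easy to sketch (in Python rather than R; the index lists correspond to what caret expects in its `index`/`indexOut` arguments):

```python
def leave_one_group_out(groups):
    # One fold per distinct group, as caret::groupKFold gives when the
    # number of folds equals the number of groups: each fold trains on
    # all other groups and holds the chosen group out for testing. The
    # asker's extra constraint (test points beyond 16 days) would be an
    # additional filter applied to the held-out indices.
    folds = []
    for g in sorted(set(groups)):
        train = [i for i, x in enumerate(groups) if x != g]
        held_out = [i for i, x in enumerate(groups) if x == g]
        folds.append((train, held_out))
    return folds

folds = leave_one_group_out(["a", "a", "b", "c"])
```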
2 votes · 2 answers
How to partition an image into 64 blocks in MATLAB
I want to compute the Color Layout Descriptor (CLD) for each image. This algorithm includes four stages. In the first stage I must partition each image into 64 blocks (8×8), in order to compute a single representative color from each block. I try to…
— asked by zenab
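The stage-one tiling can be sketched language-independently (Python here rather than MATLAB, for a single greyscale channel; the CLD does this per color channel):

```python
def block_means(img, blocks=8):
    # Partition an image into a blocks x blocks grid of tiles and return
    # each tile's mean value -- the "single representative color" the
    # CLD's first stage requires. Assumes the image dimensions divide
    # evenly by the block count.
    h, w = len(img), len(img[0])
    bh, bw = h // blocks, w // blocks
    means = []
    for by in range(blocks):
        row = []
        for bx in range(blocks):
            tile = [img[y][x]
                    for y in range(by * bh, (by + 1) * bh)
                    for x in range(bx * bw, (bx + 1) * bw)]
            row.append(sum(tile) / len(tile))
        means.append(row)
    return means

img = [[x for x in range(16)] for _ in range(16)]  # 16x16 horizontal gradient
means = block_means(img)
```

In MATLAB the equivalent is typically done with `mat2cell` or `blockproc` using block sizes of rows/8 by cols/8.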
2 votes · 2 answers
How Apache Spark partitions the data of a big file
Let's say I have a cluster of 4 nodes, each having 1 core. I have a 600-petabyte file which I want to process through Spark. The file could be stored in HDFS.
I think the way to determine the number of partitions is file size / total number of cores in…
— asked by Anand
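The premise is worth correcting: for a file read from HDFS, Spark's initial partition count is driven by the input split/block size (128 MB by default), not by the core count; the 4 cores only bound how many partitions execute concurrently. A sketch of the arithmetic:

```python
import math

def initial_partitions(file_size_bytes, split_size_bytes=128 * 1024 * 1024):
    # One input partition per HDFS split: ceil(file size / split size).
    # Cores don't change this count; they only cap parallelism.
    return math.ceil(file_size_bytes / split_size_bytes)

one_tb = 1024 ** 4
```

At 600 PB and 128 MB splits, that is on the order of five billion partitions, which is why such a job is dominated by scheduling and I/O rather than by the 4 available cores.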
2 votes · 1 answer
Incorrect splitting of data using sample.split in R, and an issue with logistic regression
I have 2 issues. When I try to split my data into test and train sets using sample.split as below, the sampling is done rather unclearly. What I mean is that the data d has a length of 392, so a 4:1 division should give 0.8*392 = 313.6, i.e. 313…
— asked by Akshayanti
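The arithmetic itself explains part of the confusion: 0.8 × 392 = 313.6 is not an integer, so any splitter must round, and a train set of 313 or 314 rows is expected rather than a bug. A minimal sketch of such a split (Python rather than R; `round` is my choice of rounding, sample.split's exact rule may differ):

```python
import random

def split_indices(n, ratio=0.8, seed=42):
    # Shuffle the indices and cut at round(n * ratio); with n=392 and
    # ratio=0.8 the target 313.6 rounds to a 314/78 split.
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    cut = round(n * ratio)
    return sorted(idx[:cut]), sorted(idx[cut:])

train, test = split_indices(392)
```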
2 votes · 1 answer
Obtain a KeyedStream from custom partitioning in Flink
I know that Flink comes with custom partitioning APIs. However, the problem is that, after invoking partitionCustom on a DataStream, you get a DataStream back and not a KeyedStream. On the other hand, you cannot override the partitioning strategy for…
— asked by affo