Questions tagged [partitioner]

A partitioner is a software component that divides a data set, possibly a very large one, into a number of smaller groups, ideally of roughly equal size.

This is a performance technique that reduces the amount of time spent processing the entire data set, especially with algorithms whose cost grows steeply (for example, exponentially) with input size.
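As a rough illustration of the idea described above, here is a minimal sketch of hash-based partitioning in Python. The function names `hash_partition` and `partition_records` are hypothetical, and `zlib.crc32` is only a convenient stable hash chosen for this sketch; it is not the hash function used by any particular framework's partitioner.

```python
import zlib


def hash_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition index in the range [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # crc32 gives the same value on every run, unlike Python's built-in
    # hash() for strings, which is randomized per process by default.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_records(records, num_partitions):
    """Group (key, value) records into num_partitions buckets by key hash."""
    buckets = [[] for _ in range(num_partitions)]
    for key, value in records:
        buckets[hash_partition(key, num_partitions)].append((key, value))
    return buckets


# Hypothetical sample data: records with the same key share a bucket.
records = [("apple", 1), ("banana", 2), ("apple", 3), ("cherry", 4)]
buckets = partition_records(records, 3)
```

Every record with the same key lands in the same bucket, which is the property that grouping and reducing by key rely on. Bucket sizes are only roughly balanced, and only when the keys themselves are well spread; skewed keys usually call for a custom partitioner.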

59 questions
0
votes
0 answers

Spark: How can we remove partitioner from RDD?

I am grouping an RDD based on a key. rdd.groupBy(_.key).partitioner => org.apache.spark.HashPartitioner@a I see that, by default, Spark associates a HashPartitioner with this RDD, which is fine by me because I agree that we need some kind of…
shashwat
0
votes
0 answers

Custom Spark partitioner for an RDD of S3 paths

I have an RDD[(Long, String)] of S3 paths (bucket + key) with their sizes. I want to partition it in such a way that each partition gets paths whose sizes sum up approximately to the same value. That way when I read content for these paths, each…
aa8y
0
votes
3 answers

Partitioner is not working correctly

I am trying to code a MapReduce scenario in which I have created some user clickstream data in the form of JSON. After that, I wrote a Mapper class to fetch the required data from the file. My mapper code is: private final static String URL =…
rraghuva
0
votes
0 answers

How to implement a custom Partitioner

I am trying to understand how to implement a Partitioner. My case: we read from a file and insert into Azure Table storage. We use tasks in order to speed up the process. The file has nearly 10,000,000 lines. I tried to implement a more…
Veronica_Zotali
0
votes
1 answer

Custom Partitioner, without setting number of reducers

Must we set the number of reducers in order to use a custom partitioner? Example: in the Word Count problem, I want the counts for all the stop words to go to one partition and the counts for the remaining words to go to a different partition. If I set the number of reducers to…
Thelight
0
votes
1 answer

gather different keys to the same reducer function - HADOOP

I want to gather in the same reducer function all the values of the keys that have at least one integer in common. For example, all the values that correspond to the key "1,2" and all the values that correspond to the key "2,3" must be always in…
user1819076
0
votes
1 answer

hadoop mapreduce partitioner not invoked

I need help with a MapReduce job: my custom partitioner is never invoked. I have checked everything a million times, with no result. It used to work a while ago, and I have no idea why it doesn't now. Any help would be very much appreciated. I am adding the code (It…
0
votes
2 answers

Partitioner or MultipleOutputs

I would like to have your opinion regarding Partitioner vs MultipleOutputs. Suppose I have a file which contains keys such as 0:aaa 1:bbb 0:ccc 0:ddd ... 1:zzz I would like to have 2 files: one file containing keys starting with 0: and the…
0
votes
4 answers

Hash value from keys on Cassandra

I'm developing a mechanism for Cassandra using Hector. What I need at the moment is to know the hash values of the keys, so that I can work out which node each key is stored on (by looking at each node's tokens) and ask that node directly for the value. What I…
0
votes
1 answer

Can I have different partitioners in a multiple datacenter configuration in cassandra?

Can I have RandomPartitioner in the cluster in datacenter1 and Murmur3Partitioner in the cluster in datacenter2?
juan
0
votes
2 answers

Custom Partitioner Error

I am writing my own custom Partitioner (old API). Below is the code where I extend the Partitioner class: public static class WordPairPartitioner extends Partitioner { @Override public int getPartition(WordPair wordPair,…
JackSparrow
0
votes
1 answer

How is the file with intermediate values partitioned on a map worker in MapReduce?

I'm trying to understand the MapReduce model and I need advice, because I'm not sure how the file with the intermediate results of the map function is sorted and partitioned. Most of my knowledge about MapReduce comes from the MapReduce papers of Jeffrey…
adam..
0
votes
1 answer

How does OutputCollector work?

I was trying to analyse the default MapReduce job that doesn't define a mapper or a reducer, i.e. one that uses IdentityMapper & IdentityReducer. To make myself clear, I just wrote my own identity reducer: public static class MyIdentityReducer extends…
S Kr
0
votes
3 answers

Parameter numPartitions in Partitioner class

Gurus! Can anybody answer: where is the numPartitions parameter of the Partitioner class defined (that is, where does its value come from)?
Mijatovic