Highest Voted 'hadoop-partitioning' Questions

2

votes

2 answers

map reduce with two input files, with one file processed based on another

I need to write a MapReduce job that takes two input files. The first input file looks like this: key1, 25; key1, 35; key1, 60; key2, 30; key3, 45; key3, 65. The second input file is as follows: key1, -10; key2, -20; key3, -15, and I need to get an…

asked Aug 14 '15 at 14:23

user2715182

653
2
10
23

2

votes

3 answers

TotalOrderPartitioner giving wrong key class Error

I am trying my hand at Hadoop's TotalOrderPartitioner. While doing so I get the following error: "wrong key class". Driver code: import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.IntWritable; import…

hadoop hadoop-partitioning

asked May 18 '15 at 09:09

Nitin Sharma

21
3

2

votes

2 answers

How does SparkContext.textFile work under the covers?

I am trying to understand the textFile method deeply, but I think my lack of Hadoop knowledge is holding me back here. Let me lay out my understanding, and maybe you can correct anything that is incorrect. When sc.textFile(path) is called, then…

hadoop apache-spark partitioning hadoop-partitioning

asked May 18 '15 at 01:35

Justin Pihony

66,056
18
147
180

2

votes

4 answers

Map reduce and hash partitioning

While learning about MapReduce, I encountered this question: A given MapReduce program has the Map phase generate 100 key-value pairs with 10 unique keys. How many Reduce tasks can this program have when at least one Reduce task will certainly be…

mapreduce hadoop-partitioning

asked Apr 19 '15 at 14:59

Saumya George

21
2
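The mechanism behind the hash-partitioning question above can be sketched in a few lines. Hadoop's default HashPartitioner assigns a key to reducer `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, so every pair with the same key lands on the same reducer. The Python sketch below mimics that rule; the key names and the use of CRC32 as a stand-in for `hashCode()` are assumptions made for illustration only, so the exact distribution will differ from a real job.

```python
import zlib
from collections import Counter

def get_partition(key: str, num_reducers: int) -> int:
    # Assumption: CRC32 stands in for Java's key.hashCode();
    # it is deterministic across runs, unlike Python's built-in hash().
    h = zlib.crc32(key.encode("utf-8"))
    # Same shape as HashPartitioner: clear the sign bit, then take the modulus.
    return (h & 0x7FFFFFFF) % num_reducers

def reducer_loads(keys, num_reducers):
    # Count how many key-value pairs each reducer would receive.
    loads = Counter(get_partition(k, num_reducers) for k in keys)
    return [loads.get(r, 0) for r in range(num_reducers)]

# 100 pairs spread over 10 unique keys (hypothetical key names).
keys = [f"key{i % 10}" for i in range(100)]

for n in (4, 10, 20):
    loads = reducer_loads(keys, n)
    busy = sum(1 for c in loads if c > 0)
    print(f"{n} reducers: {busy} receive data, {n - busy} receive none")
```

Because a key always maps to one partition, the number of reducers that receive data can never exceed the number of unique keys, whatever hash function is used.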

2

votes

0 answers

Custom hash function for Hive buckets

I need to implement total ordering of output results in Hive with several reducers (e.g. 4). As I found via the link, Hive uses the expression hash_function(bucketing column) mod num_buckets. And as a result of input set of numbers (41,42,43,51,52,53)…

hadoop mapreduce hive bigdata hadoop-partitioning

asked Feb 11 '15 at 16:18

Speise

789
1
12
28
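The bucketing expression quoted in the Hive question above, hash_function(bucketing column) mod num_buckets, can be worked through by hand for the numbers it mentions. The sketch below assumes that the hash of an integer is the integer itself and that there are 4 buckets, one per reducer; both are assumptions made for illustration, not details confirmed by the excerpt.

```python
def bucket_for(value: int, num_buckets: int) -> int:
    # Assumption: the hash of an integer column value is the value itself.
    return value % num_buckets

values = [41, 42, 43, 51, 52, 53]
num_buckets = 4  # assumed: one bucket per reducer

# Group the input values by the bucket they hash into.
buckets = {}
for v in values:
    buckets.setdefault(bucket_for(v, num_buckets), []).append(v)

for b in sorted(buckets):
    print(b, buckets[b])
# 0 [52]
# 1 [41, 53]
# 2 [42]
# 3 [43, 51]
```

Under these assumptions bucket 1 holds both 41 and 53 while bucket 2 holds 42, so reading the buckets in order does not yield a globally sorted result. That is why modulo bucketing alone does not provide total ordering.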

2

votes

1 answer

Issue while installing hadoop-2.2.0 in linux 64 bit machine

Using this link, I tried installing Hadoop version 2.2.0 (single-node cluster) on Ubuntu 12.04 (64-bit machine): http://bigdatahandler.com/hadoop-hdfs/installing-single-node-hadoop-2-2-0-on-ubuntu/ While formatting the HDFS file system via namenode…

hadoop hadoop-streaming hadoop2 hadoop-plugins hadoop-partitioning

asked Aug 07 '14 at 05:23

user3532122

15
4

2

votes

2 answers

Hadoop in action Patent example explanation

I was going through the examples for patent data in Hadoop in Action. Could you please explain in detail the data sets being used? The patent citation data set: this data set contains two columns, citing and cited patents. Citing column refers…

hadoop hadoop-streaming hadoop-partitioning

asked Apr 03 '14 at 02:18

user3491872

21
2

2

votes

1 answer

understanding custom partitioner in hadoop

I am learning the partitioner concept now. Can anyone explain the piece of code below? It is hard for me to understand. public class TaggedJoiningPartitioner extends Partitioner { @Override public int getPartition(TaggedKey…

hadoop mapreduce mapper hadoop-partitioning reducers

asked Aug 21 '13 at 11:20

user1585111

1,019
6
19
35

2

votes

2 answers

Failed to get system directory - hadoop

Using a Hadoop multinode setup (1 master, 1 slave). After running start-mapred.sh on the master, I found the error below in the TT logs (Slave an) org.apache.hadoop.mapred.TaskTracker: Failed to get system directory. Can someone help me to know what can be…

hadoop mapreduce hadoop-partitioning

asked Jul 29 '13 at 05:27

Surya

3,408
5
27
35

2

votes

0 answers

creating new table with dynamic partitions from existing non-partitioned table in Hive

I have an existing table structure in Hive which has various fields, e.g. (a string, b string, tstamp string, c string), including one tstamp field. I need to create a new partitioned table (table_partitioned) from the existing table (original_table) but…

hadoop hive hadoop-partitioning

asked Jul 23 '13 at 20:14

hitrix

133
3
11

2

votes

1 answer

hadoop file splitting using KeyFieldBasedPartitioner

I have a big file that is formatted as follows: sample name \t index \t score. I'm trying to split this file based on sample name using Hadoop Streaming. I know ahead of time how many samples there are, so I can specify how many reducers I…

hadoop mapreduce hadoop-streaming hadoop-partitioning

asked Jun 25 '13 at 21:48

mortonjt

650
1
5
23

2

votes

1 answer

Can already partitioned input data improve the hadoop processing?

I know that during the intermediate steps between mapper and reducer, hadoop will sort and partition the data on its way to the reducer. Since I am dealing with already partitioned data in my input to the mapper, is there a way to take advantage of…

hadoop hadoop-partitioning

asked Jun 25 '13 at 21:19

Gabriel Burete

140
6

2

votes

1 answer

Hadoop reducers receiving wrong data

I have a load of JobControls running at the same time, all with the same set of ControlledJobs. Each JobControl is dealing with a different set of input / output files, by date range, but they are all of the type. The problem that I am observing is…

java hadoop mapreduce hadoop-partitioning

asked Mar 19 '13 at 15:53

Ben Smith

1,554
1
15
26

1

vote

1 answer

Gitolite ACL partition activation with fstab ?

I don't understand, and I can't find any information about, ACL and gitolite. Initially I wanted to install gitosis, which needs installation of the ACL package (apt-get install) for Debian and activation of acl in the fstab file. With gitolite, a…

git acl gitolite gitosis hadoop-partitioning

asked Dec 02 '11 at 12:59

reyman64

523
4
34
73

1

vote

0 answers

Spark sc.binaryFiles() partitioning small files and YARN

Using the sc.binaryFiles() function in Spark 2.3.0 on a Hortonworks 2.6.5 server, I noticed behavior that I cannot explain regarding the default partitioning in a YARN-managed cluster. Please see the sample code below: import…

scala apache-spark hadoop-yarn hadoop-partitioning

asked Jun 30 '22 at 10:28

uhlik

105
9
