Hadoop partitioning covers how Hadoop decides which key/value pairs are sent to which reducer (partition).
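As a rough illustration of that decision, the sketch below mirrors the arithmetic used by Hadoop's default `HashPartitioner`: clear the sign bit of the key's hash code, then take the remainder by the number of reducers. The class name `HashPartitionSketch` and the method `partitionFor` are made up for this example, and plain Java objects stand in for Hadoop `Writable` keys, so the actual partition numbers in a real job may differ.

```java
// Minimal sketch of hash partitioning; no Hadoop dependency is needed to run it.
// HashPartitionSketch and partitionFor are hypothetical names used for illustration only.
class HashPartitionSketch {

    // Clear the sign bit so the result is never negative, then map the hash
    // onto the range [0, numPartitions).
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        String[] keys = {"apple", "banana", "cherry"};
        int numReducers = 4; // assumed reducer count for the demo
        for (String key : keys) {
            System.out.println(key + " -> reducer " + partitionFor(key, numReducers));
        }
    }
}
```

Every pair with the same key lands in the same partition, which is why a custom partitioner usually only changes which part of the key is hashed.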
Questions tagged [hadoop-partitioning]
339 questions
0
votes
1 answer
Explain Hadoop Partitioner
While working on the secondary sort issue from The Definitive Guide, I came across code like this:
@Override
public int getPartition(TextpairWritable tp, IntWritable value, int numPartitions) {
return…

DevHelp
0
votes
1 answer
Hive drops all the partitions if the partition column name is not correct
I am facing a strange issue with Hive.
I have a table partitioned on dept_key (an integer, e.g. 3212).
The table is created as follows:
create external table dept_details (dept_key,dept_name,dept_location) PARTITIONED BY (dept_key_partition…

Mayank
0
votes
1 answer
How to number my splits and choose the right number of mappers/reducers
My MapReduce job looks like the following:
I map the first 2 blocks to key 1, the next two to key 2, and so on, as you can see in the picture:
Now, in theory, I want to send each of these keys to a reducer.
But my…

member555
0
votes
1 answer
Full table scan issue with LEFT OUTER JOIN in Hive
I'm trying to do a LEFT OUTER JOIN on 2 of my tables in Hive. I understand that, for joins, we have to include filter conditions along with the join conditions, omitting them from the WHERE clause, to avoid full table scans. Reference:…

prashant1988
0
votes
1 answer
How is NameNode high availability achieved in Hadoop 1.x?
Is there any possible solution to achieve NameNode HA in Hadoop 1.x?

Saikumar A
0
votes
3 answers
$bin/hadoop namenode --format error
I got this error when trying to execute this command: $bin/hadoop namenode –format
/home/MAHI/hadoop-1.2.1/libexec/../conf/hadoop-env.sh: line 31: unexpected EOF while looking for matching…

Mahi
0
votes
2 answers
Data in HDFS files not seen under Hive table
I have to create a Hive table from data present in Oracle tables.
I'm using Sqoop to convert the Oracle data into HDFS files. Then I'm creating a Hive table on the HDFS files.
The Sqoop job completes successfully and the files also get…

Jonathan
0
votes
0 answers
Installing Hadoop in Pseudo Distributed Mode
I'm a newbie to Hadoop and today I'm trying to install it in pseudo-distributed mode. Here is the link I followed: http://www.tutorialspoint.com/hadoop/hadoop_enviornment_setup.htm
Everything is fine until I run the command:
start-dfs.sh
Here is what…

lenhhoxung
0
votes
1 answer
Filename as columns - Hadoop
I have log files that contain the date and hour in the file name. Is there a way to extract the date and hour from the file name and add them as extra columns in Hive? An example file name is weblogs-20150101-010000.gz.
The method that I know is to…

macha
0
votes
2 answers
Why HBase even though HDFS is present
Why is Hadoop using HBase even though HDFS is available for storage?
We can also store table data as blocks in HDFS.
Is the data stored in HBase? If so, then what role will HDFS serve?
user4444053
0
votes
1 answer
Hadoop - Adding drives to existing cluster
I have a 4-node Hadoop cluster set up, and I am adding 3 more drives to each node. I mounted the 3 drives on one of the nodes (the master) and added the property dfs.data.dir. When I do this, my DataNode does not start. Should I make changes to…

vv2190
0
votes
0 answers
Custom Partitioning gives ArrayIndexOutOfBounds Error
When I run my code, I get the following exception:
hadoop@hadoop:~/testPrograms$ hadoop jar cp.jar CustomPartition /test/test.txt /test/output33
15/03/03 16:33:33 INFO Configuration.deprecation: session.id is deprecated. Instead, use…

Bhavya Arora
0
votes
0 answers
Hadoop job name is not reflected in JobTracker console
I set the job name in the driver class as " job = new Job(conf, "Partitioning Even Odd Numbers"); " and then changed the job name to "custom job". But the job name is not reflected in the JobTracker console when I run the MapReduce program.

RKCY
0
votes
2 answers
Query partition with calculation and avoid full table scan
I am an analyst trying to build a query to pull the last 7 days of data from a table in Hadoop. The table itself is partitioned by date.
When I test my query with hard-coded dates, everything works as expected. However, when I write it to calculate…

eyy
0
votes
1 answer
Hadoop TotalOrderPartitioner
I am trying to use the total order partitioner in Hadoop with the following code:
job.setNumReduceTasks(4);
Path partitionFile = new Path(args[1]);
InputSampler.Sampler sampler = new InputSampler.RandomSampler(0.1,3,1)
…

Sarang Shinde