Questions tagged [hadoop-partitioning]

Questions about how Hadoop decides which key/value pairs are sent to which reducer (partition).
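As a rough illustration of that decision, the sketch below mimics the hash-based rule used by Hadoop's default `HashPartitioner`: mask off the sign bit of the key's hash code, then take the remainder modulo the number of reducers. It is plain Java with no Hadoop dependency; the class name `HashPartitionSketch`, the helper `getPartition`, and the sample keys are made up for this example and are not part of the Hadoop API.

```java
public class HashPartitionSketch {

    // Hypothetical helper (not Hadoop's Partitioner class): maps a key to a
    // partition index in the range [0, numPartitions).
    static int getPartition(Object key, int numPartitions) {
        // Clearing the sign bit keeps the result non-negative even when
        // hashCode() returns a negative value.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numReducers = 4; // assumed reducer count for the demo
        String[] keys = {"apple", "banana", "cherry", "apple"};
        for (String k : keys) {
            // Equal keys always land on the same reducer.
            System.out.println(k + " -> reducer " + getPartition(k, numReducers));
        }
    }
}
```

Because the rule depends only on the key and the reducer count, all values for a given key reach the same reducer; a custom partitioner replaces this rule with its own logic.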

339 questions
0
votes
1 answer

Explain Hadoop Partitioner

While working on the secondary sort example from the Definitive Guide, I came across code like this: @Override public int getPartition(TextpairWritable tp, IntWritable value, int numPartitions) { return…
DevHelp
0
votes
1 answer

Hive drops all the partitions if the partition column name is not correct

I am facing a strange issue with Hive. I have a table partitioned on dept_key (an integer, e.g. 3212). The table is created as follows: create external table dept_details (dept_key,dept_name,dept_location) PARTITIONED BY (dept_key_partition…
Mayank
0
votes
1 answer

How to number my splits and choose the right number of mappers/reducers

My MapReduce job looks like the following: I map the first 2 blocks to key 1, the next two to key 2, and so on, as you can see from the picture. Now, in theory, I want to send each of these keys to a reducer. But my…
member555
0
votes
1 answer

Full table scan issue with LEFT OUTER JOIN in Hive

I'm trying to do a LEFT OUTER JOIN operation on 2 of my tables in Hive. I understand that, for joins, we have to include filter conditions along with the join conditions, omitting them from the WHERE clause, to avoid full table scans. Reference:…
prashant1988
0
votes
1 answer

How is NameNode high availability achieved in Hadoop 1.x?

Is there any possible solution to achieve NameNode HA in Hadoop 1.x?
Saikumar A
0
votes
3 answers

$bin/hadoop namenode --format error

I got this error when trying to execute this command: $bin/hadoop namenode –format /home/MAHI/hadoop-1.2.1/libexec/../conf/hadoop-env.sh: line 31: unexpected EOF while looking for matching…
Mahi
0
votes
2 answers

Data in HDFS files not seen under hive table

I have to create a Hive table from data present in Oracle tables. I'm running Sqoop to convert the Oracle data into HDFS files. Then I'm creating a Hive table on the HDFS files. The Sqoop job completes successfully and the files also get…
Jonathan
0
votes
0 answers

Installing Hadoop in Pseudo Distributed Mode

I'm a newbie to Hadoop and today I'm trying to install it in pseudo-distributed mode. Here is the link I'm following: http://www.tutorialspoint.com/hadoop/hadoop_enviornment_setup.htm Everything is fine until I run the command: start-dfs.sh Here is what…
lenhhoxung
0
votes
1 answer

Filename as columns - hadoop

I have log files that contain the date and hour in the file name. Is there a way to extract the date and hour from the filename to add extra columns in Hive? An example of the file is weblogs-20150101-010000.gz. The method that I know is to…
macha
0
votes
2 answers

Why HBase even though HDFS is present

Why does Hadoop use HBase even though HDFS is available for storage? We can also store table data as blocks in HDFS. Is the data stored in HBase? If so, what role will HDFS serve?
user4444053
0
votes
1 answer

hadoop - Adding drives to existing cluster

I have a 4-node Hadoop cluster set up, and I am adding 3 more drives to each node in my cluster. I mounted my 3 drives in one of the nodes (master) and added the property dfs.data.dir. If I do this, my datanode does not start. Should I make changes to…
vv2190
0
votes
0 answers

Custom Partitioning gives ArrayIndexOutOfBounds Error

When I run my code, I get the following exception: hadoop@hadoop:~/testPrograms$ hadoop jar cp.jar CustomPartition /test/test.txt /test/output33 15/03/03 16:33:33 INFO Configuration.deprecation: session.id is deprecated. Instead, use…
0
votes
0 answers

Hadoop job name is not reflected in JobTracker console

I set the job name in the driver class as " job = new Job(conf, "Partitioning Even Odd Numbers"); ". Then I changed the job name to " custom job". But the new job name is not reflected in the JobTracker console when I run the MapReduce program.
RKCY
0
votes
2 answers

Query partition with calculation and avoid full table scan

I am an analyst trying to build a query to pull the last 7 days of data from a table in Hadoop. The table itself is partitioned by date. When I test my query with hard-coded dates, everything works as expected. However, when I write it to calculate…
eyy
0
votes
1 answer

Hadoop TotalOrderPartitioner

I am trying to use the total order partitioner in Hadoop with the following code: job.setNumReduceTasks(4); Path partitionFile = new Path(args[1]); InputSampler.Sampler sampler = new InputSampler.RandomSampler(0.1,3,1) …
Sarang Shinde