Questions tagged [hadoop2]

Hadoop 2 represents the second generation of the popular open source distributed platform Apache Hadoop.

Apache Hadoop 2.x introduces significant improvements over the previous stable release, Hadoop 1.x. Major enhancements have been made to both of Hadoop's building blocks, HDFS and MapReduce:

  1. HDFS Federation: To scale the name service horizontally, federation uses multiple independent NameNodes/namespaces.

  2. MapReduce NextGen (a.k.a. YARN or MRv2): The new architecture splits the two major functions of the JobTracker, resource management and job life-cycle management, into separate components. The new ResourceManager manages the global assignment of compute resources to applications, and a per-application ApplicationMaster manages the application's scheduling and coordination. An application is either a single job in the sense of classic MapReduce, or a DAG of such jobs. The ResourceManager and the per-machine NodeManager daemon, which manages the user processes on that machine, form the computation fabric.
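As an illustration of HDFS Federation, here is a minimal hdfs-site.xml sketch declaring two independent nameservices. The nameservice IDs (ns1, ns2) and hostnames are hypothetical, not taken from this page:

```xml
<configuration>
  <!-- Declare two independent nameservices (federation) -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns1,ns2</value>
  </property>
  <!-- RPC address of the NameNode serving namespace ns1 (hypothetical host) -->
  <property>
    <name>dfs.namenode.rpc-address.ns1</name>
    <value>nn1.example.com:8020</value>
  </property>
  <!-- RPC address of the NameNode serving namespace ns2 (hypothetical host) -->
  <property>
    <name>dfs.namenode.rpc-address.ns2</name>
    <value>nn2.example.com:8020</value>
  </property>
</configuration>
```

Each NameNode then serves its own namespace independently, while all DataNodes register with every NameNode and store blocks for all namespaces.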

For more information, visit the official Hadoop 2 homepage.

2047 questions
0
votes
0 answers

Hadoop : DataNode change directory not taking effect

We are using Hadoop 2.7.3. We changed hdfs-site.xml to point to a new directory and set permissions on the new directory too, then ran start-dfs.sh and stop-dfs.sh on the name node, but the changes are not taking effect: it still points to the old directory…
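For reference, the property that controls DataNode storage locations in Hadoop 2.x is dfs.datanode.data.dir; a minimal hdfs-site.xml sketch (the path is a hypothetical example):

```xml
<configuration>
  <!-- Comma-separated list of local directories where DataNodes store blocks -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/new/hdfs/datanode</value>
  </property>
</configuration>
```

One common pitfall in cases like the one described above is that the edited hdfs-site.xml must be present on every DataNode host, not just the NameNode, and each DataNode daemon must actually be restarted before the new directory is used.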
user2359997
  • 561
  • 1
  • 16
  • 40
0
votes
1 answer

Hadoop : HDFS Cluster running out of space even though space is available

We have a 4-datanode HDFS cluster. There is a large amount of space available on each data node, about 98 GB, but when I look at the datanode information it's only using about 10 GB and running out of space. How can we make it use all the…
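One common cause of this symptom (an assumption here, since the excerpt is truncated) is that block storage is mapped to a small partition while the large volume sits unused, or that too much space is reserved for non-HDFS use. A hedged hdfs-site.xml sketch with hypothetical paths and values:

```xml
<configuration>
  <!-- Point block storage at the large volume (hypothetical mount point) -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/mnt/bigdisk/hdfs/datanode</value>
  </property>
  <!-- Bytes per volume reserved for non-HDFS use; here 1 GiB (default is 0) -->
  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>1073741824</value>
  </property>
</configuration>
```

Running `hdfs dfsadmin -report` shows the configured capacity each DataNode actually reports, which quickly reveals whether the 98 GB volume is being counted at all.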
user2359997
  • 561
  • 1
  • 16
  • 40
0
votes
1 answer

Sqoop: customize input data before exporting into Postgres DB

I want to export input data from HDFS into a Postgres DB through Sqoop. I am able to achieve this when my input data is in the proper format for the Postgres table. But I want to perform some operations on my input data before exporting it into the DB, like…
Ranjan Swain
  • 75
  • 1
  • 9
0
votes
1 answer

How to increase the space for dfs on HDFS cluster

We have a 4-datanode HDFS cluster. There is a large amount of space available on each data node, about 98 GB, but when I look at the datanode information it's only using about 10 GB. How can we make it use all the 98 GB and not run out of…
user2359997
  • 561
  • 1
  • 16
  • 40
0
votes
0 answers

Console logs not getting printed via org.apache.hadoop.mapreduce.Job's waitForCompletion(true) method

As the documentation for the waitForCompletion(boolean) method states, "The waitForCompletion() method on Job submits the job and waits for it to finish. The single argument to the method is a flag indicating whether verbose output is generated."…
KayV
  • 12,987
  • 11
  • 98
  • 148
0
votes
1 answer

Error Mapping HDFS files to an external drive

I want to make a folder in hadoop-2.7.3 that physically resides on an external (USB thumb) drive, the idea being that any file I -copyFromLocal will reside on the thumb drive. Similarly, any output files from Hadoop should also go to the external…
ben
  • 473
  • 2
  • 9
  • 21
0
votes
2 answers

Get yarn applicationId from a submitted mapreduce job

I need to be able to get the YARN applicationId from a MapReduce job, but I can't find any API to do that. An example of my MapReduce job: Configuration conf = new Configuration(); Job job = Job.getInstance(conf, "word…
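One workaround often noted for this question: on YARN (MRv2), the string returned by job.getJobID().toString() after submission, e.g. job_1483098353987_0020, mirrors the YARN ApplicationId application_1483098353987_0020; only the prefix differs. A pure-string sketch of that mapping (assumption: the job runs on YARN, not the local runner; the class name here is hypothetical):

```java
// Sketch: map an MRv2 JobID string to its corresponding YARN ApplicationId
// string by swapping the "job_" prefix for "application_". In real code the
// job id would come from job.getJobID().toString() after job.submit().
public class JobIdToAppId {
    public static String toApplicationId(String jobId) {
        if (!jobId.startsWith("job_")) {
            throw new IllegalArgumentException("not a MapReduce job id: " + jobId);
        }
        // Cluster timestamp and sequence number are shared between the two ids
        return "application_" + jobId.substring("job_".length());
    }

    public static void main(String[] args) {
        System.out.println(toApplicationId("job_1483098353987_0020"));
    }
}
```

This is a string-level convenience only; whether a richer API exists for a given Hadoop 2 release should be checked against that release's javadoc.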
Eqbal
  • 4,722
  • 12
  • 38
  • 47
0
votes
2 answers

Advantages of Hadoop in combination to any database

There are so many different databases: relational databases and NoSQL databases (key/value, document stores, wide-column stores, graph databases), plus database technologies such as in-memory and column-oriented. All have their advantages and disadvantages. For…
Khan
  • 1,418
  • 1
  • 25
  • 49
0
votes
1 answer

How to install an application in Hadoop Cluster using YARN?

I am trying to learn YARN, but I have hit a roadblock and have some questions. For every application, the data nodes must have a container each. But are these containers created on their own while running an application, or do we need to…
RV186
  • 303
  • 2
  • 3
  • 12
0
votes
1 answer

Can I ensure that a new Hadoop task will resume at the point in the input file where the failed task left off?

I am running Hadoop 2.7.2. Let us say that 10 Hadoop tasks are running, and that each task is processing 1 HDFS input text file. Let's say one of the tasks fails, say while reading line 566 of HDFS input file file05. What happens by default? Will…
Ben Weaver
  • 960
  • 1
  • 8
  • 18
0
votes
0 answers

not able to run solr search example in hue 3.9

I have installed Hue 3.9 and Solr 5.5.3 on Linux (RHEL), but I am not able to run the Solr search example in Hue. When I click on "install search" in examples, it shows an error like 'twitter_demo' is not available due to init…
0
votes
0 answers

How to kill running Job from Mapper in Hadoop 2 (wanted: reference to running Job object)

Several Stack Overflow entries have addressed this question, but none quite seems to nail it. I want logic whereby, even if one task on one node fails, I kill the entire job before it finishes. A good strategy seems to be to get a reference to the…
Ben Weaver
  • 960
  • 1
  • 8
  • 18
0
votes
1 answer

Google Bigquery: Spark - Incompatible table partitioning specification

While submitting a copy job from a temporary table that isn't partitioned to the final table that is partitioned by day, I receive cause: java.io.IOException: ErrorMessage: Incompatible table partitioning specification. Expects partitioning…
Sam Elamin
  • 245
  • 1
  • 8
0
votes
1 answer

Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable

hive> select * from tweets_text ORDER BY created_time ASC LIMIT 10; URL: http://standbynamenode-zat6kzjl.canopy.com:8088/taskdetails.jsp?jobid=job_1483098353987_0020&tipid=task_1483098353987_0020_m_000000 Diagnostic Messages for this Task: Error:…
The Joker
  • 204
  • 1
  • 5
  • 22
0
votes
3 answers

Run SparkR or R packages on my Cloudera 5.9 Spark

I have a 3-node cluster running Cloudera 5.9 on CentOS 6.7. I need to connect my R packages (running on my laptop) to Spark running in cluster mode on Hadoop. However, if I try to connect the local R through sparklyr to Hadoop Spark…
TextShilpa
  • 21
  • 5