Questions tagged [hadoop2]

Hadoop 2 represents the second generation of the popular open source distributed platform Apache Hadoop.

Apache Hadoop 2.x introduces significant improvements over the previous stable release, Hadoop 1.x. Major enhancements have been made to both of Hadoop's building blocks, HDFS and MapReduce:

  1. HDFS Federation: To scale the name service horizontally, federation uses multiple independent NameNodes/namespaces.

  2. MapReduce NextGen (a.k.a. YARN or MRv2): The new architecture splits the two major functions of the JobTracker, resource management and job life-cycle management, into separate components. The new ResourceManager manages the global assignment of compute resources to applications, and a per-application ApplicationMaster manages the application's scheduling and coordination. An application is either a single job in the sense of classic MapReduce, or a DAG of such jobs. The ResourceManager and the per-machine NodeManager daemon, which manages the user processes on that machine, form the computation fabric.
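As an illustration of HDFS Federation, here is a minimal hdfs-site.xml sketch declaring two independent nameservices. The nameservice IDs (ns1, ns2) and hostnames are hypothetical, not taken from this page:

```xml
<configuration>
  <!-- Declare two independent nameservices (federation) -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns1,ns2</value>
  </property>
  <!-- RPC address of the NameNode serving namespace ns1 (hypothetical host) -->
  <property>
    <name>dfs.namenode.rpc-address.ns1</name>
    <value>nn1.example.com:8020</value>
  </property>
  <!-- RPC address of the NameNode serving namespace ns2 (hypothetical host) -->
  <property>
    <name>dfs.namenode.rpc-address.ns2</name>
    <value>nn2.example.com:8020</value>
  </property>
</configuration>
```

Each NameNode then serves its own namespace independently, while all DataNodes register with every NameNode and store blocks for all namespaces.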

For more information, visit the official Hadoop 2 homepage.

2047 questions
0
votes
0 answers

Hadoop : DataNode change directory not taking effect

We are using Hadoop 2.7.3. We changed hdfs-site.xml to point to a new directory and set permissions on the new directory too, then ran start-dfs.sh and stop-dfs.sh on the name node, but the changes are not taking effect: it still points to the old directory…
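For reference, the property that controls DataNode storage locations in Hadoop 2.x is dfs.datanode.data.dir; a minimal hdfs-site.xml sketch (the path is a hypothetical example):

```xml
<configuration>
  <!-- Comma-separated list of local directories where DataNodes store blocks -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/new/hdfs/datanode</value>
  </property>
</configuration>
```

One common pitfall in cases like the one described above is that the edited hdfs-site.xml must be present on every DataNode host, not just the NameNode, and each DataNode daemon must actually be restarted before the new directory is used.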
user2359997
  • 561
  • 1
  • 16
  • 40
0
votes
1 answer

Hadoop : HDFS Cluster running out of space even though space is available

We have a 4-datanode HDFS cluster. There is a large amount of space available on each data node, about 98 GB, but when I look at the datanode information it's only using about 10 GB and running out of space. How can we make it use all the…
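One common cause of this symptom (an assumption here, since the excerpt is truncated) is that block storage is mapped to a small partition while the large volume sits unused, or that too much space is reserved for non-HDFS use. A hedged hdfs-site.xml sketch with hypothetical paths and values:

```xml
<configuration>
  <!-- Point block storage at the large volume (hypothetical mount point) -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/mnt/bigdisk/hdfs/datanode</value>
  </property>
  <!-- Bytes per volume reserved for non-HDFS use; here 1 GiB (default is 0) -->
  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>1073741824</value>
  </property>
</configuration>
```

Running `hdfs dfsadmin -report` shows the configured capacity each DataNode actually reports, which quickly reveals whether the 98 GB volume is being counted at all.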
user2359997
  • 561
  • 1
  • 16
  • 40
0
votes
1 answer

Sqoop: customize input data before exporting into Postgres DB

I want to export input data from HDFS into a Postgres DB through Sqoop. I am able to achieve this when my input data is in the proper format for the Postgres table. But I want to perform some operations on my input data before exporting it into the DB, like…
Ranjan Swain
  • 75
  • 1
  • 9
0
votes
1 answer

How to increase the space for dfs on HDFS cluster

We have a 4-datanode HDFS cluster. There is a large amount of space available on each data node, about 98 GB, but when I look at the datanode information it's only using about 10 GB. How can we make it use all the 98 GB and not run out of…
user2359997
  • 561
  • 1
  • 16
  • 40
0
votes
0 answers

Console logs not getting printed via org.apache.hadoop.mapreduce.Job's waitForCompletion(true) method

As the documentation for the waitForCompletion(boolean) method states, "The waitForCompletion() method on Job submits the job and waits for it to finish. The single argument to the method is a flag indicating whether verbose output is generated."…
KayV
  • 12,987
  • 11
  • 98
  • 148
0
votes
1 answer

Error Mapping HDFS files to an external drive

I want to make a folder in hadoop-2.7.3 that physically resides on an external (USB thumb) drive, the idea being that any file I -copyFromLocal will reside on the thumb drive. Similarly, any output files from Hadoop should also go to the external…
ben
  • 473
  • 2
  • 9
  • 21
0
votes
2 answers

Get yarn applicationId from a submitted mapreduce job

I need to be able to get the YARN applicationId from a MapReduce job, but I can't find any API to do that. An example of my MapReduce job: Configuration conf = new Configuration(); Job job = Job.getInstance(conf, "word…
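One workaround often noted for this question: on YARN (MRv2), the string returned by job.getJobID().toString() after submission, e.g. job_1483098353987_0020, mirrors the YARN ApplicationId application_1483098353987_0020; only the prefix differs. A pure-string sketch of that mapping (assumption: the job runs on YARN, not the local runner; the class name here is hypothetical):

```java
// Sketch: map an MRv2 JobID string to its corresponding YARN ApplicationId
// string by swapping the "job_" prefix for "application_". In real code the
// job id would come from job.getJobID().toString() after job.submit().
public class JobIdToAppId {
    public static String toApplicationId(String jobId) {
        if (!jobId.startsWith("job_")) {
            throw new IllegalArgumentException("not a MapReduce job id: " + jobId);
        }
        // Cluster timestamp and sequence number are shared between the two ids
        return "application_" + jobId.substring("job_".length());
    }

    public static void main(String[] args) {
        System.out.println(toApplicationId("job_1483098353987_0020"));
    }
}
```

This is a string-level convenience only; whether a richer API exists for a given Hadoop 2 release should be checked against that release's javadoc.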
Eqbal
  • 4,722
  • 12
  • 38
  • 47
0
votes
2 answers

Advantages of Hadoop in combination to any database

There are so many different databases: relational databases and NoSQL databases (key/value, document stores, wide-column stores, graph databases), plus database technologies such as in-memory and column-oriented. All have their advantages and disadvantages. For…
Khan
  • 1,418
  • 1
  • 25
  • 49
0
votes
1 answer

How to install an application in Hadoop Cluster using YARN?

I am trying to learn YARN, but I have hit a roadblock and have some questions. For every application, the data nodes must have a container each. But are these containers created on their own while running an application, or do we need to…
RV186
  • 303
  • 2
  • 3
  • 12
0
votes
1 answer

Can I ensure that a new Hadoop task will resume at the point in the input file where the failed task left off?

I am running Hadoop 2.7.2. Let us say that 10 Hadoop tasks are running, and that each task is processing 1 HDFS input text file. Let's say one of the tasks fails, say while reading line 566 of HDFS input file file05. What happens by default? Will…
Ben Weaver
  • 960
  • 1
  • 8
  • 18
0
votes
0 answers

not able to run solr search example in hue 3.9

I have installed Hue 3.9 and Solr 5.5.3 on Linux (RHEL), but I am not able to run the Solr search example in Hue. When I click on "install search" in examples, it shows an error like 'twitter_demo' is not available due to init…
0
votes
0 answers

How to kill running Job from Mapper in Hadoop 2 (wanted: reference to running Job object)

Several Stack Overflow entries have addressed this question, but none quite seems to nail it. I want logic whereby, even if one task on one node fails, I kill the entire job before it finishes. A good strategy seems to be to get a reference to the…
Ben Weaver
  • 960
  • 1
  • 8
  • 18
0
votes
1 answer

Google Bigquery: Spark - Incompatible table partitioning specification

While submitting a copy job from a temporary table that isn't partitioned to the final table that is partitioned by day, I receive cause: java.io.IOException: ErrorMessage: Incompatible table partitioning specification. Expects partitioning…
Sam Elamin
  • 245
  • 1
  • 8
0
votes
1 answer

Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable

hive> select * from tweets_text ORDER BY created_time ASC LIMIT 10; URL: http://standbynamenode-zat6kzjl.canopy.com:8088/taskdetails.jsp?jobid=job_1483098353987_0020&tipid=task_1483098353987_0020_m_000000 Diagnostic Messages for this Task: Error:…
The Joker
  • 204
  • 1
  • 5
  • 22
0
votes
3 answers

Run SparkR or R packages on my Cloudera 5.9 Spark

I have a 3-node cluster running Cloudera 5.9 on CentOS 6.7. I need to connect my R packages (running on my laptop) to Spark running in cluster mode on Hadoop. However, if I try to connect the local R through sparklyr to Hadoop Spark…
TextShilpa
  • 21
  • 5