Questions tagged [hadoop2]

Hadoop 2 represents the second generation of the popular open source distributed platform Apache Hadoop.

Apache Hadoop 2.x brings significant improvements over the previous stable release, Hadoop 1.x. Major enhancements were made to both of Hadoop's core building blocks, HDFS and MapReduce:

HDFS Federation: To scale the name service horizontally, federation uses multiple independent NameNodes, each managing its own namespace.
MapReduce NextGen, also known as YARN or MRv2: The new architecture divides the two major functions of the JobTracker, resource management and job life-cycle management, into separate components. The new ResourceManager manages the global assignment of compute resources to applications, and the per-application ApplicationMaster manages the application's scheduling and coordination. An application is either a single job in the sense of classic MapReduce jobs or a DAG of such jobs. The ResourceManager and the per-machine NodeManager daemon, which manages the user processes on that machine, form the computation fabric.
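
The ResourceManager/ApplicationMaster split described above can be made concrete with a toy Java model. All names here are hypothetical and this is not the YARN API: it tracks only memory, ignores queues and scheduling policies, and does not model the fact that the ApplicationMaster itself runs in a container. The point is only that the ResourceManager hands out capacity on NodeManagers, while the ApplicationMaster decides what its application needs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the YARN split of responsibilities. All class and method names are
// hypothetical; this is NOT the YARN client API.
final class YarnToy {

    /** Tracks free memory per NodeManager and hands out containers; knows nothing about jobs. */
    static final class ResourceManager {
        private final Map<String, Integer> freeMemoryMb = new LinkedHashMap<>();

        void registerNodeManager(String node, int memoryMb) {
            freeMemoryMb.put(node, memoryMb);
        }

        /** Places one container on the first node with enough free memory, or returns null. */
        String allocateContainer(int memoryMb) {
            for (Map.Entry<String, Integer> node : freeMemoryMb.entrySet()) {
                if (node.getValue() >= memoryMb) {
                    node.setValue(node.getValue() - memoryMb);
                    return node.getKey();
                }
            }
            return null;
        }
    }

    /** Per-application master: decides how many tasks to run and requests containers for them. */
    static final class ApplicationMaster {
        private final ResourceManager rm;

        ApplicationMaster(ResourceManager rm) {
            this.rm = rm;
        }

        /** Returns the node chosen for each task that could be placed. */
        List<String> runTasks(int tasks, int memoryMbPerTask) {
            List<String> placements = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                String node = rm.allocateContainer(memoryMbPerTask);
                if (node == null) {
                    break; // cluster is full; a real AM would keep asking later
                }
                placements.add(node);
            }
            return placements;
        }
    }

    public static void main(String[] args) {
        ResourceManager rm = new ResourceManager();
        rm.registerNodeManager("node1", 2048);
        rm.registerNodeManager("node2", 1024);
        System.out.println(new ApplicationMaster(rm).runTasks(4, 1024)); // [node1, node1, node2]
    }
}
```

Only three of the four requested tasks are placed, because the toy cluster has 3 GB in total; deciding whether to wait or give up is left to the ApplicationMaster, not the ResourceManager.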

For more information on Hadoop 2, visit the official Hadoop 2 homepage.

2047 questions

5 answers

hadoop fs -du -h: sorting by size for M, G, T, P, E, Z, Y

I am running this command: sudo -u hdfs hadoop fs -du -h /user | sort -nr, and the output is not sorted by unit (gigabytes, terabytes, and so on). I found this command: hdfs dfs -du -s /foo/bar/*tobedeleted | sort -r -k 1 -g | awk '{ suffix="KMGT";…

asked Jun 28 '16 at 21:34

Mayur Narang
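
`sort -nr` compares only the leading number, so "800 M" ends up above "1.5 G". Dropping `-h` and sorting the raw byte counts (`hadoop fs -du /user | sort -nr`) avoids the problem entirely. If the human-readable form is needed, each size can be converted back to bytes before comparing. A minimal Java sketch, assuming the size is the first field of each line, written as `1.5 G`, `1.5G` or a plain byte count, with 1024-based suffixes; the class and method names are illustrative only:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: order `hadoop fs -du -h`-style lines by size, largest first.
// Assumption: the size is the first field, written as "1.5 G", "1.5G" or a plain
// byte count, using 1024-based suffixes K, M, G, T, P, E, Z, Y.
final class DuSort {
    private static final String SUFFIXES = "KMGTPEZY";

    /** Converts "512", "1.5K" or "3 G" into a number of bytes. */
    static double parseSize(String size) {
        String s = size.replace(" ", "");
        int exp = SUFFIXES.indexOf(Character.toUpperCase(s.charAt(s.length() - 1)));
        if (exp < 0) {
            return Double.parseDouble(s); // no suffix: already bytes
        }
        double value = Double.parseDouble(s.substring(0, s.length() - 1));
        return value * Math.pow(1024, exp + 1); // K = 1024^1, M = 1024^2, ...
    }

    /** Reads the leading size field, joining a separate one-letter unit if present. */
    static double leadingSize(String line) {
        String[] fields = line.trim().split("\\s+");
        String size = fields[0];
        if (fields.length > 1 && fields[1].length() == 1
                && SUFFIXES.indexOf(Character.toUpperCase(fields[1].charAt(0))) >= 0) {
            size += fields[1];
        }
        return parseSize(size);
    }

    /** Returns a copy of the lines sorted from largest to smallest size. */
    static List<String> sortBySizeDescending(List<String> lines) {
        List<String> sorted = new ArrayList<>(lines);
        sorted.sort((a, b) -> Double.compare(leadingSize(b), leadingSize(a)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("512  /user/c", "1.5 G  /user/a", "800 M  /user/b", "2 T  /user/d");
        // prints the /user/d, /user/a, /user/b and /user/c lines in that order
        sortBySizeDescending(lines).forEach(System.out::println);
    }
}
```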

2 answers

Importing CSV file into Hadoop

I am new to Hadoop. I have a file to import into Hadoop via the command line (I access the machine through SSH). How can I import the file into Hadoop, and which command can I use to check it afterward?

csv hadoop2

asked Dec 14 '15 at 21:49

akaliza

3 answers

Could not find or load main class com.sun.tools.javac.Main hadoop mapreduce

I am trying to learn MapReduce, but I am a little lost right now. http://hadoop.apache.org/docs/r2.6.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Usage In particular, this set of instructions: Compile WordCount.java and…

java hadoop mapreduce hadoop2

asked Mar 25 '15 at 16:13

Liondancer

1 answer

namespace image and edit log

From the book "Hadoop The Definitive Guide", under the topic Namenodes and Datanodes it is mentioned that: The namenode manages the filesystem namespace. It maintains the filesystem tree and the metadata for all the files and directories in the…

hadoop hdfs hadoop2

asked Nov 15 '14 at 06:16

user4221591
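
The namenode keeps the namespace in memory and persists it on local disk as a namespace image (fsimage) plus an edit log of the changes made since that image. On startup it loads the image and replays the log; a checkpoint (performed by the Secondary NameNode, or by the Standby NameNode in an HA setup) merges the two so the log does not grow without bound. A toy Java model of that cycle, with hypothetical class names and the namespace reduced to a set of paths:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model (hypothetical names, not HDFS code): the namespace is just a set of paths.
final class NamespaceToy {

    /** One logged change: a path was created or deleted. */
    static final class Edit {
        final boolean create;
        final String path;

        Edit(boolean create, String path) {
            this.create = create;
            this.path = path;
        }
    }

    /** Startup: load the last namespace image, then replay the edit log in order. */
    static Set<String> recover(Set<String> image, List<Edit> editLog) {
        Set<String> namespace = new TreeSet<>(image);
        for (Edit e : editLog) {
            if (e.create) {
                namespace.add(e.path);
            } else {
                namespace.remove(e.path);
            }
        }
        return namespace;
    }

    /** Checkpoint: the merged state becomes the new image and the edit log is emptied. */
    static Set<String> checkpoint(Set<String> image, List<Edit> editLog) {
        Set<String> newImage = recover(image, editLog);
        editLog.clear();
        return newImage;
    }

    public static void main(String[] args) {
        Set<String> image = Set.of("/tmp", "/user");
        List<Edit> editLog = new ArrayList<>(List.of(
                new Edit(true, "/user/a.csv"),
                new Edit(false, "/tmp")));
        System.out.println(recover(image, editLog)); // [/user, /user/a.csv]
        Set<String> newImage = checkpoint(image, editLog);
        System.out.println(newImage + " edits left: " + editLog.size()); // [/user, /user/a.csv] edits left: 0
    }
}
```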

1 answer

Should Hadoop FileSystem be closed?

I'm building a Spring Boot-powered service that writes data to Hadoop using the FileSystem API. Some data is written to a Parquet file, and large blocks are cached in memory, so when the service is shut down, potentially several hundred MB of data have to…

java spring-boot hadoop hdfs hadoop2

asked Mar 14 '19 at 17:41

epsylon

2 answers

Working with input splits (Hadoop)

I have a .txt file as follows: This is xyz This is my home This is my PC This is my room This is ubuntu PC xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxxxxxxxxxxxxxxxxxxx (ignoring the blank line after each record) I have set the…

hadoop mapreduce hadoop2

asked Mar 16 '17 at 18:44

User9523

5 answers

Where is the classpath set for Hadoop?

Where is the classpath for Hadoop set? When I run the command bin/hadoop classpath, it prints the classpath, but where is that classpath actually set? I'm using Hadoop 2.6.0.

hadoop mapreduce hadoop2

asked Feb 01 '15 at 07:53

Bourne

5 answers

Difference between a ring buffer and a queue

What is the difference between a ring (circular) buffer and a queue? Both support FIFO, so in what scenarios should I use a ring buffer over a queue, and why? Relevance to Hadoop: the map phase uses a ring buffer to store intermediate key-value pairs.…

hadoop data-structures hadoop2

asked Apr 16 '14 at 13:53

Aravind Yarram
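
A queue is an abstract FIFO interface; a ring buffer is one way to implement a bounded FIFO over a fixed, preallocated array whose indices wrap around. An unbounded queue such as `java.util.ArrayDeque` (itself a circular array, but one that resizes) grows as needed, whereas a ring buffer keeps memory fixed and forces the producer to react when it is full. That fits the map side: in Hadoop 2 the buffer size is set by `mapreduce.task.io.sort.mb`, and once it fills past `mapreduce.map.sort.spill.percent` a background thread spills its contents to disk while the map keeps writing. This sketch shows only the bounded FIFO, not Hadoop's actual map output buffer, and the class name is illustrative:

```java
// Minimal fixed-capacity ring (circular) buffer: a FIFO over a preallocated array
// whose head index wraps around, so nothing is allocated after construction.
final class RingBuffer<T> {
    private final Object[] slots;
    private int head = 0; // index of the oldest element
    private int size = 0;

    RingBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        slots = new Object[capacity];
    }

    /** Appends at the tail; returns false instead of growing when the buffer is full. */
    boolean offer(T item) {
        if (size == slots.length) {
            return false;
        }
        slots[(head + size) % slots.length] = item;
        size++;
        return true;
    }

    /** Removes and returns the oldest element, or null if the buffer is empty. */
    @SuppressWarnings("unchecked")
    T poll() {
        if (size == 0) {
            return null;
        }
        T item = (T) slots[head];
        slots[head] = null; // drop the reference so it can be garbage-collected
        head = (head + 1) % slots.length;
        size--;
        return item;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        RingBuffer<Integer> buf = new RingBuffer<>(3);
        buf.offer(1);
        buf.offer(2);
        buf.offer(3);
        System.out.println(buf.offer(4)); // false: buffer is full
        System.out.println(buf.poll());   // 1
        System.out.println(buf.offer(4)); // true: the freed slot is reused
    }
}
```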

2 answers

Spark/Yarn: File does not exist on HDFS

I have a Hadoop/YARN cluster set up on AWS with one master and 3 slaves. I have verified that 3 live nodes are running on ports 50070 and 8088. I tested a Spark job in client deploy mode, and everything works fine. When I try to spark-submit a job using…

hadoop apache-spark pyspark hadoop-yarn hadoop2

asked May 28 '17 at 19:36

user1187968

5 answers

Can Apache YARN be used without HDFS?

I want to use Apache YARN as a cluster and resource manager for running a framework where resources would be shared across different tasks of the same framework. I want to use my own distributed off-heap file system. Is it possible to use any other…

apache hadoop hadoop-yarn hadoop2

asked Mar 02 '17 at 08:06

Amar Gajbhiye

2 answers

Namenode high availability client request

If I use a Java application to request file upload/download operations against HDFS with a NameNode HA setup, where does the request go first? That is, how does the client know which NameNode is active? It would be…

hadoop hdfs hadoop2 webhdfs

asked Mar 10 '16 at 08:29

user2846382
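
With NameNode HA, the client is configured with a logical nameservice URI (for example `hdfs://mycluster`) instead of a single host. `dfs.ha.namenodes.<nameservice>` lists the NameNode IDs, and the class named in `dfs.client.failover.proxy.provider.<nameservice>` (usually `ConfiguredFailoverProxyProvider`) picks one; if that NameNode turns out to be standby, the call is rejected and the client retries against the next one. A toy Java model of that retry loop, with hypothetical names and no real RPC:

```java
import java.util.List;

// Toy model of client-side NameNode failover (hypothetical names, no real RPC).
final class FailoverToy {

    /** Stand-in for one configured NameNode and its current HA state. */
    static final class NameNode {
        final String id;
        boolean active;

        NameNode(String id, boolean active) {
            this.id = id;
            this.active = active;
        }
    }

    private final List<NameNode> nameNodes;
    private int current = 0; // index of the NameNode the client tries first

    FailoverToy(List<NameNode> nameNodes) {
        this.nameNodes = nameNodes;
    }

    /** Sends a request, failing over past standby NameNodes; returns the id that served it. */
    String send(String request) {
        for (int attempt = 0; attempt < nameNodes.size(); attempt++) {
            NameNode nn = nameNodes.get(current);
            if (nn.active) {
                return nn.id; // the active NameNode handles the request
            }
            // A standby rejects the call, so move on to the next configured NameNode
            // and remember it for later requests.
            current = (current + 1) % nameNodes.size();
        }
        throw new IllegalStateException("no active NameNode for: " + request);
    }

    public static void main(String[] args) {
        NameNode nn1 = new NameNode("nn1", false);
        NameNode nn2 = new NameNode("nn2", true);
        FailoverToy client = new FailoverToy(List.of(nn1, nn2));
        System.out.println(client.send("create /user/a")); // nn2
        nn2.active = false; // simulate a failover: nn1 becomes active
        nn1.active = true;
        System.out.println(client.send("create /user/b")); // nn1
    }
}
```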

1 answer

could only be replicated to 0 nodes instead of minReplication (=1). There are 4 datanode(s) running and no node(s) are excluded in this operation

I don't know how to fix this error: Vertex failed, vertexName=initialmap, vertexId=vertex_1449805139484_0001_1_00, diagnostics=[Task failed, taskId=task_1449805139484_0001_1_00_000003, diagnostics=[AttemptID:attempt_1449805139484_0001_1_00_000003_0…

hadoop hdfs hadoop-yarn hadoop2 apache-tez

asked Dec 12 '15 at 22:20

Mona Jalal

2 answers

How to optimize shuffling/sorting phase in a hadoop job

I'm doing some data preparation using a single-node Hadoop job. The mapper/combiner in my job outputs many keys (more than 5M or 6M), and as a result the job runs slowly or even fails. The mapping phase runs up to 120 mappers and there is just one…

hadoop mapreduce hadoop2

asked Dec 09 '15 at 18:43

HHH
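
Every map output key is routed to a reduce task by the partitioner; the default `HashPartitioner` computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. With a single reduce task, all of the millions of keys are shuffled to, merged by and reduced in that one task, so raising the count (`job.setNumReduceTasks(n)` or `mapreduce.job.reduces`) spreads the work. A sketch of the resulting distribution, assuming `String.hashCode()` as a stand-in for the key type's hash (Hadoop's `Text` hashes differently); the class and key names are illustrative:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PartitionToy {

    /** Same formula as Hadoop's default HashPartitioner, applied to a String key. */
    static int partitionFor(String key, int numReduceTasks) {
        // Masking the sign bit keeps the result non-negative for negative hash codes.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Counts how many of the given keys each reduce task would receive. */
    static int[] keysPerReducer(List<String> keys, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (String key : keys) {
            counts[partitionFor(key, numReduceTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            keys.add("key-" + i); // hypothetical keys
        }
        System.out.println(Arrays.toString(keysPerReducer(keys, 1))); // [100000]
        System.out.println(Arrays.toString(keysPerReducer(keys, 8)));
    }
}
```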

4 answers

Is there the equivalent for a `find` command in `hadoop`?

I know that from the terminal one can use the find command to find files, such as: find . -type d -name "*something*" -maxdepth 4 But when I am in the Hadoop file system, I have not found a way to do this; hadoop fs -find .... throws an error. How…

hadoop terminal hdfs hadoop2

asked Oct 01 '15 at 20:34

makansij

2 answers

Hadoop 2.0 Name Node, Secondary Node and Checkpoint node for High Availability

After reading the Apache Hadoop documentation, I am a little confused about the responsibilities of the Secondary NameNode and the Checkpoint Node. I am clear on the NameNode's role and responsibilities: The NameNode stores modifications to the file system…

hadoop hdfs hadoop2 high-availability

asked Aug 17 '15 at 13:12

Ravindra babu