Questions tagged [hadoop]

Hadoop is an open-source framework that provides a distributed, replicated file system and a production-grade MapReduce system, along with a series of complementary additions such as Hive, Pig, and HBase that get more out of a Hadoop-powered cluster.

Hadoop is a project sponsored by the Apache Software Foundation, with commercial support available from multiple vendors, including Cloudera, Hortonworks, and MapR. Apache documents a more complete list of commercial solutions.

Standard components and complementary additions include:

  • The Hadoop Distributed File System, HDFS (standard)
  • The MapReduce architecture (standard)
  • Hive, which provides a SQL-like interface to the MapReduce architecture
  • HBase, a distributed key-value service


261 questions
1
vote
1 answer

MapReduce job is hung after 1 of 5 reducers completed on single-node environment

I have only one Data Node in my dev environment on EC2. I ran a heavy MR job and after 6 hours noticed that 100% of mappers and 20% of reducers had finished (one reducer shows 100% completion, the others 0%). Looks like the job is hung between 2 reducer…
Marboni
  • 111
  • 4
1
vote
1 answer

Java process failure (hadoop, hbase)

Any time I run a hadoop/hbase process from a command prompt I get an error: /usr/local/hadoop/bin/hadoop: line 320: /usr/lib/jvm/jdk1.7.0/bin/java: cannot execute binary file /usr/local/hadoop/bin/hadoop: line 390:…
Vladimir
  • 75
  • 1
  • 9
1
vote
1 answer

hadoop: port appears open locally but not remotely

I am new to Linux and Hadoop and I am having the same issue as in this question. I think I understand what is causing it but I don't know how to solve it (I don't know what they mean by "Edit the Hadoop server's configuration file so that it includes…
miguel
  • 111
  • 2
1
vote
1 answer

Programmatically configure Hadoop cluster from python

Are there any Python APIs to configure a Hadoop cluster (namenode, jobtracker, etc.) set up on an OpenStack cloud? I have the IP addresses of the VMs and am looking for OpenStack APIs to configure them.
StuckAgain
  • 21
  • 1
1
vote
1 answer

Data lost after HDFS client was killed

I wrote a simple tool to upload logs to HDFS, and I noticed a curious phenomenon. If I run the tool in the foreground and close it with "Ctrl-C", there will be some data in HDFS. If I run the tool in the background and kill the process with "kill -KILL…
Evans Y.
  • 111
  • 3
1
vote
1 answer

Implications of Multiple JobTracker nodes in a Hadoop cluster?

I get the impression that one can, potentially, have multiple JobTracker nodes configured to share the same set of MR (TaskTracker) nodes. I know that, conventionally, all the nodes in a Hadoop cluster should have the same set of configuration…
Jim Dennis
  • 807
  • 1
  • 10
  • 22
1
vote
2 answers

Deploy Hadoop to OpenStack

I'd like to deploy Hadoop to an OpenStack cloud. Is there any automatic way to do that? Has anyone tried it? I'm looking for a devops tool like juju. I've never used juju and right now I'm going through juju's tutorial about deploying, but most…
Simon
  • 213
  • 1
  • 2
  • 4
1
vote
0 answers

Hadoop commands are taking a very long time to return

I am logged in (via SSH) to the NameNode of my Hadoop cluster; the problem I am having is that any hadoop fs command, even a simple one like hadoop fs -ls, completes quickly but takes many minutes to return control of the shell to the user. For…
ILikeFood
  • 399
  • 1
  • 5
  • 12
1
vote
0 answers

Increase CDH4 disk space

Initially I was using the bundled CDH3 VM (Cloudera's version of Hadoop). Later I removed CDH3, and now I am using CDH4 on CentOS as a VM (through VMware Player) with Win7 64-bit as the host machine. I need to increase the disk space (since CDH3 VMDK…
Logan
  • 111
  • 2
1
vote
4 answers

Hadoop slave nodes not connecting

I have been trying to set up a Hadoop cluster; I managed to get it running in pseudo-distributed mode, and my one machine wordcounted Tolstoy's War and Peace in about thirty seconds. I am now trying to add a second machine to my cluster; To help set…
ILikeFood
  • 399
  • 1
  • 5
  • 12
1
vote
1 answer

Hadoop on Ubuntu - two different install directories?

I recently installed Hadoop 1.0.3 from the .deb provided by Apache. The package installed correctly, but there seem to be two directories that have Hadoop-related files: /usr/share/hadoop has jars and the site configuration files, while /etc/hadoop…
ILikeFood
  • 399
  • 1
  • 5
  • 12
1
vote
1 answer

What version of HDFS is compatible with HBase stable?

HBase stable is currently hbase-0.90.4. What version(s) of HDFS is it compatible with?
Aleksandr Levchuk
  • 2,465
  • 3
  • 22
  • 41
1
vote
1 answer

scribe log analysis

I have decided to use Scribe to log all the error and request details on my site for analysis. How can I analyze the Scribe log data? Is there any tool for this, or any Scribe server programs? I am using PHP as my scripting language.
Vinesh EG
  • 111
  • 1
1
vote
2 answers

hadoop - datanode decommission

I want to remove nodes from my cluster gracefully. I added the following to my hadoop-site.xml: dfs.hosts.exclude /etc/hadoop/conf.dist/dfs.hosts.exclude true I'm adding a…
mik
  • 199
  • 2
  • 12
1
vote
1 answer

JAVA_HOME for Hadoop

I want to configure hadoop to run in pseudo-distributed mode. My configuration files: core-site.xml: fs.default.name hdfs://localhost/
Majid Azimi
  • 547
  • 1
  • 13
  • 29