Questions tagged [hadoop]

Hadoop is an open-source project that provides a distributed, replicated file system and a production-grade MapReduce system, along with a series of complementary additions such as Hive, Pig, and HBase that help get more out of a Hadoop-powered cluster.

Hadoop is an Apache Software Foundation project, with commercial support provided by multiple vendors, including Cloudera, Hortonworks, and MapR. Apache documents a more complete set of commercial solutions.

Standard components and complementary additions to Hadoop include:

Hadoop Distributed File System (standard)
The MapReduce architecture (standard)
Hive, which provides a SQL-like interface to the MapReduce architecture
HBase, a distributed key-value service
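
For readers new to the map-reduce model mentioned above, here is a minimal sketch of its three phases (map, shuffle, reduce) as a word count in plain Python. It runs in a single process and does not use Hadoop's API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, mimicking the step between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all counts emitted for one word into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

On a real cluster the map and reduce functions would run in parallel on many nodes and the framework would handle the grouping; this sketch only shows how the data flows between the phases.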

Recommended reference sources:

Hive Language Reference

261 questions

1 answer

MapReduce job is hung after 1 of 5 reducers completed on single-node environment

I have only one Data Node in my dev environment on EC2. I ran a heavy MR job and after 6 hours noticed that 100% of mappers and 20% of reducers had finished (1 reducer shows 100% completion, the others 0%). Looks like the job is hung between 2 reducer…

hadoop mapreduce

asked Nov 09 '12 at 17:21

Marboni

1 answer

Java process failure (hadoop, hbase)

Any time I run a hadoop/hbase process from a command prompt I get an error: /usr/local/hadoop/bin/hadoop: line 320: /usr/lib/jvm/jdk1.7.0/bin/java: cannot execute binary file /usr/local/hadoop/bin/hadoop: line 390:…

hadoop java hbase

asked Oct 24 '12 at 16:13

Vladimir

1 answer

hadoop: port appears open locally but not remotely

I am new to Linux and Hadoop and I am having the same issue as in this question. I think I understand what is causing it but I don't know how to solve it (I don't know what they mean by "Edit the Hadoop server's configuration file so that it includes…

linux linux-networking hadoop

asked Oct 01 '12 at 23:47

miguel

1 answer

Programmatically configure Hadoop cluster from python

Are there any Python APIs to configure a Hadoop cluster (namenode, jobtracker, etc.) set up on an OpenStack cloud? I have the IP addresses of the VMs and am looking for OpenStack APIs to configure them.

hadoop openstack

asked Oct 01 '12 at 02:44

StuckAgain

1 answer

Data lost after Hdfs client was killed

I wrote a simple tool to upload logs to HDFS, and I noticed something curious. If I run the tool in the foreground and close it with "Ctrl - C", there will be some data in HDFS. If I run the tool in the background and kill the process with "kill -KILL…

java linux hadoop hdfs

asked Sep 25 '12 at 03:07

Evans Y.

1 answer

Implications of Multiple JobTracker nodes in a Hadoop cluster?

I get the impression that one can, potentially, have multiple JobTracker nodes configured to share the same set of MR (TaskTracker) nodes. I know that, conventionally, all the nodes in a Hadoop cluster should have the same set of configuration…

configuration hadoop cdh4

asked Aug 28 '12 at 18:33

Jim Dennis

2 answers

Deploy Hadoop to OpenStack

I'd like to deploy Hadoop to an OpenStack cloud. Is there an automated way to do that? Has anyone tried it? I'm looking for a devops tool like juju. I've never used juju and right now I'm going through juju's tutorial about deploying, but most…

deployment hadoop openstack

asked Jul 19 '12 at 12:43

Simon

0 answers

Hadoop commands are taking a very long time to return

I am logged in (via SSH) to the NameNode of my Hadoop cluster; the problem I am having is that any hadoop fs commands, even simple ones like hadoop fs -ls are completed quickly, but take many minutes to return control of the shell to the user. For…

linux centos hadoop hdfs

asked Jul 06 '12 at 23:12

ILikeFood

0 answers

Increase CDH4 diskspace

Initially I was using the bundled CDH3 VM (Cloudera's version of Hadoop); later I removed CDH3, and I am now using CDH4 on CentOS as a VM (through VMware Player) with Win7 64-bit as the host machine. I need to increase the disk space (Since CDH3 VMDK…

centos vmware-player hadoop

asked Jun 23 '12 at 10:31

Logan

4 answers

Hadoop slave nodes not connecting

I have been trying to set up a Hadoop cluster; I managed to get it running in pseudo-distributed mode, and my one machine wordcounted Tolstoy's War and Peace in about thirty seconds. I am now trying to add a second machine to my cluster; To help set…

hadoop

asked May 25 '12 at 16:44

ILikeFood

1 answer

Hadoop on Ubuntu - two different install directories?

I recently installed Hadoop 1.0.3 from the .deb provided by Apache. The package installed correctly, but there seem to be two directories that have Hadoop-related files: /usr/share/hadoop has jars and the site configuration files, while /etc/hadoop…

ubuntu hadoop

asked May 22 '12 at 18:42

ILikeFood

1 answer

What version of HDFS is compatible with HBase stable?

HBase stable is currently hbase-0.90.4. What version(s) of HDFS is it compatible with?

hadoop hdfs hbase

asked Dec 21 '11 at 02:44

Aleksandr Levchuk

1 answer

scribe log analysis

I have decided to use scribe to log all the error and request details on my site for analysis. How can I analyze the scribe log data? Is there any tool for this, or are there scribe server programs? I am using PHP as my scripting language.

php logging hadoop hbase scribe

asked Nov 19 '11 at 09:32

Vinesh EG

2 answers

hadoop - datanode decommission

I want to remove nodes from my cluster gracefully. I added the following to my hadoop-site.xml: dfs.hosts.exclude /etc/hadoop/conf.dist/dfs.hosts.exclude true I'm adding a…

java cluster hadoop

asked Jun 28 '09 at 17:26

mik

1 answer

JAVA_HOME for Hadoop

I want to configure hadoop to run in pseudo-distributed mode. My configuration files: core-site.xml: fs.default.name hdfs://localhost/ …

hadoop

asked Oct 15 '11 at 18:04

Majid Azimi
