Questions tagged [hadoop]

Hadoop is an open-source project that provides a distributed, replicated file system and a production-grade map-reduce system, along with a series of complementary additions such as Hive, Pig, and HBase that get more out of a Hadoop-powered cluster.

Hadoop is an Apache Software Foundation project, with commercial support provided by multiple vendors, including Cloudera, Hortonworks, and MapR. Apache documents a more complete list of commercial solutions.

Standard components of Hadoop and available complementary additions include:

  • Hadoop Distributed File System, HDFS (standard)
  • The map-reduce architecture (standard)
  • Hive, which provides a SQL-like interface to the map-reduce architecture
  • HBase, a distributed key-value store
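
To make the map-reduce architecture in the list above more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases, using word counting as the example. It is plain Python and does not use Hadoop's API; the function names (map_phase, shuffle, reduce_phase, word_count) are illustrative assumptions, and a real cluster would distribute each phase across nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key (done by the framework on a real cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order and return a word -> count mapping."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, purely for illustration.
    sample = ["hadoop stores data in hdfs", "hadoop runs map reduce jobs"]
    print(word_count(sample))
```

The point of the split is that each map call sees only one line and each reduce call sees only one key, which is what allows the work to be spread over many machines.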

261 questions
0
votes
2 answers

ext4 on faulty disks. How to avoid remount read-only?

The Problem: I'm in charge of a Hadoop cluster of 44 nodes. We have 1.5TB WD Green drives with the (little-known) Load Cycle Count problem. These disks work fine, but as they get older they show an increasing number of bad blocks. Rewriting these…
kei1aeh5quahQu4U
0
votes
1 answer

Installing and configuring Hadoop on Ubuntu

I'm unable to locate the file hadoop-ec2-env.sh. I've downloaded and installed hadoop_1.0.4-1_x86_64.deb from http://mirror.olnevhost.net/pub/apache/hadoop/common/stable/ which is the most recent stable version. I would like to run Hadoop on EC2. I'm…
Alex Gordon
0
votes
1 answer

Setting up environment variables for Hadoop on AWS EC2

I've been following the book Hadoop in Action. It gives a nice guide on how to start using EC2 with Hadoop. One of the first things that it says is to download the command line tools…
Alex Gordon
0
votes
1 answer

Hadoop: Slave nodes are not starting

I am trying to set up a pseudo-distributed Hadoop cluster on my machine. Environment details: Host OS: Windows. Guest OS: Ubuntu. VMs created: one master and one slave. I was able to run the Hadoop wordcount successfully on a single-node cluster, but when I…
Rajasimhan
0
votes
1 answer

Getting exposure to big data without having to set up the environment

Is anyone aware of any sandbox where an environment is already set up for big data processing? It can be Hadoop, Cassandra, Pig, etc. I'm a SQL Server programmer trying to get into big data/NoSQL solutions, but I'm having a very difficult time…
Alex Gordon
0
votes
1 answer

Hadoop hardware disk choice

We're evaluating the options for setting up a large Hadoop cluster. We can choose from these 3 setups: 300 servers with 12x 1TB disks, 150 servers with 12x 2TB disks, or 100 servers with 12x 3TB disks. The other server…
RobinUS2
0
votes
2 answers

Kernel panic - not syncing - Attempted to kill init

I am using Linux 2.6.32-33-server #70-Ubuntu 10.04.3 on the data nodes and name node in my Hadoop cluster, but one of my data nodes has been down since this morning. When I restarted that particular system, it showed the error "Kernel panic - not syncing -…
vikash
0
votes
1 answer

Monitoring tools for non-distro Hadoop

I've built a Hadoop cluster by installing most packages manually (using binaries or source). I opted not to use a custom distribution like Cloudera, MapR or Hortonworks, since I wanted the flexibility that comes with choosing what packages and…
sa125
0
votes
1 answer

Can't start Hadoop from an init.d script

I'm using CentOS 6.2. I'm trying to start Hadoop from an init.d script, but it's failing. This is what I see in boot.log : Retrigger failed udev events [ OK ] Enabling Bluetooth devices: starting namenode, logging…
sangfroid
0
votes
1 answer

Hadoop Configuration Files - Who Needs What?

As I am setting up Hadoop, one question keeps popping into my mind, but I can't find the answer: which Hadoop configuration files need to be copied to which nodes? For example, I'm making changes to the following files: hadoop-env.sh, core-site.xml,…
JasCav
0
votes
1 answer

Hadoop CDH4 Evaluation: Which Ubuntu would be preferred? Lucid or Precise

I'm setting up CDH4 in AWS for evaluation (we already have CDH3 running on Ubuntu Lucid), and I'd like advice regarding any known gotchas that I'd be likely to encounter if running it on Lucid vs. Precise. Is it safer to set up the test cluster…
Jim Dennis
0
votes
1 answer

How much power supply do I need for my server, and could a shortage be causing my odd crashing?

I have 5 servers, all with similar hardware (i7, four 2TB 7200rpm drives, two 4TB 5400rpm drives, 430-watt power supply), and lately the machines have been freezing up. This has gotten worse in the last day or so, and I can't pinpoint any…
Dolan Antenucci
0
votes
1 answer

Empty homedir name when SSHing to localhost

I am trying to start a system (Hadoop, but that should not matter much for this question) and need to be able to ssh to localhost. I am doing this on Windows with Cygwin. The Cygwin SSHD service is running, and ssh localhost works (as does ssh…
openbas2
0
votes
1 answer

Access HDFS HADOOP using APACHE Web Server, Linux CentOS

If I have an Apache web server as a directory, how can I access an HDFS cluster to upload and modify files? What configuration do I need to do? Many thanks
blackriderws
0
votes
2 answers

I am trying to install Hadoop 0.23 but it's not getting installed

I am trying to install Hadoop 0.23 on my Fedora 13 system, but it is not getting installed. Everywhere on the web there is support for Hadoop 0.20 installation. I am trying the installation from here. Every time, I get the error: Error: JAVA_HOME is not set. I…