Questions tagged [hadoop]

Hadoop is an open-source framework that provides a distributed, replicated file system and a production-grade map-reduce system, along with a series of complementary additions such as Hive, Pig, and HBase that get more out of a Hadoop-powered cluster.

Hadoop is an Apache Software Foundation project, with commercial support provided by multiple vendors, including Cloudera, Hortonworks, and MapR. The Apache project site documents a more complete list of commercial offerings.

Available complementary additions to Hadoop include:

  • The Hadoop distributed filesystem, HDFS (standard)
  • The map-reduce architecture (standard)
  • Hive, which provides a SQL-like interface to the map-reduce architecture
  • HBase, a distributed key-value store
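These pieces are mostly driven from the command line. A minimal sketch of how they fit together (all paths, job names, and table names here are illustrative assumptions, and the commands need a running cluster):

```shell
hadoop fs -ls /                                 # browse HDFS, the distributed filesystem
hadoop fs -put access.log /logs/                # copy a local file into HDFS
hadoop jar wordcount.jar WordCount /logs /out   # submit a map-reduce job
hive -e 'SELECT COUNT(*) FROM logs;'            # Hive: SQL-like queries compiled to M/R
```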


261 questions
5 votes · 1 answer

Hadoop - Name Node and Data Node on the same machine

We have 7 identical physical servers (2x8-core CPU, 128GB RAM, 8x 6TB disks) that will be used for Hadoop. All of the machines are connected to a 10G switch via dual 10G interfaces. Since we do not have many machines we want to use one of the…
5 votes · 1 answer

Forward-sync to HDFS? (OR continue an incomplete hdfs upload?)

Anyone have a good suggestion for doing a forward sync to HDFS? ("forward-sync" in contrast to "bi-directional sync".) Basically I have a large number of files I want to put into HDFS. It's so large that I'll often, say, lose connectivity before…
Nate Murray · 993
5 votes · 2 answers

How to fix Hadoop HDFS cluster with missing blocks after one node was reinstalled?

I have a 5-slave Hadoop cluster (using CDH4); the slaves are where the DataNode and TaskTracker run. Each slave has 4 partitions dedicated to HDFS storage. One of the slaves needed a reinstall, and this caused one of the HDFS partitions to be lost. At this…
Dolan Antenucci · 329
4 votes · 1 answer

mount.nfs: mount system call failed

I am trying to mount HDFS on my local machine running Ubuntu using the following command: sudo mount -t nfs -o vers=3,proto=tcp,nolock 192.168.170.52:/ /mnt/hdfs_mount/ But I am getting this error: mount.nfs: mount system call failed Output…
Bhavya Jain · 141
4 votes · 1 answer

Possible to ssh into a server without using -i flag for key?

I have 3 EC2 instances and they all use the same private key. I'm setting up a Hadoop cluster between these nodes, and they require passwordless entry for this to work. How can I use this private key to easily ssh into the servers with keyless entry?…
coderkid · 193
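A common answer to questions like this is a per-host entry in ~/.ssh/config, so ssh picks the key up automatically without -i. A minimal sketch (the host pattern, user, and key path are assumptions; it writes to /tmp only so the snippet is self-contained):

```shell
cfg=/tmp/demo_ssh_config              # in real use: ~/.ssh/config
cat > "$cfg" <<'EOF'
Host hadoop-*
    User ec2-user
    IdentityFile ~/.ssh/hadoop-cluster.pem
    IdentitiesOnly yes
EOF
# ssh -F "$cfg" hadoop-master        # no -i flag needed once the entry exists
grep -c IdentityFile "$cfg"          # sanity check: prints 1 (one entry written)
```

With the entry in the real ~/.ssh/config, plain `ssh hadoop-master` works, which is exactly the passwordless access Hadoop's start scripts expect.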
4 votes · 2 answers

Execute shell script as one of the steps on EMR AWS

We are thinking of migrating our Hadoop infrastructure from our data center to AWS EMR. Some of the tasks/stages in the ETL process are dependent; e.g., the flow is: a Map Reduce job generates data, then a shell script moves the data generated in step 1 to…
Free Coder · 41
4 votes · 1 answer

How to connect two docker containers running on the same host?

I have two docker containers running. docker ps shows: CONTAINER ID 0bfd25abbfc6, IMAGE f_service:latest, COMMAND "/usr/local/start-fl, CREATED 13 seconds ago …
Gibbs · 137
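For questions like this, the usual pattern is a user-defined bridge network, which gives containers on the same host name-based DNS to each other. A sketch (image and container names are assumptions; requires a running Docker daemon):

```shell
docker network create app-net                                  # user-defined bridge
docker run -d --name f_service  --network app-net f_service:latest
docker run -d --name db_service --network app-net some_other_image
# Either container can now reach the other by name, e.g. from f_service:
#   curl http://db_service:8080/
```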
4 votes · 3 answers

Is there a way to grep gzipped content in hdfs without extracting it?

I'm looking for a way to zgrep hdfs files, something like: hadoop fs -zcat hdfs://myfile.gz | grep "hi" or hadoop fs -cat hdfs://myfile.gz | zgrep "hi" Neither really works for me. Is there any way to achieve this from the command line?
Jas · 701
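Neither -zcat nor zgrep exists on the hadoop fs side, but the same effect is a plain pipeline: stream the compressed bytes out of HDFS and decompress on the client. A sketch (the HDFS path is an assumption; the local demo below exercises the same pipe without a cluster):

```shell
# On a cluster:
#   hadoop fs -cat hdfs:///path/myfile.gz | gunzip -c | grep "hi"
# Local demonstration of the identical decompress-and-grep pipeline:
printf 'hello world\nhi there\n' | gzip > /tmp/demo.gz
gunzip -c /tmp/demo.gz | grep "hi"    # prints: hi there
```

The file is never extracted on HDFS; only compressed bytes cross the wire, and grep sees plain text.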
4 votes · 2 answers

hadoop-config.sh in bin/ and libexec/

While setting up hadoop, I found that the hadoop-config.sh script is present in two directories, bin/ and libexec/. Both files are identical. While looking into the scripts, I found that if hadoop-config.sh is present in libexec, then it gets executed.…
krackoder · 151
4 votes · 1 answer

Does Cloudera Manager need ongoing Root Access?

When installing Cloudera Manager 4, it asks for the root password on a passwordless sudo user to install packages. Does this account need to be retained, or is it just for initial setup?
Kyle Brandt · 83,619
4 votes · 1 answer

Can't connect to HDFS in pseudo-distributed mode

I followed the instructions here for installing hadoop in pseudo-distributed mode. However, I'm having trouble connecting to HDFS. When I execute this command: ./hadoop fs -ls / I get a directory listing just like I should. However, when I execute…
sangfroid · 193
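A frequent cause of this symptom is a client whose fs.default.name does not match the URI the NameNode was started with. A minimal pseudo-distributed core-site.xml sketch (port 9000 is an assumption; it is written to /tmp here only so the snippet is self-contained, while the real file lives in the Hadoop conf directory):

```shell
cat > /tmp/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
grep -o 'hdfs://localhost:9000' /tmp/core-site.xml   # prints the URI once
```

Both the daemons and the hadoop fs client must read the same value; otherwise one path can resolve against the local filesystem while real HDFS access fails.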
4 votes · 3 answers

What is meant by "streaming data access" in HDFS?

According to the HDFS Architecture page HDFS was designed for "streaming data access". I'm not sure what that means exactly, but would guess it means an operation like seek is either disabled or has sub-optimal performance. Would this be…
Van Gale · 472
4 votes · 3 answers

PXE boot Linux - which directories must be writable?

I'm planning to set up a small Hadoop cluster where the slave nodes boot and run from a central PXE server, to simplify deployment and updates, and to enable all of the disks on the slaves to be (almost) monopolized by HDFS. However, I suppose I'll…
Andrew Clegg · 387
4 votes · 1 answer

Hadoop: Blacklisted tasktracker

I am running a Hadoop job (using Hadoop 0.20.2) on a 6 machine setup; one machine is the namenode / secondary node / job tracker (master) and the other 5 machines are all datanodes / tasktrackers (slaves). The job has over 14,000 maps and it is…
RobertoP · 143
4 votes · 4 answers

Hadoop cluster. 2 Fast, 4 Medium, 8 slower machines?

We're going to purchase some new hardware to use just for a Hadoop cluster and we're stuck on what we should purchase. Say we have a budget of $5k should we buy two super nice machines at $2500/each, four at around $1200/each or eight at around $600…
Ryan Detzel · 707