Questions tagged [hadoop]

Hadoop is an open-source framework that provides a distributed, replicated file system and a production-grade map-reduce system, along with a series of complementary additions such as Hive, Pig, and HBase that get more out of a Hadoop-powered cluster.

Hadoop is an Apache Software Foundation project, with commercial support provided by multiple vendors, including Cloudera, Hortonworks, and MapR. The Apache project site documents a more complete list of commercial offerings.

Available complementary additions to Hadoop include:

  • The Hadoop distributed filesystem, HDFS (standard)
  • The map-reduce architecture (standard)
  • Hive, which provides a SQL-like interface to the map-reduce architecture
  • HBase, a distributed key-value store
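These pieces are mostly driven from the command line. A minimal sketch of how they fit together (all paths, job names, and table names here are illustrative assumptions, and the commands need a running cluster):

```shell
hadoop fs -ls /                                 # browse HDFS, the distributed filesystem
hadoop fs -put access.log /logs/                # copy a local file into HDFS
hadoop jar wordcount.jar WordCount /logs /out   # submit a map-reduce job
hive -e 'SELECT COUNT(*) FROM logs;'            # Hive: SQL-like queries compiled to M/R
```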


261 questions
5 votes · 1 answer

Hadoop - Name Node and Data Node on the same machine

We have 7 identical physical servers (2x8-core CPU, 128GB RAM, 8x 6TB disks) that will be used for Hadoop. All of the machines are connected to a 10G switch via dual 10G interfaces. Since we do not have many machines we want to use one of the…
5 votes · 1 answer

Forward-sync to HDFS? (OR continue an incomplete hdfs upload?)

Anyone have a good suggestion for doing a forward sync to HDFS? ("forward-sync" in contrast to "bi-directional sync".) Basically I have a large number of files I want to put into HDFS. It's so large that I'll often, say, lose connectivity before…
Nate Murray · 993
5 votes · 2 answers

How to fix Hadoop HDFS cluster with missing blocks after one node was reinstalled?

I have a 5-slave Hadoop cluster (using CDH4); the slaves are where the DataNode and TaskTracker run. Each slave has 4 partitions dedicated to HDFS storage. One of the slaves needed a reinstall, and this caused one of the HDFS partitions to be lost. At this…
Dolan Antenucci · 329
4 votes · 1 answer

mount.nfs: mount system call failed

I am trying to mount HDFS on my local machine running Ubuntu using the following command: sudo mount -t nfs -o vers=3,proto=tcp,nolock 192.168.170.52:/ /mnt/hdfs_mount/ But I am getting this error: mount.nfs: mount system call failed Output…
Bhavya Jain · 141
4 votes · 1 answer

Possible to ssh into a server without using -i flag for key?

I have 3 EC2 instances and they all use the same private key. I'm setting up a Hadoop cluster between these nodes, and they require passwordless entry for this to work. How can I use this private key to easily ssh into the servers with keyless entry?…
coderkid · 193
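A common answer to questions like this is a per-host entry in ~/.ssh/config, so ssh picks the key up automatically without -i. A minimal sketch (the host pattern, user, and key path are assumptions; it writes to /tmp only so the snippet is self-contained):

```shell
cfg=/tmp/demo_ssh_config              # in real use: ~/.ssh/config
cat > "$cfg" <<'EOF'
Host hadoop-*
    User ec2-user
    IdentityFile ~/.ssh/hadoop-cluster.pem
    IdentitiesOnly yes
EOF
# ssh -F "$cfg" hadoop-master        # no -i flag needed once the entry exists
grep -c IdentityFile "$cfg"          # sanity check: prints 1 (one entry written)
```

With the entry in the real ~/.ssh/config, plain `ssh hadoop-master` works, which is exactly the passwordless access Hadoop's start scripts expect.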
4 votes · 2 answers

Execute shell script as one of the steps on EMR AWS

We are thinking of migrating our Hadoop infrastructure from our data center to AWS EMR. Some of the tasks/stages in the ETL process are dependent; e.g., the flow is: a Map Reduce job generates data, then a shell script moves the data generated in step 1 to…
Free Coder · 41
4 votes · 1 answer

How to connect two docker containers running on the same host?

I have two docker containers running. docker ps shows: CONTAINER ID 0bfd25abbfc6, IMAGE f_service:latest, COMMAND "/usr/local/start-fl, CREATED 13 seconds ago …
Gibbs · 137
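For questions like this, the usual pattern is a user-defined bridge network, which gives containers on the same host name-based DNS to each other. A sketch (image and container names are assumptions; requires a running Docker daemon):

```shell
docker network create app-net                                  # user-defined bridge
docker run -d --name f_service  --network app-net f_service:latest
docker run -d --name db_service --network app-net some_other_image
# Either container can now reach the other by name, e.g. from f_service:
#   curl http://db_service:8080/
```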
4 votes · 3 answers

Is there a way to grep gzipped content in hdfs without extracting it?

I'm looking for a way to zgrep hdfs files, something like: hadoop fs -zcat hdfs://myfile.gz | grep "hi" or hadoop fs -cat hdfs://myfile.gz | zgrep "hi" Neither really works for me. Is there any way to achieve this from the command line?
Jas · 701
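Neither -zcat nor zgrep exists on the hadoop fs side, but the same effect is a plain pipeline: stream the compressed bytes out of HDFS and decompress on the client. A sketch (the HDFS path is an assumption; the local demo below exercises the same pipe without a cluster):

```shell
# On a cluster:
#   hadoop fs -cat hdfs:///path/myfile.gz | gunzip -c | grep "hi"
# Local demonstration of the identical decompress-and-grep pipeline:
printf 'hello world\nhi there\n' | gzip > /tmp/demo.gz
gunzip -c /tmp/demo.gz | grep "hi"    # prints: hi there
```

The file is never extracted on HDFS; only compressed bytes cross the wire, and grep sees plain text.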
4 votes · 2 answers

hadoop-config.sh in bin/ and libexec/

While setting up hadoop, I found that the hadoop-config.sh script is present in two directories, bin/ and libexec/. Both files are identical. While looking into the scripts, I found that if hadoop-config.sh is present in libexec, then it gets executed.…
krackoder · 151
4 votes · 1 answer

Does Cloudera Manager need ongoing Root Access?

When installing Cloudera Manager 4, it asks for the root password on a passwordless sudo user to install packages. Does this account need to be retained, or is it just for initial setup?
Kyle Brandt · 83,619
4 votes · 1 answer

Can't connect to HDFS in pseudo-distributed mode

I followed the instructions here for installing hadoop in pseudo-distributed mode. However, I'm having trouble connecting to HDFS. When I execute this command: ./hadoop fs -ls / I get a directory listing just like I should. However, when I execute…
sangfroid · 193
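A frequent cause of this symptom is a client whose fs.default.name does not match the URI the NameNode was started with. A minimal pseudo-distributed core-site.xml sketch (port 9000 is an assumption; it is written to /tmp here only so the snippet is self-contained, while the real file lives in the Hadoop conf directory):

```shell
cat > /tmp/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
grep -o 'hdfs://localhost:9000' /tmp/core-site.xml   # prints the URI once
```

Both the daemons and the hadoop fs client must read the same value; otherwise one path can resolve against the local filesystem while real HDFS access fails.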
4 votes · 3 answers

What is meant by "streaming data access" in HDFS?

According to the HDFS Architecture page HDFS was designed for "streaming data access". I'm not sure what that means exactly, but would guess it means an operation like seek is either disabled or has sub-optimal performance. Would this be…
Van Gale · 472
4 votes · 3 answers

PXE boot Linux - which directories must be writable?

I'm planning to set up a small Hadoop cluster where the slave nodes boot and run from a central PXE server, to simplify deployment and updates, and to enable all of the disks on the slaves to be (almost) monopolized by HDFS. However, I suppose I'll…
Andrew Clegg · 387
4 votes · 1 answer

Hadoop: Blacklisted tasktracker

I am running a Hadoop job (using Hadoop 0.20.2) on a 6 machine setup; one machine is the namenode / secondary node / job tracker (master) and the other 5 machines are all datanodes / tasktrackers (slaves). The job has over 14,000 maps and it is…
RobertoP · 143
4 votes · 4 answers

Hadoop cluster. 2 Fast, 4 Medium, 8 slower machines?

We're going to purchase some new hardware to use just for a Hadoop cluster and we're stuck on what we should purchase. Say we have a budget of $5k should we buy two super nice machines at $2500/each, four at around $1200/each or eight at around $600…
Ryan Detzel · 707