Questions tagged [hadoop]

Hadoop is an open-source solution providing a distributed, replicated file system and a production-grade map-reduce system, along with a series of complementary additions such as Hive, Pig, and HBase that get more out of a Hadoop-powered cluster.

Hadoop is an Apache Software Foundation sponsored project, with commercial support provided by multiple vendors, including Cloudera, Hortonworks, and MapR. Apache documents a more complete list of commercial offerings.

Available complementary additions to Hadoop include:

  • The Hadoop Distributed File System, HDFS (standard)
  • The map-reduce architecture (standard)
  • Hive, which provides a SQL-like interface to the map-reduce architecture
  • HBase, a distributed key-value store
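To illustrate the map-reduce model mentioned above, here is a minimal word-count sketch in the Hadoop Streaming style (Streaming pipes text lines over stdin/stdout, so mapper and reducer logic can be plain functions). This is an illustrative sketch, not code from any of the questions below; the function names are hypothetical.

```python
#!/usr/bin/env python3
"""Word count in the Hadoop Streaming style: a map phase emitting
(key, value) pairs and a reduce phase aggregating per key."""
import sys
from itertools import groupby

def map_words(lines):
    # Mapper: emit a (word, 1) pair for every word seen.
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_counts(pairs):
    # Reducer: Hadoop delivers mapper output grouped/sorted by key;
    # sorted() simulates that shuffle step here.
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    # Streaming wiring: lines in on stdin, "word<TAB>count" out on stdout.
    for word, total in reduce_counts(map_words(sys.stdin)):
        print(f"{word}\t{total}")
```

On a real cluster the same two functions would run as separate mapper and reducer processes launched by the Hadoop Streaming jar, with HDFS providing the input splits and output directory.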


261 questions
0
votes
1 answer

ZooKeeper automatic start fails when the firewall is stopped from an rc.local script

I am using Apache Hadoop 2.7.1 on CentOS 7. My cluster is an HA cluster and I am using a ZooKeeper quorum for automatic failover, but I want to automate the ZooKeeper start process, and of course in the shell script we have to stop the firewall first in order to…
oula alshiekh
  • 103
  • 1
  • 2
  • 6
0
votes
1 answer

copying files in hdfs stalls

We have a 35-node cluster with a high number of blocks: ≈450K blocks per datanode. After a configuration change (which included rack reassignments and a NameNode Xmx increase), HDFS became a problem. It's unable to perform copy operations on random…
inteloid
  • 101
  • 2
0
votes
2 answers

NameNodes fail to start on HA cluster - fatal errors in JournalNode logs

I am having some problems with my Hadoop cluster: CentOS 7.3, Hortonworks Ambari 2.4.2, Hortonworks HDP 2.5.3. Ambari stderr: 2017-04-06 10:49:49,039 - Getting jmx metrics from NN failed. URL:…
0
votes
1 answer

How to install Hadoop 2.4.1 on Windows with Spark 2.0.0

I want to set up a cluster using Hadoop in YARN mode. I want to use the Spark API for map-reduce and will use spark-submit to deploy my applications. I want to work on a cluster. Can anyone help me install Hadoop in a cluster on Windows?
0
votes
0 answers

Scale Up Hadoop disk on aws infrastructure

Our Hadoop cluster's disks are getting full, so I want to scale up disk space as soon as possible. One way I can think of is to increase the EBS volume on all three nodes, Hadoop 2.6.0-cdh5.5.1 (2 datanodes, 1 namenode). So my question is: is there a better way to scale up…
0
votes
2 answers

Trying to run Hive fails with "java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.0.0-alpha1"

I have this version of Hadoop on Ubuntu 16.10: Hadoop 3.0.0-alpha1, source code repository https://git-wip-us.apache.org/repos/asf/hadoop.git -r a990d2ebcd6de5d7dc2d3684930759b0f0ea4dc3, compiled by andrew on 2016-08-30T07:02Z, compiled with protoc…
Nikolay Baranenko
  • 132
  • 1
  • 4
  • 15
0
votes
1 answer

Why can I only access a port from localhost?

I am trying to set up Elasticsearch on my MapR-issued Red Hat virtual machine, which comes pre-loaded with the MapR ecosystem. I installed Elasticsearch via yum. I am able to reach it from inside the VM: [root@maprdemo elasticsearch]# curl -XGET…
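A common cause of this symptom (offered here as a general note, not necessarily this poster's exact issue) is that Elasticsearch binds only to the loopback interface by default, so it answers curl from inside the VM but not from outside. A minimal elasticsearch.yml sketch making it listen on all interfaces:

```yaml
# elasticsearch.yml — bind to all interfaces instead of loopback only.
# 0.0.0.0 is an assumption for illustration; in production, prefer the
# specific interface address you want to expose.
network.host: 0.0.0.0
```

Firewall rules and, on MapR demo VMs, hypervisor port forwarding would also need to allow the port (9200 by default).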
0
votes
1 answer

Hadoop datanode - start with one disk and add more later, or start with as many disks as possible and fill them equally?

I'm wondering the following regarding the datanode disk setup in a Hadoop cluster. Which of these two options is better: to add one (or a few) disks to the datanode and attach more after they start to fill up, or to start with as many disks as…
mart
  • 3
  • 3
0
votes
1 answer

Why does DFSZKFailoverController kill the NameNode process in Hadoop?

I am trying to configure a Hadoop high-availability cluster by following this tutorial: http://www.edureka.co/blog/how-to-set-up-hadoop-cluster-with-hdfs-high-availability/ When I follow that article I face two main problems: 1. hdfs namenode…
Oleksandr
  • 733
  • 2
  • 10
  • 17
0
votes
1 answer

What should HADOOP_PREFIX be for Accumulo installation?

I'm trying to install Accumulo 1.7.2 using these directions. ./bin/build_native_library.sh seems to succeed, and libaccumulo.so winds up in lib/native/libaccumulo.so in the Accumulo install directory. When I run ./bin/bootstrap_config.sh, I pick…
0
votes
1 answer

Server RAM is full, but nothing shows which process is using so much RAM

One of our Hadoop datanodes, running 64-bit CentOS 7, is using all of its RAM. I tried to figure out which process is using so much RAM but couldn't. Please help me check this. System: CentOS 7 64-bit, with 64 GiB RAM. htop on the server shows…
ZTE.A
  • 1
0
votes
1 answer

Force Hadoop to use a proxy for the S3 connection

I'm trying to upload a file to S3 using Hadoop: hadoop fs -Dfs.s3a.connection.ssl.enabled=false -Dfs.s3a.proxy.host=127.0.0.1 -Dfs.s3a.proxy.port=8123 -put pig_1421167148680.log s3a://access:secret@bucket/temp/trash But I can't force Hadoop to use…
smaj
  • 13
  • 5
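As general background for questions like the one above: the s3a proxy properties passed via -D flags can also live in core-site.xml, which avoids per-command flags. A sketch using the same property names as the command above, with placeholder host/port values:

```xml
<!-- core-site.xml fragment: route s3a traffic through a local proxy.
     Host/port values are placeholders for illustration. -->
<property>
  <name>fs.s3a.proxy.host</name>
  <value>127.0.0.1</value>
</property>
<property>
  <name>fs.s3a.proxy.port</name>
  <value>8123</value>
</property>
<property>
  <name>fs.s3a.connection.ssl.enabled</name>
  <value>false</value>
</property>
```

Note that -D settings only take effect for properties the s3a connector reads at runtime and not those already fixed by the cluster configuration, which is one reason the file-based form can behave differently from the command-line form.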
0
votes
2 answers

Root directory full, can't recover consumed space

I have researched my scenario everywhere but can't find anything related to my issue. I have a datanode in the Hadoop framework which recently went bad because all the drives on that box got unmounted for some unknown reason. These drives are…
0
votes
1 answer

Hive Server2 not impersonating HDFS

I am trying to secure Hive using storage-based security. I am using Kerberos and LDAP. What I am trying to get is for Hive to create directories and files as the user (and their primary group) in HDFS. This way I hope to restrict access to databases based on…
user16611
  • 101
  • 3
0
votes
1 answer

Why does Accumulo require $ZOOKEEPER_HOME in addition to the IPs of the Zookeeper ensemble?

According to the documentation, Accumulo requires you to set $ZOOKEEPER_HOME (a local path) in the configuration files, and also requires a list of IPs for the Zookeeper ensemble. Why are IPs alone not sufficient? What if your Zookeeper ensemble is…
Ianvdl
  • 45
  • 7