Questions tagged [hadoop]

Hadoop is an open-source solution providing a distributed, replicated file system and a production-grade map-reduce system, along with a series of complementary additions such as Hive, Pig, and HBase that get more out of a Hadoop-powered cluster.

Hadoop is an Apache Software Foundation sponsored project, with commercial support provided by multiple vendors, including Cloudera, Hortonworks, and MapR. Apache documents a more complete list of commercial offerings.

Available complementary additions to Hadoop include:

  • The Hadoop Distributed File System, HDFS (standard)
  • The map-reduce architecture (standard)
  • Hive, which provides a SQL-like interface to the map-reduce architecture
  • HBase, a distributed key-value store
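To illustrate the map-reduce model mentioned above, here is a minimal word-count sketch in the Hadoop Streaming style (Streaming pipes text lines over stdin/stdout, so mapper and reducer logic can be plain functions). This is an illustrative sketch, not code from any of the questions below; the function names are hypothetical.

```python
#!/usr/bin/env python3
"""Word count in the Hadoop Streaming style: a map phase emitting
(key, value) pairs and a reduce phase aggregating per key."""
import sys
from itertools import groupby

def map_words(lines):
    # Mapper: emit a (word, 1) pair for every word seen.
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_counts(pairs):
    # Reducer: Hadoop delivers mapper output grouped/sorted by key;
    # sorted() simulates that shuffle step here.
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    # Streaming wiring: lines in on stdin, "word<TAB>count" out on stdout.
    for word, total in reduce_counts(map_words(sys.stdin)):
        print(f"{word}\t{total}")
```

On a real cluster the same two functions would run as separate mapper and reducer processes launched by the Hadoop Streaming jar, with HDFS providing the input splits and output directory.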


261 questions
0
votes
1 answer

ZooKeeper automatic start fails when the firewall is stopped from an rc.local script

I am using Apache Hadoop 2.7.1 on CentOS 7. My cluster is an HA cluster and I am using a ZooKeeper quorum for automatic failover, but I want to automate the ZooKeeper start process, and of course in the shell script we have to stop the firewall first in order to…
oula alshiekh
  • 103
  • 1
  • 2
  • 6
0
votes
1 answer

copying files in hdfs stalls

We have a 35-node cluster with a high number of blocks: ≈450K blocks per datanode. After a configuration change (which included rack reassignments and a NameNode Xmx increase), HDFS became a problem. It's unable to perform copy operations on random…
inteloid
  • 101
  • 2
0
votes
2 answers

NameNodes fail to start on HA cluster - fatal errors in JournalNode logs

I am having some problems with my Hadoop cluster: CentOS 7.3, Hortonworks Ambari 2.4.2, Hortonworks HDP 2.5.3. Ambari stderr: 2017-04-06 10:49:49,039 - Getting jmx metrics from NN failed. URL:…
0
votes
1 answer

How to install Hadoop 2.4.1 on Windows with Spark 2.0.0

I want to set up a cluster using Hadoop in YARN mode. I want to use the Spark API for map-reduce and will use spark-submit to deploy my applications. I want to work on a cluster. Can anyone help me install Hadoop in a cluster on Windows?
0
votes
0 answers

Scale Up Hadoop disk on aws infrastructure

Our Hadoop cluster's disks are getting full, so I want to scale up disk space as soon as possible. One way I can think of is to increase the EBS volume on all three nodes, Hadoop 2.6.0-cdh5.5.1 (2 datanodes, 1 namenode). So my question is: is there a better way to scale up…
0
votes
2 answers

Trying to run Hive fails with "java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.0.0-alpha1"

I have this version of Hadoop on Ubuntu 16.10: Hadoop 3.0.0-alpha1, source code repository https://git-wip-us.apache.org/repos/asf/hadoop.git -r a990d2ebcd6de5d7dc2d3684930759b0f0ea4dc3, compiled by andrew on 2016-08-30T07:02Z, compiled with protoc…
Nikolay Baranenko
  • 132
  • 1
  • 4
  • 15
0
votes
1 answer

Why can I only access a port from localhost?

I am trying to set up Elasticsearch on my MapR-issued Red Hat virtual machine, which comes pre-loaded with the MapR ecosystem. I installed Elasticsearch via yum. I am able to reach it from inside the VM: [root@maprdemo elasticsearch]# curl -XGET…
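A common cause of this symptom (offered here as a general note, not necessarily this poster's exact issue) is that Elasticsearch binds only to the loopback interface by default, so it answers curl from inside the VM but not from outside. A minimal elasticsearch.yml sketch making it listen on all interfaces:

```yaml
# elasticsearch.yml — bind to all interfaces instead of loopback only.
# 0.0.0.0 is an assumption for illustration; in production, prefer the
# specific interface address you want to expose.
network.host: 0.0.0.0
```

Firewall rules and, on MapR demo VMs, hypervisor port forwarding would also need to allow the port (9200 by default).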
0
votes
1 answer

Hadoop datanode - start with one disk and add more later, or start with as many disks as possible and fill them equally?

I'm wondering the following regarding the datanode disk setup in a Hadoop cluster. Which of these two options is better: to add one (or a few) disks to the datanode and attach more after they start to fill up, or to start with as many disks as…
mart
  • 3
  • 3
0
votes
1 answer

Why does DFSZKFailoverController kill the NameNode process in Hadoop?

I am trying to configure a Hadoop high-availability cluster by following this tutorial: http://www.edureka.co/blog/how-to-set-up-hadoop-cluster-with-hdfs-high-availability/ When I follow that article I face two main problems: 1. hdfs namenode…
Oleksandr
  • 733
  • 2
  • 10
  • 17
0
votes
1 answer

What should HADOOP_PREFIX be for Accumulo installation?

I'm trying to install Accumulo 1.7.2 using these directions. ./bin/build_native_library.sh seems to succeed, and libaccumulo.so winds up in lib/native/libaccumulo.so in the Accumulo install directory. When I run ./bin/bootstrap_config.sh, I pick…
0
votes
1 answer

Server RAM is full, but nothing shows which process is using so much RAM

One of our Hadoop datanodes, running 64-bit CentOS 7, is using all of its RAM. I tried to figure out which process is using so much RAM but couldn't. Please help me check this. System: CentOS 7 64-bit, with 64 GiB RAM. htop on the server shows…
ZTE.A
  • 1
0
votes
1 answer

Force Hadoop to use a proxy for the S3 connection

I'm trying to upload a file to S3 using Hadoop: hadoop fs -Dfs.s3a.connection.ssl.enabled=false -Dfs.s3a.proxy.host=127.0.0.1 -Dfs.s3a.proxy.port=8123 -put pig_1421167148680.log s3a://access:secret@bucket/temp/trash But I can't force Hadoop to use…
smaj
  • 13
  • 5
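As general background for questions like the one above: the s3a proxy properties passed via -D flags can also live in core-site.xml, which avoids per-command flags. A sketch using the same property names as the command above, with placeholder host/port values:

```xml
<!-- core-site.xml fragment: route s3a traffic through a local proxy.
     Host/port values are placeholders for illustration. -->
<property>
  <name>fs.s3a.proxy.host</name>
  <value>127.0.0.1</value>
</property>
<property>
  <name>fs.s3a.proxy.port</name>
  <value>8123</value>
</property>
<property>
  <name>fs.s3a.connection.ssl.enabled</name>
  <value>false</value>
</property>
```

Note that -D settings only take effect for properties the s3a connector reads at runtime and not those already fixed by the cluster configuration, which is one reason the file-based form can behave differently from the command-line form.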
0
votes
2 answers

Root directory full, can't recover consumed space

I have researched my scenario everywhere but can't find anything related to my issue. I have a datanode in the Hadoop framework which recently went bad because all the drives on that box got unmounted for some unknown reason. These drives are…
0
votes
1 answer

Hive Server2 not impersonating HDFS

I am trying to secure Hive using storage-based security. I am using Kerberos and LDAP. What I am trying to get is for Hive to create directories and files as the user (and their primary group) in HDFS. This way I hope to restrict access to databases based on…
user16611
  • 101
  • 3
0
votes
1 answer

Why does Accumulo require $ZOOKEEPER_HOME in addition to the IPs of the Zookeeper ensemble?

According to the documentation, Accumulo requires you to set $ZOOKEEPER_HOME (a local path) in the configuration files, and also requires a list of IPs for the Zookeeper ensemble. Why are IPs alone not sufficient? What if your Zookeeper ensemble is…
Ianvdl
  • 45
  • 7