Questions tagged [hadoop]

Hadoop is an open-source project that provides a distributed, replicated file system and a production-grade map-reduce system, along with a series of complementary additions such as Hive, Pig, and HBase that help get more out of a Hadoop-powered cluster.

Hadoop is a project sponsored by the Apache Software Foundation, with commercial support provided by multiple vendors, including Cloudera, Hortonworks, and MapR. Apache documents a more complete list of commercial solutions.

Available complementary additions to Hadoop include:

  • Hadoop distributed filesystem ( standard )
  • The map-reduce architecture ( standard )
  • Hive, which provides a SQL-like interface to the map-reduce architecture
  • HBase, a distributed key-value service
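
As a rough illustration of the map-reduce architecture mentioned above, here is a minimal single-process sketch of a word count. It does not use any Hadoop API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and only mirror the three stages a map-reduce framework would run across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group emitted values by key, between the map and reduce stages."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence (in one process, for illustration only)."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["hive pig hbase", "pig hive hive"]))
```

In a real cluster the map and reduce calls would run in parallel on different nodes over data stored in the distributed filesystem; this sketch only shows the data flow between the stages.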

261 questions
1
vote
1 answer

Redirecting Ambari-Server backup file creation to a different location

I am taking a backup of my Ambari server using the command ambari-server backup. This creates the backup file in the location /var/lib/ambari-server/. I want the backup to go to a different location, and I cannot find a way to do it. The help…
Gautam Somani
  • 296
  • 3
  • 14
1
vote
0 answers

Does an Ambari cluster need SSH access from the ambari-server machine to all other hosts?

We installed an Ambari cluster with 3 master machines. The Ambari server is installed on the master02 Linux machine. The cluster also includes 25 DataNode machines and 5 Kafka machines. Does ambari-server need SSH access to all other machines in the…
shalom
  • 461
  • 13
  • 29
1
vote
0 answers

CDH Community Edition Upgrade from 5.7 to 5.13 without Cloudera Express or Cloudera Enterprise

I have a cluster as follows: 3 HBase Masters (1 active and 2 standby), 4 Region Servers, 4 Data Nodes, 1 Primary and 1 Secondary Name Node, 3 Journal Nodes, 4 NodeManagers, 3 Resource Managers (1 active and 2 standby). Query 1: What should be the order of…
tuk
  • 333
  • 5
  • 18
1
vote
1 answer

kafka + how to revert topic deletion

Just in case we delete the wrong topic, as in /usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper hdpmaster01:2181 --delete --topic gtom.poli.pri.proc Topic gtom.poli.pri.proc is already marked for deletion and now we want to revert it, what…
jango
  • 59
  • 2
  • 3
  • 12
1
vote
0 answers

How to prevent arbitrary executable execution on hadoop cluster

I am involved with configuring a Hadoop cluster for complete auditability and security. I am new to the Hadoop ecosystem, but I have a decent idea of the basics. I have a few concerns for which I hope someone might be able to point me in the right…
STN
  • 111
  • 1
1
vote
0 answers

YARN AM logs report a different timestamp from what is shown in the terminal and the Spark Scala shell

I am trying to understand why the following occurred: I have a Docker container with Yarn and Spark running fine except that the timestamp of that container was minus X hours of what I wanted it to be. So when I was running date it was returning a…
rudimuse
  • 11
  • 1
1
vote
0 answers

Need to set 000 permission on a specific HDFS data block through the command line

I am trying to set the “000” permission on a specific block. I used the command below to find the block information: su - hdfs -c "hdfs fsck -locations -files -blocks /user/rohit/partition_filter_table/india.25.20.101.95000" Now, I want to set…
1
vote
1 answer

Hadoop FileAlreadyExistsException: Output directory hdfs://:9000/input already exists

I have Hadoop set up in fully distributed mode with one master and 3 slaves. I am trying to execute a jar file named Tasks.jar, which takes arg[0] as the input directory and arg[1] as the output directory. In my Hadoop environment, I have the input files…
1
vote
0 answers

Why is Hadoop TeraSort not using all cluster nodes

Question: Regarding the TeraSort demo in Hadoop, please suggest whether the symptom is as expected or the workload should be distributed. Symptom: Started Hadoop (3 nodes in a cluster) and ran the TeraSort benchmark as below in Executions. I expected all…
mon
  • 235
  • 3
  • 10
1
vote
0 answers

Is it possible to configure HDFS in federation mode and in HA mode at the same time?

I don't understand whether it is possible to configure HDFS in both modes at the same time. Does it make sense? Can somebody show a simple configuration of HDFS in both modes? (nameNode1, nameNode2, nameNodeStandby1, nameNodeStandby2)
1
vote
1 answer

ZFS for Hadoop cloud instead of ext4

Right now I have a couple of Linodes with ext4. I have a Hadoop setup. What benefit would I get if I migrated my file system from ext4 to ZFS? Will there be any benefit in response times? Any speed optimization while data gets exchanged in the local LAN…
M-BoB
  • 11
  • 2
1
vote
0 answers

Zombie process blocking port when restarting Hadoop (Secondary) Namenode

I'm having weird issues with the Hadoop Namenode and Secondary Namenode. Our HDFS cluster runs smoothly most of the time. But every now and then, either the Primary Namenode freezes (crashing the whole cluster) or the Secondary Namenode freezes and…
1
vote
1 answer

/usr/bin/env: python2.5: No such file or directory

I'm trying to play with Cloudera's Distribution for Hadoop on my EC2 account. For configuring it I'm using THIS tutorial. Everything seems fine to me, but when I'm trying to run hadoop-ec2 I'm getting the following…
Maksim
  • 115
  • 2
  • 6
1
vote
0 answers

Hadoop Hive, Impala, Pig, and more — SQL access to Hadoop?

It appears that Hive, Impala, Pig, and others all provide SQL or SQL-like access to data stored on Hadoop clusters. They all seem to have support for HDFS, S3, and other forms. So why are there so many different ways for accessing Hadoop…
vy32
  • 2,088
  • 2
  • 17
  • 21
1
vote
1 answer

hadoop fs commands not giving any output

I am running Hadoop 1.2.1 on Ubuntu 14.04 LTS in pseudo-distributed mode, and the fs commands are not doing anything, i.e. neither the prompt is returned nor any error message is shown. What is the problem? Thanks in advance
ojas mohril
  • 111
  • 2